%0 Journal Article %@ 2564-1891 %I JMIR Publications %V 5 %N %P e57455 %T Infodemic Versus Viral Information Spread: Key Differences and Open Challenges %A Cinelli,Matteo %A Gesualdo,Francesco %K infodemic %K information spreading %K infodemiology %K misinformation %K artificial intelligence %K information virality %K public health %K multidisciplinary %K data science %K AI %K difference %K challenge %D 2025 %7 7.5.2025 %9 %J JMIR Infodemiology %G English %X As we move beyond the COVID-19 pandemic, the risk of future infodemics remains significant, driven by emerging health crises and the increasing influence of artificial intelligence in the information ecosystem. During periods of apparent stability, proactive efforts to advance infodemiology are essential for enhancing preparedness and improving public health outcomes. This requires a thorough examination of the foundations of this evolving discipline, particularly in understanding how to accurately identify an infodemic at the appropriate time and scale, and how to distinguish it from other processes of viral information spread, both within and outside the realm of public health. In this paper, we integrate expertise from data science and public health to examine the key differences between information production during an infodemic and viral information spread. We explore both clear and subtle distinctions, including context and contingency (ie, the association of an infodemic and viral information spread with a health crisis); information dynamics in terms of volume, spread, and predictability; the role of misinformation and information voids; societal impact; and mitigation strategies. By analyzing these differences, we highlight challenges and open questions. 
These include whether an infodemic is solely associated with pandemics or could arise from other health emergencies; whether infodemics are limited to health-related issues or could emerge from crises initially unrelated to health (eg, climate events); and whether infodemics are exclusively global phenomena or can occur on national or local scales. Finally, we propose directions for future quantitative research to help the scientific community more robustly differentiate between these phenomena and develop tailored management strategies. %R 10.2196/57455 %U https://infodemiology.jmir.org/2025/1/e57455 %U https://doi.org/10.2196/57455 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 8 %N %P e74345 %T 2024: A Year of Nursing Informatics Research in Review %A Borycki,Elizabeth %K nursing informatics %K health informatics %K research %K practice %K education %K trends %K artificial intelligence %K data science %D 2025 %7 7.5.2025 %9 %J JMIR Nursing %G English %X Each year, nursing informatics researchers contribute to nursing and health informatics knowledge. The year 2024 emerged as yet another year of significant advances. In this editorial, I describe and highlight some of the key trends in nursing informatics research as published in JMIR Nursing in 2024. Artificial intelligence (AI), data science, mobile health (mHealth), and the integration of technology into nursing education and practice remain key research themes in the literature. Nursing informatics publications continue to grow in number. A greater number of AI and data science articles are being published, while at the same time, mHealth and technology research continues to be conducted in nursing education and practice contexts. 
%R 10.2196/74345 %U https://nursing.jmir.org/2025/1/e74345 %U https://doi.org/10.2196/74345 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67830 %T Comparing Artificial Intelligence–Generated and Clinician-Created Personalized Self-Management Guidance for Patients With Knee Osteoarthritis: Blinded Observational Study %A Du,Kai %A Li,Ao %A Zuo,Qi-Heng %A Zhang,Chen-Yu %A Guo,Ren %A Chen,Ping %A Du,Wei-Shuai %A Li,Shu-Ming %+ , Beijing Hospital of Traditional Chinese Medicine, 23 Meishuguan Houjie, Dongcheng District, Beijing, 100010, China, 86 13810986862, lishuming@bjzhongyi.com %K artificial intelligence in health care %K large language models %K knee osteoarthritis %K self-management %K personalized medicine %K patient education %K artificial intelligence %K AI-generated %K knee %K osteoarthritis %K observational study %K GPT-4 %K ChatGPT %K LLMs %K orthopedics %D 2025 %7 7.5.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Knee osteoarthritis is a prevalent, chronic musculoskeletal disorder that impairs mobility and quality of life. Personalized patient education aims to improve self-management and adherence; yet, its delivery is often limited by time constraints, clinician workload, and the heterogeneity of patient needs. Recent advances in large language models offer potential solutions. GPT-4 (OpenAI), distinguished by its long-context reasoning and adoption in clinical artificial intelligence research, emerged as a leading candidate for personalized health communication. However, its application in generating condition-specific educational guidance remains underexplored, and concerns about misinformation, personalization limits, and ethical oversight remain. Objective: We evaluated GPT-4’s ability to generate individualized self-management guidance for patients with knee osteoarthritis in comparison with clinician-created content. 
Methods: This 2-phase, double-blind, observational study used data from 50 patients previously enrolled in a registered randomized trial. In phase 1, 2 orthopedic clinicians each generated personalized education materials for 25 patient profiles using anonymized clinical data, including history, symptoms, and lifestyle. In phase 2, the same datasets were processed by GPT-4 using standardized prompts. All content was anonymized and evaluated by 2 independent, blinded clinical experts using validated scoring systems. Evaluation criteria included efficiency, readability (Flesch-Kincaid, Gunning Fog, Coleman-Liau, and Simple Measure of Gobbledygook), accuracy, personalization, comprehensiveness, and safety. Disagreements between reviewers were resolved through consensus or third-party adjudication. Results: GPT-4 outperformed clinicians in content generation speed (530.03 vs 37.29 words per min, P<.001). Readability was better on the Flesch-Kincaid (mean 11.56, SD 1.08 vs mean 12.67, SD 0.95), Gunning Fog (mean 12.47, SD 1.36 vs mean 14.56, SD 0.93), and Simple Measure of Gobbledygook (mean 13.33, SD 1.00 vs mean 13.81, SD 0.69) indices (all P<.001), though GPT-4 scored slightly higher on the Coleman-Liau Index (mean 15.90, SD 1.03 vs mean 15.15, SD 0.91). GPT-4 also outperformed clinicians in accuracy (mean 5.31, SD 1.73 vs mean 4.76, SD 1.10; P=.05), personalization (mean 54.32, SD 6.21 vs mean 33.20, SD 5.40; P<.001), comprehensiveness (mean 51.74, SD 6.47 vs mean 35.26, SD 6.66; P<.001), and safety (median 61, IQR 58-66 vs median 50, IQR 47-55.25; P<.001). Conclusions: GPT-4 could generate personalized self-management guidance for knee osteoarthritis with greater efficiency, accuracy, personalization, comprehensiveness, and safety than clinician-generated content, as assessed using standardized, guideline-aligned evaluation frameworks. 
These findings underscore the potential of large language models to support scalable, high-quality patient education in chronic disease management. The observed lexical complexity suggests the need to refine outputs for populations with limited health literacy. As an exploratory, single-center study, these results warrant confirmation in larger, multicenter cohorts with diverse demographic profiles. Future implementation should be guided by ethical and operational safeguards, including data privacy, transparency, and the delineation of clinical responsibility. Hybrid models integrating artificial intelligence–generated content with clinician oversight may offer a pragmatic path forward. %M 40332991 %R 10.2196/67830 %U https://www.jmir.org/2025/1/e67830 %U https://doi.org/10.2196/67830 %U http://www.ncbi.nlm.nih.gov/pubmed/40332991 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e65028 %T Forecasting Subjective Cognitive Decline: AI Approach Using Dynamic Bayesian Networks %A Etholén,Antti %A Roos,Teemu %A Hänninen,Mirja %A Bouri,Ioanna %A Kulmala,Jenni %A Rahkonen,Ossi %A Kouvonen,Anne %A Lallukka,Tea %+ Department of Public Health, University of Helsinki, PO Box 20, Tukholmankatu 8 B, Helsinki, 00014, Finland, 358 445105010, antti.etholen@helsinki.fi %K artificial intelligence %K AI %K dementia %K aging %K smoking %K alcohol consumption %K leisure time physical activity %K consumption of fruit and vegetables %K body mass index %K BMI %K insomnia symptoms %D 2025 %7 6.5.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Several potentially modifiable risk factors are associated with subjective cognitive decline (SCD). However, developmental patterns of these risk factors have not been used before to forecast later SCD. Practical tools for the prevention of cognitive decline are needed. 
Objective: We examined multifactorial trajectories of risk factors and their associations with SCD using an artificial intelligence (AI) approach to build a score calculator that forecasts later SCD. In addition, we aimed to develop a new risk score tool to facilitate personalized risk assessment and intervention planning and to validate SCD against register-based dementia diagnoses and dementia-related medications. Methods: Five repeated surveys (2000-2022) of the Helsinki Health Study (N=8960; n=7168, 80% women, aged 40-60 years in phase 1) were used to build dynamic Bayesian networks for estimating the odds of SCD. The model structure was developed using expert knowledge and automated techniques, implementing a score-based approach for training dynamic Bayesian networks with the quotient normalized maximum likelihood criterion. The developed model was used to predict SCD (memory, learning, and concentration) based on the history of consumption of fruit and vegetables, smoking, alcohol consumption, leisure time physical activity, BMI, and insomnia symptoms, adjusting for sociodemographic covariates. Model performance was assessed using 5-fold cross-validation to calculate the area under the receiver operating characteristic curve. Bayesian credible intervals were used to quantify uncertainty in model estimates. Results: Of the participants, 1842 of 5865 (31%) reported a decline in memory, 2818 of 5879 (47.4%) in learning abilities, and 1828 of 5888 (30.7%) in concentration in 2022. Physical activity was the strongest predictor of SCD in a 5-year interval, with an odds ratio of 0.76 (95% Bayesian credible interval 0.59-0.99) for physically active compared to inactive participants. Alcohol consumption showed a U-shaped relationship with SCD. Other risk factors had minor effects. 
Moreover, our validation confirmed that SCD has prognostic value for diagnosed dementia: individuals reporting memory decline were over 3 times more likely to have dementia in 2017 (age 57-77 years), and this risk increased to more than 5 times by 2022 (age 62-82 years). The receiver operating characteristic curve analysis further supported the predictive validity of our outcome, with an area under the curve of 0.78 in 2017 and 0.75 in 2022. Conclusions: A new risk score tool was developed that enables individuals to inspect their risk profiles and explore potential targets for interventions and their estimated contributions to later SCD. Using AI-driven predictive modeling, the tool can aid health care professionals in providing personalized prevention strategies. A dynamic decision heatmap was presented as a communication tool to be used at health care consultations. Our findings suggest that early identification of individuals with SCD could improve targeted intervention strategies for reducing dementia risk. Future research should explore the integration of AI-based risk prediction models into clinical workflows and assess their effectiveness in guiding lifestyle interventions to mitigate SCD and dementia. 
%M 40327854 %R 10.2196/65028 %U https://www.jmir.org/2025/1/e65028 %U https://doi.org/10.2196/65028 %U http://www.ncbi.nlm.nih.gov/pubmed/40327854 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e70545 %T Diagnosis of Sarcopenia Using Convolutional Neural Network Models Based on Muscle Ultrasound Images: Prospective Multicenter Study %A Chen,Zi-Tong %A Li,Xiao-Long %A Jin,Feng-Shan %A Shi,Yi-Lei %A Zhang,Lei %A Yin,Hao-Hao %A Zhu,Yu-Li %A Tang,Xin-Yi %A Lin,Xi-Yuan %A Lu,Bei-Lei %A Wang,Qun %A Sun,Li-Ping %A Zhu,Xiao-Xiang %A Qiu,Li %A Xu,Hui-Xiong %A Guo,Le-Hang %+ Department of Medical Ultrasound, Shanghai Tenth People’s Hospital, 301 Yanchang Road, Jing 'an District, Shanghai, 200072, China, 86 13764538305, gopp1314@hotmail.com %K ultrasound %K sarcopenia %K artificial intelligence %K convolutional neural network %K multicenter study %D 2025 %7 6.5.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Early detection is clinically crucial for the strategic handling of sarcopenia, yet the screening process, which includes assessments of muscle mass, strength, and function, remains complex and difficult to access. Objective: This study aims to develop a convolutional neural network model based on ultrasound images to simplify the diagnostic process and promote its accessibility. Methods: This study prospectively evaluated 357 participants (101 with sarcopenia and 256 without sarcopenia) for training, encompassing three types of data: muscle ultrasound images, clinical information, and laboratory information. Three monomodal models based on each data type were developed in the training cohort. The data type with the best diagnostic performance was selected to develop the bimodal and multimodal model by adding another one or two data types. Subsequently, the diagnostic performance of the above models was compared. The contribution ratios of different data types were further analyzed for the multimodal model. 
A sensitivity analysis was performed by excluding 86 cases with missing values and retaining 271 complete cases for robustness validation. By comprehensive comparison, we identified the optimal model (the SARCO model) as the most convenient solution. Moreover, the SARCO model underwent an external validation with 145 participants (68 with sarcopenia and 77 without sarcopenia) and a proof-of-concept validation with 82 participants (19 with sarcopenia and 63 without sarcopenia) from two other hospitals. Results: The monomodal model based on ultrasound images achieved the highest area under the receiver operating characteristic curve (AUC) of 0.827 and F1-score of 0.738 among the three monomodal models. Sensitivity analysis on complete data further confirmed the superiority of the ultrasound images model (AUC: 0.851; F1-score: 0.698). The performance of the multimodal model demonstrated statistically significant differences compared to the best monomodal model (AUC: 0.845 vs 0.827; P=.02) as well as the two bimodal models based on ultrasound images+clinical information (AUC: 0.845 vs 0.826; P=.03) and ultrasound images+laboratory information (AUC: 0.845 vs 0.832; P=.035). On the other hand, ultrasound images contributed the most evidence for diagnosing sarcopenia (0.787) and nonsarcopenia (0.823) in the multimodal models. Sensitivity analysis showed consistent performance trends, with ultrasound images remaining the dominant contributor (Shapley additive explanation values: 0.810 for sarcopenia and 0.795 for nonsarcopenia). After comprehensive clinical analysis, the monomodal model based on ultrasound images was identified as the SARCO model. Subsequently, the SARCO model achieved satisfactory prediction performance in the external validation and proof-of-concept validation, with AUCs of 0.801 and 0.757 and F1-scores of 0.727 and 0.666, respectively. 
Conclusions: All three types of data contributed to sarcopenia diagnosis, while ultrasound images played a dominant role in model decision-making. The SARCO model based on ultrasound images is potentially the most convenient solution for diagnosing sarcopenia. Trial Registration: Chinese Clinical Trial Registry ChiCTR2300073651; https://www.chictr.org.cn/showproj.html?proj=199199 %R 10.2196/70545 %U https://www.jmir.org/2025/1/e70545 %U https://doi.org/10.2196/70545 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e65272 %T Wearable Artificial Intelligence for Sleep Disorders: Scoping Review %A Aziz,Sarah %A A M Ali,Amal %A Aslam,Hania %A A Abd-alrazaq,Alaa %A AlSaad,Rawan %A Alajlani,Mohannad %A Ahmad,Reham %A Khalil,Laila %A Ahmed,Arfan %A Sheikh,Javaid %+ AI Center for Precision Health, Weill Cornell Medicine-Qatar, Education City, Street 2700, Doha, Qatar, 974 44928827, saa4038@qatar-med.cornell.edu %K sleep disorders %K wearable devices %K artificial intelligence %K machine learning %K scoping review %D 2025 %7 6.5.2025 %9 Review %J J Med Internet Res %G English %X Background: Worldwide, 30%-45% of adults have sleep disorders, which are linked to major health issues such as diabetes and cardiovascular disease. Long-term monitoring with traditional in-lab testing is impractical due to high costs. Wearable artificial intelligence (AI)–powered solutions offer accessible, scalable, and continuous monitoring, improving the identification and treatment of sleep problems. Objective: This scoping review aims to provide an overview of AI-powered wearable devices used for sleep disorders, focusing on study characteristics, wearable technology features, and AI methodologies for detection and analysis. Methods: Seven electronic databases (MEDLINE, PsycINFO, Embase, IEEE Xplore, ACM Digital Library, Google Scholar, and Scopus) were searched for peer-reviewed literature published before March 2024. 
Keywords were selected based on 3 domains: sleep disorders, AI, and wearable devices. The primary selection criterion was the inclusion of studies that utilized AI algorithms to detect or predict various sleep disorders using data from wearable devices. Study selection was conducted in 2 steps: first, by reviewing titles and abstracts, followed by full-text screening. Two reviewers independently conducted study selection and data extraction, resolving discrepancies by consensus. The extracted data were synthesized using a narrative approach. Results: The initial search yielded 615 articles, of which 46 met the eligibility criteria and were included in the final analysis. The majority of studies focused on sleep apnea. Wearable AI was widely deployed for diagnosing and screening disorders; however, none of the studies used it for treatment. Commercial devices were the most commonly used type of wearable technology, appearing in 30 out of 46 (65%) studies. Among these, a range of brands was used rather than any single dominant brand; 19 of 46 (41%) studies used wrist-worn devices. Respiratory data were used by 25 of 46 (54%) studies as the primary data for model development, followed by heart rate (22/46, 48%) and body movement (17/46, 37%). The most popular algorithm was the convolutional neural network, adopted by 17 of 46 (37%) studies, followed by random forest (14/46, 30%) and support vector machines (12/46, 26%). Conclusions: Wearable AI technology offers promising solutions for sleep disorders. These devices can be used for screening and diagnosis; however, research on wearable technology for sleep disorders other than sleep apnea remains limited. More systematic reviews with meta-analyses are needed to statistically synthesize performance and efficacy results. Technology companies should prioritize advancements such as deep learning algorithms and invest in wearable AI for treating sleep disorders, given its potential. 
Further research is necessary to validate machine learning techniques using clinical data from wearable devices and to develop useful analytics for data collection, monitoring, prediction, classification, and recommendation in the context of sleep disorders. %M 40327852 %R 10.2196/65272 %U https://www.jmir.org/2025/1/e65272 %U https://doi.org/10.2196/65272 %U http://www.ncbi.nlm.nih.gov/pubmed/40327852 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e66556 %T Code Error in “Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants in Autism Spectrum Disorder: Genotype-Based Deep Learning” %A Miller,Catriona %A Portlock,Theo %A Nyaga,Denis M %A Gamble,Greg D %A O'Sullivan,Justin M %K autism prediction %K machine learning %K data leakage %D 2025 %7 6.5.2025 %9 %J JMIR Med Inform %G English %X %R 10.2196/66556 %U https://medinform.jmir.org/2025/1/e66556 %U https://doi.org/10.2196/66556 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e69284 %T The Applications of Large Language Models in Mental Health: Scoping Review %A Jin,Yu %A Liu,Jiayi %A Li,Pan %A Wang,Baosen %A Yan,Yangxinyu %A Zhang,Huilin %A Ni,Chenhao %A Wang,Jing %A Li,Yi %A Bu,Yajun %A Wang,Yuanyuan %+ School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, South China Normal University, Room 219, Floor 2, School of Psychology, Guangzhou, Guangdong, 510660, China, 86 13076729124, angelayuanyuanwang@gmail.com %K mental health %K large language models %K application %K process %K performance %K comparison %D 2025 %7 5.5.2025 %9 Review %J J Med Internet Res %G English %X Background: Mental health is emerging as an increasingly prevalent public issue globally. 
There is an urgent need in mental health for efficient detection methods, effective treatments, affordable privacy-focused health care solutions, and increased access to specialized psychiatrists. The emergence and rapid development of large language models (LLMs) have shown the potential to address these mental health demands. However, a comprehensive review summarizing the application areas, processes, and performance comparisons of LLMs in mental health has been lacking until now. Objective: This review aimed to summarize the applications of LLMs in mental health, including trends, application areas, performance comparisons, challenges, and prospective future directions. Methods: A scoping review was conducted to map the landscape of LLMs’ applications in mental health, including trends, application areas, comparative performance, and future trajectories. We searched 7 electronic databases, including Web of Science, PubMed, Cochrane Library, IEEE Xplore, Weipu, CNKI, and Wanfang, from January 1, 2019, to August 31, 2024. Studies eligible for inclusion were peer-reviewed articles focused on LLMs’ applications in mental health. Studies were excluded if they (1) were not peer-reviewed or did not focus on mental health or mental disorders or (2) did not use LLMs; studies that used only natural language processing or long short-term memory models were also excluded. Relevant information on application details and performance metrics was extracted during the data charting of eligible articles. Results: A total of 95 articles were drawn from 4859 studies using LLMs for mental health tasks. The applications were categorized into 3 key areas: screening or detection of mental disorders (67/95, 71%), supporting clinical treatments and interventions (31/95, 33%), and assisting in mental health counseling and education (11/95, 12%). 
Most studies used LLMs for depression detection and classification (33/95, 35%), clinical treatment support and intervention (14/95, 15%), and suicide risk prediction (12/95, 13%). Compared with nontransformer models and humans, LLMs demonstrate higher capabilities in information acquisition and analysis and efficiently generating natural language responses. Various series of LLMs also have different advantages and disadvantages in addressing mental health tasks. Conclusions: This scoping review synthesizes the applications, processes, performance, and challenges of LLMs in the mental health field. These findings highlight the substantial potential of LLMs to augment mental health research, diagnostics, and intervention strategies, underscoring the imperative for ongoing development and ethical deliberation in clinical settings. %M 40324177 %R 10.2196/69284 %U https://www.jmir.org/2025/1/e69284 %U https://doi.org/10.2196/69284 %U http://www.ncbi.nlm.nih.gov/pubmed/40324177 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 14 %N %P e66336 %T Innovative, Technology-Driven, Digital Tools for Managing Pediatric Urinary Incontinence: Scoping Review %A Bladt,Lola %A Vermeulen,Jiri %A Vermandel,Alexandra %A De Win,Gunter %A Van Campenhout,Lukas %+ Department of Product Development, Faculty of Design Sciences, University of Antwerp, Paardenmarkt 90/94, Antwerp, 2000, Belgium, 32 497848014, lola.bladt@uantwerpen.be %K pediatric urinary incontinence %K nocturnal enuresis %K behavioral therapy %K urotherapy %K patient compliance %K digital health %K serious games %K telehealth %K health technology %K enuresis alarm %K artificial intelligence %K AI %D 2025 %7 5.5.2025 %9 Review %J Interact J Med Res %G English %X Background: Urinary incontinence affects approximately 7% to 10% of children during the day and 9% to 12% of children during the night. Treatment mainly involves lifestyle advice and behavioral methods, but motivation and adherence are low. 
Traditional tools such as pen-and-paper solutions may feel outdated and no longer meet the needs of today’s “digital native” children. Meanwhile, digital interventions have already shown effectiveness in other pediatric health care areas. Objective: This scoping review aimed to identify and map innovative, technology-driven, digital tools for managing pediatric urinary incontinence. Methods: PubMed, Web of Science, and the Cochrane Library were searched in March 2022 without date restrictions, complemented by cross-referencing. Studies were eligible if they focused on pediatric patients (aged ≤18 years) with bladder and bowel dysfunctions and explored noninvasive, technology-based interventions such as digital health, remote monitoring, and gamification. Studies on adults, invasive treatments, and conventional methods without tangible tools were excluded. Gray literature was considered, but non–English-language, inaccessible, or result-lacking articles were excluded. A formal critical appraisal was not conducted as the focus was on mapping existing tools rather than evaluating effectiveness. Data analysis combined descriptive statistics and qualitative content analysis, categorizing tools through iterative coding and team discussions. Results: In total, 66 articles were included, with nearly one-third (21/66, 32%) focusing on nocturnal enuresis. Our analysis led to the identification of six main categories of tools: (1) digital self-management (7/66, 11%); (2) serious games (7/66, 11%); (3) reminder technology (6/66, 9%); (4) educational media (12/66, 18%), further divided into video (5/12, 42%) and other media (7/12, 58%); (5) telehealth and remote patient monitoring (13/66, 20%), with subcategories of communication (5/13, 38%) and technological advances (8/13, 62%); and (6) enuresis alarm innovations (21/66, 32%), further divided into novel configurations (8/21, 38%) and prevoid alarms (13/21, 62%). 
Conclusions: The field of pediatric urinary incontinence demonstrates a considerable level of innovation, as evidenced by the inclusion of 66 studies. Many tools identified in this review were described as promising and feasible alternatives to traditional methods. These tools were reported to enhance engagement, improve compliance, and increase patient satisfaction and preference while also having the potential to save time for health care providers. However, this review also identified gaps in research, highlighting the need for more rigorous research to better assess the tools’ effectiveness and address the complex, multifaceted challenges of pediatric urinary incontinence management. Limitations of this review include restricting the search to 3 databases, excluding non–English-language articles, the broad scope, and single-reviewer screening, although frequent team discussions ensured rigor. We propose that future tools should integrate connected, adaptive, and personalized approaches that align with stakeholder needs, guided by a multidisciplinary, human-centered framework combining both qualitative and quantitative insights. 
%M 40324170 %R 10.2196/66336 %U https://www.i-jmr.org/2025/1/e66336 %U https://doi.org/10.2196/66336 %U http://www.ncbi.nlm.nih.gov/pubmed/40324170 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e65551 %T Current Technological Advances in Dysphagia Screening: Systematic Scoping Review %A Wong,Duo Wai-Chi %A Wang,Jiao %A Cheung,Sophia Ming-Yan %A Lai,Derek Ka-Hei %A Chiu,Armstrong Tat-San %A Pu,Dai %A Cheung,James Chung-Wai %A Kwok,Timothy Chi-Yui %+ Department of Biomedical Engineering, Faculty of Engineering, Hong Kong Polytechnic University, GH137, GH Wing, 1/F, Department of Biomedical Engineering,, 11 Yuk Choi Road, Hung Hom, Kowloon, Hong Kong, 999077, China (Hong Kong), 852 27667673, james.chungwai.cheung@polyu.edu.hk %K digital health %K computer-aided diagnosis %K computational deglutition %K machine learning %K deep learning %K artificial intelligence %K AI %K swallowing disorder %K aspiration %D 2025 %7 5.5.2025 %9 Review %J J Med Internet Res %G English %X Background: Dysphagia affects more than half of older adults with dementia and is associated with a 10-fold increase in mortality. The development of accessible, objective, and reliable screening tools is crucial for early detection and management. Objective: This systematic scoping review aimed to (1) examine the current state of the art in artificial intelligence (AI) and sensor-based technologies for dysphagia screening, (2) evaluate the performance of these AI-based screening tools, and (3) assess the methodological quality and rigor of studies on AI-based dysphagia screening tools. Methods: We conducted a systematic literature search across CINAHL, Embase, PubMed, and Web of Science from inception to July 4, 2024, following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) framework. In total, 2 independent researchers conducted the search, screening, and data extraction. 
Eligibility criteria included original studies using sensor-based instruments with AI to identify individuals with dysphagia or unsafe swallow events. We excluded studies on pediatric, infant, or postextubation dysphagia, as well as those using non–sensor-based assessments or diagnostic tools. We used a modified Quality Assessment of Diagnostic Accuracy Studies–2 tool to assess methodological quality, adding a “model” domain for AI-specific evaluation. Data were synthesized narratively. Results: This review included 24 studies involving 2979 participants (1717 with dysphagia and 1262 controls). In total, 75% (18/24) of the studies focused solely on per-individual classification rather than per–swallow event classification. Acoustic (13/24, 54%) and vibratory (9/24, 38%) signals were the primary modality sources. In total, 25% (6/24) of the studies used multimodal approaches, whereas 75% (18/24) used a single modality. Support vector machine was the most common AI model (15/24, 62%), with deep learning approaches emerging in recent years (3/24, 12%). Performance varied widely—accuracy ranged from 71.2% to 99%, area under the receiver operating characteristic curve ranged from 0.77 to 0.977, and sensitivity ranged from 63.6% to 100%. Multimodal systems generally outperformed unimodal systems. The methodological quality assessment revealed a risk of bias, particularly in patient selection (unclear in 18/24, 75% of the studies), index test (unclear in 23/24, 96% of the studies), and modeling (high risk in 13/24, 54% of the studies). Notably, no studies conducted external validation or domain adaptation testing, raising concerns about real-world applicability. Conclusions: This review provides a comprehensive overview of technological advancements in AI and sensor-based dysphagia screening. While these developments show promise for continuous long-term tele-swallowing assessments, significant methodological limitations were identified. 
Future studies can explore how each modality can target specific anatomical regions and manifestations of dysphagia. This detailed understanding of how different modalities address various aspects of dysphagia can significantly benefit multimodal systems, enabling them to better handle the multifaceted nature of dysphagia conditions. %M 40324167 %R 10.2196/65551 %U https://www.jmir.org/2025/1/e65551 %U https://doi.org/10.2196/65551 %U http://www.ncbi.nlm.nih.gov/pubmed/40324167 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e69906 %T The Application Status of Radiomics-Based Machine Learning in Intrahepatic Cholangiocarcinoma: Systematic Review and Meta-Analysis %A Xu,Lan %A Chen,Zian %A Zhu,Dan %A Wang,Yingjun %+ Department of Dermatology, Quzhou Municipal Hospital of Traditional Chinese Medicine, No.117 Quhua Road, Kecheng District, Quzhou, 324000, China, 86 17826819402, 17826819402@163.com %K radiomic %K machine learning %K intrahepatic cholangiocarcinoma %K bile duct cancer %K systematic review %K meta-analysis %D 2025 %7 5.5.2025 %9 Review %J J Med Internet Res %G English %X Background: Over the past few years, radiomics for the detection of intrahepatic cholangiocarcinoma (ICC) has been extensively studied. However, systematic evidence is lacking in the use of radiomics in this domain, which hinders its further development. Objective: To address this gap, our study delved into the status quo and application value of radiomics in ICC and aimed to offer evidence-based support to promote its systematic application in this field. Methods: PubMed, Web of Science, Cochrane Library, and Embase were comprehensively retrieved to determine relevant original studies. The study quality was appraised through the Radiomics Quality Score. In addition, subgroup analyses were undertaken according to datasets (training and validation sets), imaging sources, and model types. 
Results: Fifty-eight studies encompassing 12,903 patients were eligible, with an average Radiomics Quality Score of 9.21. Radiomics-based machine learning (ML) was mainly used to diagnose ICC (n=30), microvascular invasion (n=8), gene mutations (n=5), perineural invasion (PNI; n=2), lymph node (LN) positivity (n=2), and tertiary lymphoid structures (TLSs; n=2), and predict overall survival (n=6) and recurrence (n=9). The C-index, sensitivity (SEN), and specificity (SPC) of the ML model developed using clinical features (CFs) for ICC detection were 0.762 (95% CI 0.728-0.796), 0.72 (95% CI 0.66-0.77), and 0.72 (95% CI 0.66-0.78), respectively, in the validation dataset. In contrast, the C-index, SEN, and SPC of the radiomics-based ML model for detecting ICC were 0.853 (95% CI 0.824-0.882), 0.80 (95% CI 0.73-0.85), and 0.88 (95% CI 0.83-0.92), respectively. The C-index, SEN, and SPC of ML constructed using both radiomics and CFs for diagnosing ICC were 0.912 (95% CI 0.889-0.935), 0.77 (95% CI 0.72-0.81), and 0.90 (95% CI 0.86-0.92). The deep learning–based model that integrated both radiomics and CFs yielded a notably higher C-index of 0.924 (0.863-0.984) in the task of detecting ICC. Additional analyses showed that radiomics demonstrated promising accuracy in predicting overall survival and recurrence, as well as in diagnosing microvascular invasion, gene mutations, PNI, LN positivity, and TLSs. Conclusions: Radiomics-based ML demonstrates excellent accuracy in the clinical diagnosis of ICC. However, studies involving specific tasks, such as diagnosing PNI and TLSs, are still scarce. The limited research on deep learning has hindered both further analysis and the development of subgroup analyses across various models. Furthermore, challenges such as data heterogeneity and interpretability caused by segmentation and imaging parameter variations require further optimization and refinement. 
Future research should delve into the application of radiomics to enhance its clinical use. Its integration into clinical practice holds great promise for improving decision-making, boosting diagnostic and treatment accuracy, minimizing unnecessary tests, and optimizing health care resource usage. %M 40323647 %R 10.2196/69906 %U https://www.jmir.org/2025/1/e69906 %U https://doi.org/10.2196/69906 %U http://www.ncbi.nlm.nih.gov/pubmed/40323647 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67525 %T Development of a Predictive Model for Metabolic Syndrome Using Noninvasive Data and its Cardiovascular Disease Risk Assessments: Multicohort Validation Study %A Park,Jin-Hyun %A Jeong,Inyong %A Ko,Gang-Jee %A Jeong,Seogsong %A Lee,Hwamin %+ Korea University College of Medicine, 73 Goryeodae-ro, Seongbuk-gu, Seoul, 02841, Republic of Korea, 82 1063205109, hwamin@korea.ac.kr %K metabolic syndrome prediction %K noninvasive data %K clinical interpretable model %K body composition data %K early intervention %D 2025 %7 2.5.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Metabolic syndrome is a cluster of metabolic abnormalities, including obesity, hypertension, dyslipidemia, and insulin resistance, that significantly increase the risk of cardiovascular disease (CVD) and other chronic conditions. Its global prevalence is rising, particularly in aging and urban populations. Traditional screening methods rely on laboratory tests and specialized assessments, which may not be readily accessible in routine primary care and community settings. Limited resources, time constraints, and inconsistent screening practices hinder early identification and intervention. Developing a noninvasive and scalable predictive model could enhance accessibility and improve early detection. Objective: This study aimed to develop and validate a predictive model for metabolic syndrome using noninvasive body composition data. 
Additionally, we evaluated the model’s ability to predict long-term CVD risk, supporting its application in clinical and public health settings for early intervention and preventive strategies. Methods: We developed a machine learning–based predictive model using noninvasive data from two nationally representative cohorts: the Korea National Health and Nutrition Examination Survey (KNHANES) and the Korean Genome and Epidemiology Study. The model was trained using dual-energy x-ray absorptiometry data from KNHANES (2008-2011) and validated internally with bioelectrical impedance analysis data from KNHANES 2022. External validation was conducted using Korean Genome and Epidemiology Study follow-up datasets. Five machine learning algorithms were compared, and the best-performing model was selected based on the area under the receiver operating characteristic curve. Cox proportional hazards regression was used to assess the model’s ability to predict long-term CVD risk. Results: The model demonstrated strong predictive performance across validation cohorts. Area under the receiver operating characteristic curve values for metabolic syndrome prediction ranged from 0.8338 to 0.8447 in internal validation, 0.8066 to 0.8138 in external validation 1, and 0.8039 to 0.8123 in external validation 2. The model’s predictions were significantly associated with future cardiovascular risk, with Cox regression analysis indicating that individuals classified as having metabolic syndrome had a 1.51-fold higher risk of developing CVD (hazard ratio 1.51, 95% CI 1.32-1.73; P<.001). The ability to predict long-term CVD risk highlights the potential utility of this model for guiding early interventions. Conclusions: This study developed a noninvasive predictive model for metabolic syndrome with strong performance across diverse validation cohorts. By enabling early risk identification without laboratory tests, the model enhances accessibility in primary care and large-scale screenings. 
Its ability to predict long-term CVD risk supports proactive intervention strategies, potentially reducing the burden of cardiometabolic diseases. Further research should refine the model with additional clinical factors and broader population validation to maximize its clinical impact. %M 40315452 %R 10.2196/67525 %U https://www.jmir.org/2025/1/e67525 %U https://doi.org/10.2196/67525 %U http://www.ncbi.nlm.nih.gov/pubmed/40315452 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e59631 %T Predicting Transvaginal Surgical Mesh Exposure Outcomes Using an Integrated Dataset of Blood Cytokine Levels and Medical Record Data: Machine Learning Approach %A Waugh,Mihyun Lim %A Mills,Tyler %A Boltin,Nicholas %A Wolf,Lauren %A Parker,Patti %A Horner,Ronnie %A Wheeler II,Thomas L %A Goodwin,Richard L %A Moss,Melissa A %K cytokines %K inflammatory response %K medical record %K pelvic organ prolapse %K polypropylene mesh %K supervised machine learning models %K polypropylene %K mesh surgery %K surgical outcome %K cost-efficiency %K risk factor %K efficacy %K health care data %K female %K informed decision-making %K patient care %K digital health %D 2025 %7 1.5.2025 %9 %J JMIR Form Res %G English %X Background: Transvaginal insertion of polypropylene mesh was extensively used in surgical procedures to treat pelvic organ prolapse (POP) due to its cost-efficiency and durability. However, studies have reported a high rate of complications, including mesh exposure through the vaginal wall. Developing predictive models via supervised machine learning holds promise in identifying risk factors associated with such complications, thereby facilitating better informed surgical decisions. Previous studies have demonstrated the efficacy of anticipating medical outcomes by employing supervised machine learning approaches that integrate patient health care data with laboratory findings. However, such an approach has not been adopted within the realm of POP mesh surgery. 
Objective: We examined the efficacy of supervised machine learning to predict mesh exposure following transvaginal POP surgery using 3 different datasets: (1) patient medical record data, (2) biomaterial-induced blood cytokine levels, and (3) the integration of both. Methods: Blood samples and medical record data were collected from 20 female patients who had prior surgical intervention for POP using transvaginal polypropylene mesh. Of these subjects, 10 had experienced mesh exposure through the vaginal wall following surgery, and 10 had not. Standardized medical record data, including vital signs, previous diagnoses, and social history, were acquired from patient records. In addition, cytokine levels in patient blood samples incubated with sterile polypropylene mesh were measured via multiplex assay. Datasets were created with patient medical record data alone, blood cytokine levels alone, and the integration of both data. The data were split into 70% and 30% for training and testing sets, respectively, for machine learning models that predicted the presence or absence of postsurgical mesh exposure. Results: Upon training the models with patient medical record data, systolic blood pressure, pulse pressure, and a history of alcohol usage emerged as the most significant factors for predicting mesh exposure. Conversely, when the models were trained solely on blood cytokine levels, interleukin (IL)-1β and IL-12 p40 stood out as the most influential cytokines in predicting mesh exposure. Using the combined dataset, new factors emerged as the primary predictors of mesh exposure: IL-8, tumor necrosis factor-α, and the presence of hemorrhoids. Remarkably, models trained on the integrated dataset demonstrated superior predictive capabilities with a prediction accuracy as high as 94%, surpassing the predictive performance of individual datasets. 
Conclusions: Supervised machine learning models demonstrated improved prediction accuracy when trained using a composite dataset that combined patient medical record data and biomaterial-induced blood cytokine levels, surpassing the performance of models trained with either dataset in isolation. This result underscores the advantage of integrating health care data with blood biomarkers, presenting a promising avenue for predicting surgical outcomes in not only POP mesh procedures but also other surgeries involving biomaterials. Such an approach has the potential to enhance informed decision-making for both patients and surgeons, ultimately elevating the standard of patient care. %R 10.2196/59631 %U https://formative.jmir.org/2025/1/e59631 %U https://doi.org/10.2196/59631 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e67356 %T Fine-Grained Classification of Pressure Ulcers and Incontinence-Associated Dermatitis Using Multimodal Deep Learning: Algorithm Development and Validation Study %A Brehmer,Alexander %A Seibold,Constantin %A Egger,Jan %A Majjouti,Khalid %A Tapp-Herrenbrück,Michaela %A Pinnekamp,Hannah %A Priester,Vanessa %A Aleithe,Michael %A Fischer,Uli %A Hosters,Bernadette %A Kleesiek,Jens %K computer vision %K image classification %K wound classification %K deep learning %K pressure ulcer %K incontinence-associated dermatitis %K multi modal data %K synthetic image generation %D 2025 %7 1.5.2025 %9 %J JMIR AI %G English %X Background: Pressure ulcers (PUs) and incontinence-associated dermatitis (IAD) are prevalent conditions in clinical settings, posing significant challenges due to their similar presentations but differing treatment needs. Accurate differentiation between PUs and IAD is essential for appropriate patient care, yet it remains a burden for nursing staff and wound care experts. 
Objective: This study aims to develop and introduce a robust multimodal deep learning framework for the classification of PUs and IAD, along with the fine-grained categorization of their respective wound severities, to enhance diagnostic accuracy and support clinical decision-making. Methods: We collected and annotated a dataset of 1555 wound images, achieving consensus among 4 wound experts. Our framework integrates wound images with categorical patient data to improve classification performance. We evaluated 4 models—2 convolutional neural networks and 2 transformer-based architectures—each with approximately 25 million parameters. Various data preprocessing strategies, augmentation techniques, training methods (including multimodal data integration, synthetic data generation, and sampling), and postprocessing approaches (including ensembling and test-time augmentation) were systematically tested to optimize model performance. Results: The transformer-based TinyViT model achieved the highest performance in binary classification of PU and IAD, with an F1-score (harmonic mean of precision and recall) of 93.23%, outperforming wound care experts and nursing staff on the test dataset. In fine-grained classification of wound categories, the TinyViT model also performed best for PU categories with an F1-score of 75.43%, while ConvNeXtV2 showed superior performance in IAD category classification with an F1-score of 53.20%. Incorporating multimodal data improved performance in binary classification but had less impact on fine-grained categorization. Augmentation strategies and training techniques significantly influenced model performance, with ensembling enhancing accuracy across all tasks. Conclusions: Our multimodal deep learning framework effectively differentiates between PUs and IAD, achieving high accuracy and outperforming human wound care experts. 
By integrating wound images with categorical patient data, the model enhances diagnostic precision, offering a valuable decision-support tool for health care professionals. This advancement has the potential to reduce diagnostic uncertainty, optimize treatment pathways, and alleviate the burden on medical staff, leading to faster interventions and improved patient outcomes. The framework’s strong performance suggests practical applications in clinical settings, such as integration into hospital electronic health record systems or mobile applications for bedside diagnostics. Future work should focus on validating real-world implementation, expanding dataset diversity, and refining fine-grained classification capabilities to further enhance clinical utility. %R 10.2196/67356 %U https://ai.jmir.org/2025/1/e67356 %U https://doi.org/10.2196/67356 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67383 %T Perspectives and Experiences With Large Language Models in Health Care: Survey Study %A Sumner,Jennifer %A Wang,Yuchen %A Tan,Si Ying %A Chew,Emily Hwee Hoon %A Wenjun Yip,Alexander %+ Alexandra Research Centre for Healthcare in a Virtual Environment, Alexandra Hospital, Alexandra Hospital-Medical Affairs, Alexandra Road, Singapore, 159964, Singapore, 65 98860360, jennyssumner@gmail.com %K digital health %K artificial intelligence %K survey research %K large language model %K healthcare %K survey %K workforce %K healthcare worker %K professional %D 2025 %7 1.5.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Large language models (LLMs) are transforming how data is used, including within the health care sector. However, frameworks including the Unified Theory of Acceptance and Use of Technology highlight the importance of understanding the factors that influence technology use for successful implementation. 
Objective: This study aimed to (1) investigate users’ uptake, perceptions, and experiences regarding LLMs in health care and (2) contextualize survey responses by demographics and professional profiles. Methods: An electronic survey was administered to elicit stakeholder perspectives of LLMs (health care providers and support functions), their experiences with LLMs, and their potential impact on functional roles. Survey domains included: demographics (6 questions), user experiences of LLMs (8 questions), motivations for using LLMs (6 questions), and perceived impact on functional roles (4 questions). The survey was launched electronically, targeting health care providers or support staff, health care students, and academics in health-related fields. Respondents were adults (>18 years) aware of LLMs. Results: Responses were received from 1083 individuals, of which 845 were analyzable. Of the 845 respondents, 221 had yet to use an LLM. Nonusers were more likely to be health care workers (P<.001), older (P<.001), and female (P<.01). Users primarily adopted LLMs for speed, convenience, and productivity. While 75% (470/624) agreed that the user experience was positive, 46% (294/624) found the generated content unhelpful. Regression analysis showed that the experience with LLMs is more likely to be positive if the user is male (odds ratio [OR] 1.62, CI 1.06-2.48), and increasing age was associated with a reduced likelihood of reporting LLM output as useful (OR 0.98, CI 0.96-0.99). Nonusers compared to LLM users were less likely to report LLMs meeting unmet needs (45%, 99/221 vs 65%, 407/624; OR 0.48, CI 0.35-0.65), and males were more likely to report that LLMs do address unmet needs (OR 1.64, CI 1.18-2.28). Furthermore, nonusers compared to LLM users were less likely to agree that LLMs will improve functional roles (63%, 140/221 vs 75%, 469/624; OR 0.60, CI 0.43-0.85). 
Free-text opinions highlighted concerns regarding autonomy, outperformance, and reduced demand for care. Respondents also predicted changes to human interactions, including fewer but higher quality interactions and a change in consumer needs as LLMs become more common, which would require provider adaptation. Conclusions: Despite the reported benefits of LLMs, nonusers—primarily health care workers, older individuals, and females—appeared more hesitant to adopt these tools. These findings underscore the need for targeted education and support to address adoption barriers and ensure the successful integration of LLMs in health care. Anticipated role changes, evolving human interactions, and the risk of the digital divide further emphasize the need for careful implementation and ongoing evaluation of LLMs in health care to ensure equity and sustainability. %M 40310666 %R 10.2196/67383 %U https://www.jmir.org/2025/1/e67383 %U https://doi.org/10.2196/67383 %U http://www.ncbi.nlm.nih.gov/pubmed/40310666 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 14 %N %P e65213 %T Multihealth Promotion Programs on Physical Health and Quality of Life in Older Adults: Quasi-Experimental Study %A Lee,Li-Yun %A Tung,Heng-Hsin %A Liao,George %A Liu,Su-Ju %A Chen,Zi-Yu %A Yang,Yea-Ru %+ , Department of Nursing, National Yang Ming Chiao Tung University, No 155, Sec 2, Linong St, Beitou Dist, Taipei, 112, Taiwan, 886 2 2826 7000 ext 67991, shannontung719@gmail.com %K older adult %K body composition %K physical activity %K health promotion %K exercise %K nutrition %K diet %K well-being %K quality-of-life %K QoL %K gerontology %K geriatrics %D 2025 %7 1.5.2025 %9 Original Paper %J Interact J Med Res %G English %X Background: Physical activity and appropriate nutrition are essential for older adults. Improving physical health and quality of life can lead to healthy aging. 
Objective: This study aims to investigate the long-term effects of multihealth promotion programs on the physical and mental health of older adults in communities. Methods: A quasi-experimental method was used to recruit 112 older adults voluntarily from a pharmacy in central Taiwan between April 2021 and February 2023. Participants were divided into an experimental group receiving a multihealth promotion program and a control group with no specific intervention. The study measured frailty, nutritional status, well-being, and quality of life using standardized tools such as the Clinical Frailty Scale (CFS), Mini-Nutritional Assessment-Short Form (MNA-SF), Well-being Scale for Elders, and the EQ-5D-3L. Data were analyzed using descriptive statistics, independent t tests, Pearson correlation, and generalized estimating equations. Results: A total of 112 participants were recruited. There were 64 (57.1%) in the experimental group and 48 (42.9%) in the control group. The experimental group exhibited significantly better quality of life (EQ-5D index) at weeks 12 (β=–.59; P=.01) and 24 (β=–.44; P=.04) compared to the control group. The experimental group muscle mass significantly increased at weeks 24 (β=4.29; P<.01) and 36 (β=3.03; P=.01). Upper limb strength improved significantly at weeks 12 (β=3.4; P=.04) and 36 (β=5; P=.01), while core strength showed significant gains at weeks 12 (β=4.43; P=.01) and 36 (β=6.99; P<.01). Lower limb strength increased significantly only at week 12 (β=4.15; P=.01). Overall physical performance improved significantly at weeks 12 (β=5.47; P<.01), 24 (β=5.17; P<.01), and 36 (β=8.79; P<.01). Conclusions: The study’s findings highlight the practical benefits of interventions, including physical and social activities and nutritional support, in enhancing the quality of life and general physical health of older adults. This study’s findings have significant implications for clinical practice. 
These findings can aid in the establishment of effective interventions for older adults. Trial Registration: ClinicalTrials.gov NCT05412251; https://clinicaltrials.gov/study/NCT05412251 %M 40310677 %R 10.2196/65213 %U https://www.i-jmr.org/2025/1/e65213 %U https://doi.org/10.2196/65213 %U http://www.ncbi.nlm.nih.gov/pubmed/40310677 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e68030 %T Investigating Protective and Risk Factors and Predictive Insights for Aboriginal Perinatal Mental Health: Explainable Artificial Intelligence Approach %A Wang,Guanjin %A Bennamoun,Hachem %A Kwok,Wai Hang %A Quimbayo,Jenny Paola Ortega %A Kelly,Bridgette %A Ratajczak,Trish %A Marriott,Rhonda %A Walker,Roz %A Kotz,Jayne %+ , School of Information Technology, Murdoch University, 90 South St, Murdoch WA, Perth, 6150, Australia, 61 89360735, Guanjin.Wang@murdoch.edu.au %K explainable AI %K perinatal mental health %K AI-assisted decision-making %K perinatal %K mental health %K artificial intelligence %K predictive %K depression %K anxiety %K maternal health %K maternal %K infant health %K infant %K Aboriginal %K woman %K psychological risk %K mother %K decision-making %K decision support %K machine learning %K psychological distress %K Aboriginal mothers %K risk factors %K Australia %K cultural strengths %K protective factors %K life events %K worries %K relationships %K childhood experiences %K domestic violence %K substance use %D 2025 %7 30.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Perinatal depression and anxiety significantly impact maternal and infant health, potentially leading to severe outcomes like preterm birth and suicide. Aboriginal women, despite their resilience, face elevated risks due to the long-term effects of colonization and cultural disruption. The Baby Coming You Ready (BCYR) model of care, centered on a digitized, holistic, strengths-based assessment, was co-designed to address these challenges. 
The successful BCYR pilot demonstrated its ability to replace traditional risk-based screens. However, some health professionals still overrely on psychological risk scores, often overlooking the contextual circumstances of Aboriginal mothers, their cultural strengths, and mitigating protective factors. This highlights the need for new tools to improve clinical decision-making. Objective: We explored different explainable artificial intelligence (XAI)–powered machine learning techniques for developing culturally informed, strengths-based predictive modeling of perinatal psychological distress among Aboriginal mothers. The model identifies and evaluates influential protective and risk factors while offering transparent explanations for AI-driven decisions. Methods: We used deidentified data from 293 Aboriginal mothers who participated in the BCYR program between September 2021 and June 2023 at 6 health care services in Perth and regional Western Australia. The original dataset includes variables spanning cultural strengths, protective factors, life events, worries, relationships, childhood experiences, family and domestic violence, and substance use. After applying feature selection and expert input, 20 variables were chosen as predictors. The Kessler-5 scale was used as an indicator of perinatal psychological distress. Several machine learning models, including random forest (RF), CatBoost (CB), light gradient-boosting machine (LightGBM), extreme gradient boosting (XGBoost), k-nearest neighbor (KNN), support vector machine (SVM), and explainable boosting machine (EBM), were developed and compared for predictive performance. To make the black-box model interpretable, post hoc explanation techniques including Shapley additive explanations and local interpretable model-agnostic explanations were applied. 
Results: The EBM outperformed other models (accuracy=0.849, 95% CI 0.8170-0.8814; F1-score=0.771, 95% CI 0.7169-0.8245; area under the curve=0.821, 95% CI 0.7829-0.8593) followed by RF (accuracy=0.829, 95% CI 0.7960-0.8617; F1-score=0.736, 95% CI 0.6859-0.7851; area under the curve=0.795, 95% CI 0.7581-0.8318). Explanations from EBM, Shapley additive explanations, and local interpretable model-agnostic explanations identified consistent patterns of key influential factors, including questions related to “Feeling Lonely,” “Blaming Herself,” “Makes Family Proud,” “Life Not Worth Living,” and “Managing Day-to-Day.” At the individual level, where responses are highly personal, these XAI techniques provided case-specific insights through visual representations, distinguishing between protective and risk factors and illustrating their impact on predictions. Conclusions: This study shows the potential of XAI-driven models to predict psychological distress in Aboriginal mothers and provide clear, human-interpretable explanations of how important factors interact and influence outcomes. These models may help health professionals make more informed, non-biased decisions in Aboriginal perinatal mental health screenings. 
%M 40306634 %R 10.2196/68030 %U https://www.jmir.org/2025/1/e68030 %U https://doi.org/10.2196/68030 %U http://www.ncbi.nlm.nih.gov/pubmed/40306634 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66098 %T Use of Retrieval-Augmented Large Language Model for COVID-19 Fact-Checking: Development and Usability Study %A Li,Hai %A Huang,Jingyi %A Ji,Mengmeng %A Yang,Yuyi %A An,Ruopeng %+ , School of Economics and Management, Shanghai University of Sport, 650 Hengren Road, Yangpu District, Shanghai, 200000, China, 86 13816490872, lihai1107@hotmail.com %K large language model %K misinformation %K disinformation %K fact-checking %K COVID-19 %K artificial intelligence %K ChatGPT %K natural language processing %K machine learning %K SARS-CoV-2 %K coronavirus %K respiratory %K infectious %K pulmonary %K pandemic %K infodemic %K retrieval-augmented generation %K accuracy %D 2025 %7 30.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The COVID-19 pandemic has been accompanied by an “infodemic,” where the rapid spread of misinformation has exacerbated public health challenges. Traditional fact-checking methods, though effective, are time-consuming and resource-intensive, limiting their ability to combat misinformation at scale. Large language models (LLMs) such as GPT-4 offer a more scalable solution, but their susceptibility to generating hallucinations—plausible yet incorrect information—compromises their reliability. Objective: This study aims to enhance the accuracy and reliability of COVID-19 fact-checking by integrating a retrieval-augmented generation (RAG) system with LLMs, specifically addressing the limitations of hallucination and context inaccuracy inherent in stand-alone LLMs. Methods: We constructed a context dataset comprising approximately 130,000 peer-reviewed papers related to COVID-19 from PubMed and Scopus. 
This dataset was integrated with GPT-4 to develop multiple RAG-enhanced models: the naïve RAG, Lord of the Retrievers (LOTR)–RAG, corrective RAG (CRAG), and self-RAG (SRAG). The RAG systems were designed to retrieve relevant external information, which was then embedded and indexed in a vector store for similarity searches. One real-world dataset and one synthesized dataset, each containing 500 claims, were used to evaluate the performance of these models. Each model’s accuracy, F1-score, precision, and sensitivity were compared to assess their effectiveness in reducing hallucination and improving fact-checking accuracy. Results: The baseline GPT-4 model achieved an accuracy of 0.856 on the real-world dataset. The naïve RAG model improved this to 0.946, while the LOTR-RAG model further increased accuracy to 0.951. The CRAG and SRAG models outperformed all others, achieving accuracies of 0.972 and 0.973, respectively. The baseline GPT-4 model reached an accuracy of 0.960 on the synthesized dataset. The naïve RAG model increased this to 0.972, and the LOTR-RAG, CRAG, and SRAG models achieved an accuracy of 0.978. These findings demonstrate that the RAG-enhanced models consistently maintained high accuracy levels, closely mirroring ground-truth labels and significantly reducing hallucinations. The CRAG and SRAG models also provided more detailed and contextually accurate explanations, further establishing the superiority of agentic RAG frameworks in delivering reliable and precise fact-checking outputs across diverse datasets. Conclusions: The integration of RAG systems with LLMs substantially improves the accuracy and contextual relevance of automated fact-checking. By reducing hallucinations and enhancing transparency by citing retrieved sources, this method holds significant promise for rapid, reliable information verification to combat misinformation during public health crises. 
%M 40306628 %R 10.2196/66098 %U https://www.jmir.org/2025/1/e66098 %U https://doi.org/10.2196/66098 %U http://www.ncbi.nlm.nih.gov/pubmed/40306628 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e64486 %T Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis %A Wang,Ling %A Li,Jinglin %A Zhuang,Boyang %A Huang,Shasha %A Fang,Meilin %A Wang,Cunze %A Li,Wen %A Zhang,Mohan %A Gong,Shurong %+ The Third Department of Critical Care Medicine, Fuzhou University Affiliated Provincial Hospital, Shengli Clinical Medical College, Fujian Medical University, No.134 Dongjie Road, Fuzhou, Fujian, 350001, China, 86 15060677447, shurong_gong@fjmu.edu.cn %K large language models %K LLM %K clinical research questions %K accuracy %K network meta-analysis %K PRISMA %D 2025 %7 30.4.2025 %9 Review %J J Med Internet Res %G English %X Background: Large language models (LLMs) have flourished and gradually become an important research and application direction in the medical field. However, due to the high degree of specialization, complexity, and specificity of medicine, which results in extremely high accuracy requirements, controversy remains about whether LLMs can be used in the medical field. More studies have evaluated the performance of various types of LLMs in medicine, but the conclusions are inconsistent. Objective: This study uses a network meta-analysis (NMA) to assess the accuracy of LLMs when answering clinical research questions to provide high-level evidence-based evidence for its future development and application in the medical field. Methods: In this systematic review and NMA, we searched PubMed, Embase, Web of Science, and Scopus from inception until October 14, 2024. Studies on the accuracy of LLMs when answering clinical research questions were included and screened by reading published reports. 
The systematic review and NMA were conducted to compare the accuracy of different LLMs when answering clinical research questions, including objective questions, open-ended questions, top 1 diagnosis, top 3 diagnosis, top 5 diagnosis, and triage and classification. The NMA was performed using Bayesian frequency theory methods. Indirect intercomparisons between programs were performed using a grading scale. A larger surface under the cumulative ranking curve (SUCRA) value indicates a higher ranking of the corresponding LLM accuracy. Results: The systematic review and NMA examined 168 articles encompassing 35,896 questions and 3063 clinical cases. Of the 168 studies, 40 (23.8%) were considered to have a low risk of bias, 128 (76.2%) had a moderate risk, and none were rated as having a high risk. ChatGPT-4o (SUCRA=0.9207) demonstrated strong performance in terms of accuracy for objective questions, followed by Aeyeconsult (SUCRA=0.9187) and ChatGPT-4 (SUCRA=0.8087). ChatGPT-4 (SUCRA=0.8708) excelled at answering open-ended questions. In terms of accuracy for top 1 diagnosis and top 3 diagnosis of clinical cases, human experts (SUCRA=0.9001 and SUCRA=0.7126, respectively) ranked the highest, while Claude 3 Opus (SUCRA=0.9672) performed well at the top 5 diagnosis. Gemini (SUCRA=0.9649) had the highest rated SUCRA value for accuracy in the area of triage and classification. Conclusions: Our study indicates that ChatGPT-4o has an advantage when answering objective questions. For open-ended questions, ChatGPT-4 may be more credible. Humans are more accurate at the top 1 diagnosis and top 3 diagnosis. Claude 3 Opus performs better at the top 5 diagnosis, while for triage and classification, Gemini is more advantageous. This analysis offers valuable insights for clinicians and medical practitioners, empowering them to effectively leverage LLMs for improved decision-making in learning, diagnosis, and management of various clinical scenarios. 
Trial Registration: PROSPERO CRD42024558245; https://www.crd.york.ac.uk/PROSPERO/view/CRD42024558245 %M 40305085 %R 10.2196/64486 %U https://www.jmir.org/2025/1/e64486 %U https://doi.org/10.2196/64486 %U http://www.ncbi.nlm.nih.gov/pubmed/40305085 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e68762 %T Harnessing an Artificial Intelligence–Based Large Language Model With Personal Health Record Capability for Personalized Information Support in Postsurgery Myocardial Infarction: Descriptive Qualitative Study %A Yang,Ting-ting %A Zheng,Hong-xia %A Cao,Sha %A Jing,Mei-ling %A Hu,Ju %A Zuo,Yan %A Chen,Qing-yong %A Zhang,Jian-jun %+ , Department of Gynecology and Obstetrics, West China Second University Hospital, Sichuan University, #20 3rd Section, Renmin Nan Road, Chengdu, 610041, China, 86 13348886426, zhangjianjun-1983@163.com %K myocardial infarction %K post-surgery recovery %K personalized health support %K artificial intelligence %K large language model %K personal health record %K digital health tools %K health information accessibility %K qualitative study %K mobile phone %D 2025 %7 30.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Myocardial infarction (MI) remains a leading cause of morbidity and mortality worldwide. Although postsurgical cardiac interventions have improved survival rates, effective management during recovery remains challenging. Traditional informational support systems often provide generic guidance that does not account for individualized medical histories or psychosocial factors. Recently, artificial intelligence (AI)–based large language models (LLM) tools have emerged as promising interventions to deliver personalized health information to post-MI patients. 
Objective: We aim to explore the user experiences and perceptions of an AI-based LLM tool (iflyhealth) with integrated personal health record functionality in post-MI care, assess how patients and their family members engaged with the tool during recovery, identify the perceived benefits and challenges of using the technology, and understand the factors promoting or hindering continued use. Methods: A purposive sample of 20 participants (12 users and 8 nonusers) who underwent MI surgery within the previous 6 months was recruited between July and August 2024. Data were collected through semistructured, face-to-face interviews conducted in a private setting, using an interview guide to address participants’ first impressions, usage patterns, and reasons for adoption or nonadoption of the iflyhealth app. The interviews were audio-recorded, transcribed verbatim, and analyzed using the Colaizzi method. Results: Four key themes were revealed: (1) participants’ experiences varied based on digital literacy, prior exposure to health technologies, and individual recovery needs; (2) users appreciated the app’s enhanced accessibility to professional health information, personalized advice tailored to their clinical conditions, and the tool’s responsiveness to health status changes; (3) challenges such as difficulties with digital literacy, usability concerns, and data privacy issues were significant barriers; and (4) nonusers and those who discontinued use primarily cited complexity of the interface and perceived limited relevance of the advice as major deterrents. Conclusions: iflyhealth, an LLM AI app with a built-in personal health record functionality, shows significant potential in assisting post-MI patients. The main benefits reported by iflyhealth users include improved access to personalized health information and an enhanced ability to respond to changing health conditions. 
However, challenges such as digital literacy, usability, and privacy and security concerns persist. Overcoming the barriers may further enhance the use of the iflyhealth app, which can play an important role in patient-centered, personalized post-MI management. %M 40305084 %R 10.2196/68762 %U https://www.jmir.org/2025/1/e68762 %U https://doi.org/10.2196/68762 %U http://www.ncbi.nlm.nih.gov/pubmed/40305084 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e72109 %T Using Segment Anything Model 2 for Zero-Shot 3D Segmentation of Abdominal Organs in Computed Tomography Scans to Adapt Video Tracking Capabilities for 3D Medical Imaging: Algorithm Development and Validation %A Yamagishi,Yosuke %A Hanaoka,Shouhei %A Kikuchi,Tomohiro %A Nakao,Takahiro %A Nakamura,Yuta %A Nomura,Yukihiro %A Miki,Soichiro %A Yoshikawa,Takeharu %A Abe,Osamu %K artificial intelligence %K medical image processing %K computed tomography %K abdominal imaging %K segmentation %K AI %D 2025 %7 29.4.2025 %9 %J JMIR AI %G English %X Background: Medical image segmentation is crucial for diagnosis and treatment planning in radiology, but it traditionally requires extensive manual effort and specialized training data. With its novel video tracking capabilities, the Segment Anything Model 2 (SAM 2) presents a potential solution for automated 3D medical image segmentation without the need for domain-specific training. However, its effectiveness in medical applications, particularly in abdominal computed tomography (CT) imaging remains unexplored. Objective: The aim of this study was to evaluate the zero-shot performance of SAM 2 in 3D segmentation of abdominal organs in CT scans and to investigate the effects of prompt settings on segmentation results. Methods: In this retrospective study, we used a subset of the TotalSegmentator CT dataset from eight institutions to assess SAM 2’s ability to segment eight abdominal organs. 
Segmentation was initiated from three different z-coordinate levels (caudal, mid, and cranial levels) of each organ. Performance was measured using the dice similarity coefficient (DSC). We also analyzed the impact of “negative prompts,” which explicitly exclude certain regions from the segmentation process, on accuracy. Results: A total of 123 patients (mean age 60.7, SD 15.5 years; 63 men, 60 women) were evaluated. As a zero-shot approach, larger organs with clear boundaries demonstrated high segmentation performance, with mean DSCs as follows: liver, 0.821 (SD 0.192); right kidney, 0.862 (SD 0.212); left kidney, 0.870 (SD 0.154); and spleen, 0.891 (SD 0.131). Smaller organs showed lower performance: gallbladder, 0.531 (SD 0.291); pancreas, 0.361 (SD 0.197); and adrenal glands—right, 0.203 (SD 0.222) and left, 0.308 (SD 0.234). The initial slice for segmentation and the use of negative prompts significantly influenced the results. By removing negative prompts from the input, the DSCs significantly decreased for six organs. Conclusions: SAM 2 demonstrated promising zero-shot performance in segmenting certain abdominal organs in CT scans, particularly larger organs. Performance was significantly influenced by input negative prompts and initial slice selection, highlighting the importance of optimizing these factors. 
%R 10.2196/72109 %U https://ai.jmir.org/2025/1/e72109 %U https://doi.org/10.2196/72109 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e66180 %T Applications of Self-Driving Vehicles in an Aging Population %A Shu,Sara %A Woo,Benjamin K P %K self-driving %K driverless %K driver %K autonomous vehicles %K car %K transportation %K mobility %K travel %K vehicle %K driving %K artificial intelligence %K gerontology %K geriatric %K older %K elderly %K aging %K healthy aging %K older adult %K autonomy %K independence %K aging in place %K health equity %D 2025 %7 28.4.2025 %9 %J JMIR Form Res %G English %X The proportion of older adult drivers is increasing and represents a growing population that must contemplate reducing driving and eventually stopping driving. The advent of self-driving vehicles opens vast possibilities with practical and far-reaching applications for our aging population. Advancing technologies in transportation may help to overcome transportation barriers for less mobile individuals, transcend social and geographical isolation, and improve resource and medical access. Herein, we propose various applications and benefits that self-driving vehicles have in maintaining independence and autonomy specifically for our aging population to preserve aging in place. 
%R 10.2196/66180 %U https://formative.jmir.org/2025/1/e66180 %U https://doi.org/10.2196/66180 %0 Journal Article %@ 2817-092X %I JMIR Publications %V 4 %N %P e70589 %T Effectiveness of Artificial Intelligence–Based Platform in Administering Therapies for Children With Autism Spectrum Disorder: 12-Month Observational Study %A Atturu,Harini %A Naraganti,Somasekhar %A Rao,Bugatha Rajvir %K autism spectrum disorder %K neurodevelopmental disorders %K applied behavior analysis %K software %K artificial intelligence %D 2025 %7 28.4.2025 %9 %J JMIR Neurotech %G English %X Background: A 12-month longitudinal observational study was conducted on 43 children aged 2‐18 years to evaluate the effectiveness of the CognitiveBotics artificial intelligence (AI)–based platform in conjunction with continuous therapy in improving therapeutic outcomes for children with autism spectrum disorder (ASD). Objective: This study evaluates the CognitiveBotics software’s effectiveness in supporting children with ASD through structured, technology-assisted learning. The primary objectives include assessing user engagement, tracking progress, and measuring efficacy using standardized clinical assessments. Methods: A 12-month observational study was conducted on children diagnosed with ASD using the CognitiveBotics AI-based platform. Standardized assessments, including the Childhood Autism Rating Scale (CARS), Vineland Social Maturity Scale, Developmental Screening Test, and Receptive Expressive Emergent Language Test (REEL), were conducted at baseline (T1) and at the endpoint (T2). All participants meeting the inclusion criteria were provided access to the platform and received standard therapy. Participants who consistently adhered to platform use as per the study protocol were classified as the intervention group, while those who did not maintain continuous platform use were designated as the control group. 
Additionally, caregivers received structured training, including web-based parent teaching sessions, reinforcement strategy training, and home-based activity guidance. Results: Participants in the intervention group demonstrated statistically significant improvements across multiple scales. CARS scores reduced from 33.41 (SD 1.89) at T1 to 28.34 (SD 3.80) at T2 (P<.001). Social age increased from 22.80 (SD 7.33) to 35.76 (SD 9.09; mean change: 12.96, 56.84% increase; P<.001). Social quotient increased from 53.26 (SD 11.84) to 64.75 (SD 16.12; mean change: 11.49, 21.57% increase; P<.001). Developmental age showed an improvement from 30.93 (SD 9.91) to 45.31 (SD 11.20; mean change: 14.38, 46.49% increase; P<.001), while developmental quotient increased from 70.94 (SD 10.95) to 81.33 (SD 16.85; mean change: 10.39, 14.65% increase; P<.001). REEL scores showed substantial improvements, with receptive language increasing by 56.22% (P<.001) and expressive language by 59.93% (P<.001). In the control group, while most psychometric parameters showed some improvements, they were not statistically significant. CARS scores decreased by 10.62% (P=.06), social age increased by 52.27% (P=.06), social quotient increased by 19.62% (P=.12), developmental age increased by 44.88% (P=.06), and developmental quotient increased by 11.23% (P=.19). REEL receptive and expressive language increased by 34.69% (P=.10) and 40.48% (P=.054), respectively. Conclusions: Overall, the platform was an effective supplement in enhancing therapeutic outcomes for children with ASD. This platform holds promise as a valuable tool for augmenting ASD therapies across cognitive, social, and developmental domains. Future development should prioritize expanding the product’s accessibility across various languages, ensuring cultural sensitivity and enhancing user-friendliness. 
%R 10.2196/70589 %U https://neuro.jmir.org/2025/1/e70589 %U https://doi.org/10.2196/70589 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e58723 %T Are Treatment Services Ready for the Use of Big Data Analytics and AI in Managing Opioid Use Disorder? %A Amer,Matthew %A Gittins,Rosalind %A Millana,Antonio Martinez %A Scheibein,Florian %A Ferri,Marica %A Tofighi,Babak %A Sullivan,Frank %A Handley,Margaret %A Ghosh,Monty %A Baldacchino,Alexander %A Tay Wee Teck,Joseph %+ NHS Tayside, Ninewells Hospital, 1 James Arrott Drive, Dundee, DD1 9SY, United Kingdom, 44 7462884057, matthew.amer2@nhs.scot %K machine learning %K ML %K artificial intelligence %K AI %K algorithm %K predictive model %K predictive analytics %K predictive system %K practical model %K deep learning %K early warning %K early detection %K big data %K opioid use %K opioid %K opioid use disorder %K substance use %K substance use disorder %D 2025 %7 28.4.2025 %9 Viewpoint %J J Med Internet Res %G English %X In this viewpoint, we explore the use of big data analytics and artificial intelligence (AI) and discuss important challenges to their ethical, effective, and equitable use within opioid use disorder (OUD) treatment settings. Applying our collective experiences as OUD policy and treatment experts, we discuss 8 key challenges that OUD treatment services must contend with to make the most of these rapidly evolving technologies: data and algorithmic transparency, clinical validation, new practitioner-technology interfaces, capturing data relevant to improving patient care, understanding and responding to algorithmic outputs, obtaining informed patient consent, navigating mistrust, and addressing digital exclusion and bias. Through this paper, we hope to critically engage clinicians and policy makers on important ethical considerations, clinical implications, and implementation challenges involved in big data analytics and AI deployment in OUD treatment settings. 
%M 40294410 %R 10.2196/58723 %U https://www.jmir.org/2025/1/e58723 %U https://doi.org/10.2196/58723 %U http://www.ncbi.nlm.nih.gov/pubmed/40294410 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67114 %T Expert and Interdisciplinary Analysis of AI-Driven Chatbots for Mental Health Support: Mixed Methods Study %A Moylan,Kayley %A Doherty,Kevin %+ School of Information and Communication Studies, University College Dublin, Belfield, Dublin, D04 V1W8, Ireland, 353 863113688, kayley.moylan@ucdconnect.ie %K mental health %K therapy %K design %K chatbots %K artificial intelligence %K AI %K ethics %K emotional dependence %K self-reliance %D 2025 %7 25.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Recent years have seen an immense surge in the creation and use of chatbots as social and mental health companions. Aiming to provide empathic responses in support of the delivery of personalized support, these tools are often presented as offering immense potential. However, it is also essential that we understand the risks of their deployment, including their potential adverse impacts on the mental health of users, including those most at risk. Objective: The study aims to assess the ethical and pragmatic clinical implications of using chatbots that claim to aid mental health. While several studies within human-computer interaction and related fields have examined users’ perceptions of such systems, few studies have engaged mental health professionals in critical analysis of their conduct as mental health support tools. This paper comprises, in turn, an effort to assess the ethical and pragmatic clinical implications of using chatbots that claim to aid mental health. 
Methods: This study included 8 interdisciplinary mental health professional participants (from psychology and psychotherapy to social care and crisis volunteer workers) in a mixed methods and hands-on analysis of 2 popular mental health–related chatbots’ data handling, interface design, and responses. This analysis was carried out through profession-specific tasks with each chatbot, eliciting participants’ perceptions through both the Trust in Automation scale and semistructured interviews. Through thematic analysis and a 2-tailed, paired t test, these chatbots’ implications for mental health support were thus evaluated. Results: Qualitative analysis revealed emphatic initial impressions among mental health professionals of chatbot responses likely to produce harm, exhibiting a generic mode of care, and risking user dependence and manipulation given the central role of trust in the therapeutic relationship. Trust scores from the Trust in Automation scale, while exhibiting no statistically significant differences between the chatbots (t6=–0.76; P=.48), indicated medium to low trust scores for each chatbot. The findings of this work highlight that the design and development of artificial intelligence (AI)–driven mental health–related solutions must be undertaken with utmost caution. The mental health professionals in this study collectively resist these chatbots and make clear that AI-driven chatbots used for mental health by at-risk users invite several potential and specific harms. Conclusions: Through this work, we contributed insights into the mental health professional perspective on the design of chatbots used for mental health and underscore the necessity of ongoing critical assessment and iterative refinement to maximize the benefits and minimize the risks associated with integrating AI into mental health support. 
%M 40279575 %R 10.2196/67114 %U https://www.jmir.org/2025/1/e67114 %U https://doi.org/10.2196/67114 %U http://www.ncbi.nlm.nih.gov/pubmed/40279575 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e68427 %T Enhancing Bidirectional Encoder Representations From Transformers (BERT) With Frame Semantics to Extract Clinically Relevant Information From German Mammography Reports: Algorithm Development and Validation %A Reichenpfader,Daniel %A Knupp,Jonas %A von Däniken,Sandro Urs %A Gaio,Roberto %A Dennstädt,Fabio %A Cereghetti,Grazia Maria %A Sander,André %A Hiltbrunner,Hans %A Nairz,Knud %A Denecke,Kerstin %+ Institute for Patient-Centered Digital Health, School of Engineering and Computer Science, Bern University of Applied Sciences, Quellgasse 21, Biel/Bienne, 2502, Switzerland, 41 31 848 60 93, daniel.reichenpfader@bfh.ch %K radiology %K information extraction %K mammography %K large language models %K structured reporting %K template filling %K annotation %K quality control %K natural language processing %D 2025 %7 25.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Structured reporting is essential for improving the clarity and accuracy of radiological information. Despite its benefits, the European Society of Radiology notes that it is not widely adopted. For example, while structured reporting frameworks such as the Breast Imaging Reporting and Data System provide standardized terminology and classification for mammography findings, radiology reports still mostly comprise free-text sections. This variability complicates the systematic extraction of key clinical data. Moreover, manual structuring of reports is time-consuming and prone to inconsistencies. Recent advancements in large language models have shown promise for clinical information extraction by enabling models to understand contextual nuances in medical text. However, challenges such as domain adaptation, privacy concerns, and generalizability remain. 
To address these limitations, frame semantics offers an approach to information extraction grounded in computational linguistics, allowing a structured representation of clinically relevant concepts. Objective: This study explores the combination of Bidirectional Encoder Representations from Transformers (BERT) architecture with the linguistic concept of frame semantics to extract and normalize information from free-text mammography reports. Methods: After creating an annotated corpus of 210 German reports for fine-tuning, we generate several BERT model variants by applying 3 pretraining strategies to hospital data. Afterward, a fact extraction pipeline is built, comprising an extractive question-answering model and a sequence labeling model. We quantitatively evaluate all model variants using common evaluation metrics (model perplexity, Stanford Question Answering Dataset 2.0 [SQuAD_v2], seqeval) and perform a qualitative clinician evaluation of the entire pipeline on a manually generated synthetic dataset of 21 reports, as well as a comparison with a generative approach following best practice prompting techniques using the open-source Llama 3.3 model (Meta). Results: Our system is capable of extracting 14 fact types and 40 entities from the clinical findings section of mammography reports. Further pretraining on hospital data reduced model perplexity, although it did not significantly impact the 2 downstream tasks. We achieved average F1-scores of 90.4% and 81% for question answering and sequence labeling, respectively (best pretraining strategy). Qualitative evaluation of the pipeline based on synthetic data shows an overall precision of 96.1% and 99.6% for facts and entities, respectively. In contrast, generative extraction shows an overall precision of 91.2% and 87.3% for facts and entities, respectively. Hallucinations and extraction inconsistencies were observed. 
Conclusions: This study demonstrates that frame semantics provides a robust and interpretable framework for automating structured reporting. By leveraging frame semantics, the approach enables customizable information extraction and supports generalization to diverse radiological domains and clinical contexts with additional annotation efforts. Furthermore, the BERT-based model architecture allows for efficient, on-premise deployment, ensuring data privacy. Future research should focus on validating the model’s generalizability across external datasets and different report types to ensure its broader applicability in clinical practice. %M 40279645 %R 10.2196/68427 %U https://www.jmir.org/2025/1/e68427 %U https://doi.org/10.2196/68427 %U http://www.ncbi.nlm.nih.gov/pubmed/40279645 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e65937 %T Localization and Classification of Adrenal Masses in Multiphase Computed Tomography: Retrospective Study %A Yang,Liuyang %A Zhang,Xinzhang %A Li,Zhenhui %A Wang,Jian %A Zhang,Yiwen %A Shan,Liyu %A Shi,Xin %A Si,Yapeng %A Wang,Shuailong %A Li,Lin %A Wu,Ping %A Xu,Ning %A Liu,Lizhu %A Yang,Junfeng %A Leng,Jinjun %A Yang,Maolin %A Zhang,Zhuorui %A Wang,Junfeng %A Dong,Xingxiang %A Yang,Guangjun %A Yan,Ruiying %A Li,Wei %A Liu,Zhimin %A Li,Wenliang %+ Yunnan Cancer Hospital, The Third Affiliated Hospital of Kunming Medical University, No. 519, Kunzhou Road, Xishan District, Kunming, 650118, China, 86 0871 6818903, liwenliang@kmmu.edu.cn %K MA-YOLO model %K multi-class adrenal masses %K multi-phase CT images %K localization %K classification %D 2025 %7 24.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The incidence of adrenal incidentalomas is increasing annually, and most types of adrenal masses require surgical intervention. 
Accurate classification of common adrenal masses based on tumor computed tomography (CT) images by radiologists or clinicians requires extensive experience and is often challenging, which increases the workload of radiologists and leads to unnecessary adrenal surgeries. There is an urgent need for a fully automated, noninvasive, and precise approach for the identification and accurate classification of common adrenal masses. Objective: This study aims to enhance diagnostic efficiency and transform the current clinical practice of preoperative diagnosis of adrenal masses. Methods: This study is a retrospective analysis that includes patients with adrenal masses who underwent adrenalectomy from January 1, 2021, to May 31, 2023, at Center 1 (internal dataset), and from January 1, 2016, to May 31, 2023, at Center 2 (external dataset). The images include unenhanced, arterial, and venous phases, with 21,649 images used for the training set, 2406 images used for the validation set, and 12,857 images used for the external test set. We invited 3 experienced radiologists to precisely annotate the images, and these annotations served as references. We developed a deep learning–based adrenal mass detection model, Multi-Attention YOLO (MA-YOLO), which can automatically localize and classify 6 common types of adrenal masses. In order to scientifically evaluate the model performance, we used a variety of evaluation metrics; in addition, we compared the improvement in diagnostic efficacy of 6 doctors after incorporating model assistance. Results: A total of 516 patients were included. In the external test set, the MA-YOLO model achieved an intersection over union of 0.838, 0.885, and 0.890 for the localization of 6 types of adrenal masses in unenhanced, arterial, and venous phase CT images, respectively. The corresponding mean average precision for classification was 0.885, 0.913, and 0.915, respectively. 
Additionally, with the assistance of this model, the classification diagnostic performance of 6 radiologists and clinicians for adrenal masses improved. Except for adrenal cysts, at least 1 physician significantly improved diagnostic performance for the other 5 types of tumors. Notably, in the categories of adrenal adenoma (for senior clinician: P=.04, junior radiologist: P=.01, and senior radiologist: P=.01) and adrenal cortical carcinoma (junior clinician: P=.02, junior radiologist: P=.01, and intermediate radiologist: P=.001), half of the physicians showed significant improvements after using the model for assistance. Conclusions: The MA-YOLO model demonstrates the ability to achieve efficient, accurate, and noninvasive preoperative localization and classification of common adrenal masses in CT examinations, showing promising potential for future applications. %M 40273442 %R 10.2196/65937 %U https://www.jmir.org/2025/1/e65937 %U https://doi.org/10.2196/65937 %U http://www.ncbi.nlm.nih.gov/pubmed/40273442 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e70566 %T The Diagnostic Performance of Large Language Models and Oral Medicine Consultants for Identifying Oral Lesions in Text-Based Clinical Scenarios: Prospective Comparative Study %A AlFarabi Ali,Sarah %A AlDehlawi,Hebah %A Jazzar,Ahoud %A Ashi,Heba %A Esam Abuzinadah,Nihal %A AlOtaibi,Mohammad %A Algarni,Abdulrahman %A Alqahtani,Hazzaa %A Akeel,Sara %A Almazrooa,Soulafa %K artificial intelligence %K ChatGPT %K Copilot %K diagnosis %K oral medicine %K diagnostic performance %K large language model %K lesion %K oral lesion %D 2025 %7 24.4.2025 %9 %J JMIR AI %G English %X Background: The use of artificial intelligence (AI), especially large language models (LLMs), is increasing in health care, including in dentistry. There has yet to be an assessment of the diagnostic performance of LLMs in oral medicine. 
Objective: We aimed to compare the effectiveness of ChatGPT (OpenAI) and Microsoft Copilot (integrated within the Microsoft 365 suite) with oral medicine consultants in formulating accurate differential and final diagnoses for oral lesions from written clinical scenarios. Methods: Fifty comprehensive clinical case scenarios including patient age, presenting complaint, history of the presenting complaint, medical history, allergies, intra- and extraoral findings, lesion description, and any additional information including laboratory investigations and specific clinical features were given to three oral medicine consultants, who were asked to formulate a differential diagnosis and a final diagnosis. Specific prompts for the same 50 cases were designed and input into ChatGPT and Copilot to formulate both differential and final diagnoses. The diagnostic accuracy was compared between the LLMs and oral medicine consultants. Results: ChatGPT exhibited the highest accuracy, providing the correct differential diagnoses in 37 of 50 cases (74%). There were no significant differences in the accuracy of providing the correct differential diagnoses between AI models and oral medicine consultants. ChatGPT was as accurate as consultants in making the final diagnoses, but Copilot was significantly less accurate than ChatGPT (P=.015) and one of the oral medicine consultants (P<.001) in providing the correct final diagnosis. Conclusions: ChatGPT and Copilot show promising performance for diagnosing oral medicine pathology in clinical case scenarios to assist dental practitioners. ChatGPT-4 and Copilot are still evolving, but even now, they might provide a significant advantage in the clinical setting as tools to help dental practitioners in their daily practice. 
%R 10.2196/70566 %U https://ai.jmir.org/2025/1/e70566 %U https://doi.org/10.2196/70566 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e71521 %T Evaluating ChatGPT in Qualitative Thematic Analysis With Human Researchers in the Japanese Clinical Context and Its Cultural Interpretation Challenges: Comparative Qualitative Study %A Sakaguchi,Kota %A Sakama,Reiko %A Watari,Takashi %+ , Integrated Clinical Education Center, Kyoto University Hospital, Shogoin Kawaramachi 54, Sakyo-ku, Kyoto, 606-8506, Japan, 81 075 751 4839, wataritari@gmail.com %K ChatGPT %K large language models %K qualitative research %K sacred moment(s) %K thematic analysis %D 2025 %7 24.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Qualitative research is crucial for understanding the values and beliefs underlying individual experiences, emotions, and behaviors, particularly in social sciences and health care. Traditionally reliant on manual analysis by experienced researchers, this methodology requires significant time and effort. The advent of artificial intelligence (AI) technology, especially large language models such as ChatGPT (OpenAI), holds promise for enhancing qualitative data analysis. However, existing studies have predominantly focused on AI’s application to English-language datasets, leaving its applicability to non-English languages, particularly structurally and contextually complex languages such as Japanese, insufficiently explored. Objective: This study aims to evaluate the feasibility, strengths, and limitations of ChatGPT-4 in analyzing qualitative Japanese interview data by directly comparing its performance with that of experienced human researchers. Methods: A comparative qualitative study was conducted to assess the performance of ChatGPT-4 and human researchers in analyzing transcribed Japanese semistructured interviews. 
The analysis focused on thematic agreement rates, interpretative depth, and ChatGPT’s ability to process culturally nuanced concepts, particularly for descriptive and socio-culturally embedded themes. This study analyzed transcripts from 30 semistructured interviews conducted between February and March 2024 in an urban community hospital (Hospital A) and a rural university hospital (Hospital B) in Japan. Interviews centered on the theme of “sacred moments” and involved health care providers and patients. Transcripts were digitized using NVivo (version 14; Lumivero) and analyzed using ChatGPT-4 with iterative prompts for thematic analysis. The results were compared with a reflexive thematic analysis performed by human researchers. Furthermore, to assess the adaptability and consistency of ChatGPT in qualitative analysis, Charmaz’s grounded theory and Pope’s five-step framework approach were applied. Results: ChatGPT-4 demonstrated high thematic agreement rates (>80%) with human researchers for descriptive themes such as “personal experience of a sacred moment” and “building relationships.” However, its performance declined for themes requiring deeper cultural and emotional interpretation, such as “difficult to answer, no experience of sacred moments” and “fate.” For these themes, agreement rates were approximately 30%, revealing significant limitations in ChatGPT’s ability to process context-dependent linguistic structures and implicit emotional expressions in Japanese. Conclusions: ChatGPT-4 demonstrates potential as an auxiliary tool in qualitative research, particularly for efficiently identifying descriptive themes within Japanese-language datasets. However, its limited capacity to interpret cultural and emotional nuances highlights the continued necessity of human expertise in qualitative analysis. 
These findings emphasize the complementary role of AI-assisted qualitative research and underscore the importance of further advancements in AI models tailored to non-English linguistic and cultural contexts. Future research should explore strategies to enhance AI’s interpretability, expand multilingual training datasets, and assess the applicability of emerging AI models in diverse cultural settings. In addition, ethical and legal considerations in AI-driven qualitative analysis require continued scrutiny. %M 40273439 %R 10.2196/71521 %U https://www.jmir.org/2025/1/e71521 %U https://doi.org/10.2196/71521 %U http://www.ncbi.nlm.nih.gov/pubmed/40273439 %0 Journal Article %@ 2369-2529 %I JMIR Publications %V 12 %N %P e70855 %T Mainstream Smart Home Technology–Based Intervention to Enhance Functional Independence in Individuals With Complex Physical Disabilities: Single-Group Pre-Post Feasibility Study %A Ding,Dan %A Morris,Lindsey %A Novario,Gina %A Fairman,Andrea %A Roehrich,Kacey %A Foschi Walko,Palma %A Boateng,Jessica %+ Department of Rehabilitation Science and Technology, School of Health and Rehabilitation Sciences, University of Pittsburgh, 6425 Penn Ave, Suite 401, Pittsburgh, PA, 15206, United States, 1 412 624 1964, dad5@pitt.edu %K physical disabilities %K smart home technology %K assistive technology %K assistive technology service delivery %K functional independence %K participation %K occupational therapy %K artificial intelligence %K AI %D 2025 %7 24.4.2025 %9 Original Paper %J JMIR Rehabil Assist Technol %G English %X Background: Mainstream smart home technologies (MSHTs), such as home automation devices and smart speakers, are becoming more powerful, affordable, and integrated into daily life. While not designed for individuals with disabilities, MSHT has the potential to serve as assistive technology to enhance their independence and participation. 
Objective: The study aims to describe a comprehensive MSHT-based intervention named ASSIST (Autonomy, Safety, and Social Integration via Smart Technologies) and evaluate its feasibility in enhancing the functional independence of individuals with complex physical disabilities. Methods: ASSIST is a time-limited intervention with a design based on the human activity assistive technology model, emphasizing client-centered goals and prioritizing individual needs. The intervention follows a structured assistive technology service delivery process that includes 2 assessment sessions to determine technology recommendations, installation and setup of the recommended technology, and up to 8 training sessions. An occupational therapist led the intervention, supported by a contractor and a technologist. Feasibility was evaluated through several measures: (1) the ASSIST Functional Performance Index, which quantifies the number of tasks transitioned from requiring assistance to independent completion and from higher levels of assistance or effort to lower levels; (2) pre- and postintervention measures of perceived task performance and satisfaction using a 10-point scale; (3) the number and types of tasks successfully addressed, along with the costs of devices and installation services; and (4) training effectiveness using the Goal Attainment Scale (GAS). Results: In total, 17 powered wheelchair users with complex physical disabilities completed the study with 100% session attendance. Across participants, 127 tasks were addressed, with 2 to 10 tasks at an average cost of US $3308 (SD US $1192) per participant. Of these tasks, 95 (74.8%) transitioned from requiring partial or complete assistance to independent completion, while 24 (18.9%) either improved from requiring complete to partial assistance or, if originally performed independently, required reduced effort. Only 8 (6.3%) tasks showed no changes. 
All training goals, except for 2, were achieved at or above the expected level, with a baseline average GAS score of 22.6 (SD 3.5) and a posttraining average GAS score of 77.2 (SD 4.5). Perceived task performance and satisfaction showed significant improvement, with performance score increasing from a baseline mean of 2.6 (SD 1.2) to 8.8 (SD 1.0; P<.001) and satisfaction score rising from an average of 2.9 (SD 1.3) to 9.0 (SD 0.9; P<.001). Conclusions: The ASSIST intervention demonstrated the immediate benefits of enhancing functional independence and satisfaction with MSHT among individuals with complex physical disabilities. While MSHT shows promise in addressing daily living needs at lower costs, barriers such as digital literacy, device setup, and caregiver involvement remain. Future work should focus on scalable models, caregiver engagement, and sustainable solutions for real-world implementation. %M 40272873 %R 10.2196/70855 %U https://rehab.jmir.org/2025/1/e70855 %U https://doi.org/10.2196/70855 %U http://www.ncbi.nlm.nih.gov/pubmed/40272873 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e60367 %T Prediction of Reactivation After Antivascular Endothelial Growth Factor Monotherapy for Retinopathy of Prematurity: Multimodal Machine Learning Model Study %A Wu,Rong %A Zhang,Yu %A Huang,Peijie %A Xie,Yiying %A Wang,Jianxun %A Wang,Shuangyong %A Lin,Qiuxia %A Bai,Yichen %A Feng,Songfu %A Cai,Nian %A Lu,Xiaohe %+ Department of Ophthalmology, Zhujiang Hospital, Southern Medical University, No 253 Gongyedadao Middle Road, Guangzhou, 510260, China, 86 15002000613, luxh63@163.com %K retinopathy of prematurity %K reactivation %K prediction %K machine learning %K deep learning %K anti-VEGF %D 2025 %7 23.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Retinopathy of prematurity (ROP) is the leading preventable cause of childhood blindness. 
A timely intravitreal injection of antivascular endothelial growth factor (anti-VEGF) is required to prevent retinal detachment with consequent vision impairment and loss. However, anti-VEGF has been reported to be associated with ROP reactivation. Therefore, an accurate prediction of reactivation after treatment is urgently needed. Objective: To develop and validate prediction models for reactivation after anti-VEGF intravitreal injection in infants with ROP using multimodal machine learning algorithms. Methods: Infants with ROP undergoing anti-VEGF treatment were recruited from 3 hospitals, and conventional machine learning, deep learning, and fusion models were constructed. The areas under the curve (AUCs), accuracy, sensitivity, and specificity were used to evaluate the performance of the prediction models. Results: A total of 239 cases with anti-VEGF treatment were recruited, including 90 (37.66%) reactivation and 149 (62.34%) nonreactivation cases. The AUCs for the conventional machine learning model were 0.806 and 0.805 in the internal validation and test groups, respectively. The average AUC, sensitivity, and specificity for the deep learning model in the test were 0.787, 0.800, and 0.570, respectively. The AUC, sensitivity, and specificity for the fusion model in the test were 0.822, 0.800, and 0.686, respectively. Conclusions: We constructed 3 prediction models for ROP reactivation. The fusion model achieved the best performance. Using this prediction model, we could optimize strategies for treating ROP in infants and develop better screening plans after treatment. 
%M 40267476 %R 10.2196/60367 %U https://www.jmir.org/2025/1/e60367 %U https://doi.org/10.2196/60367 %U http://www.ncbi.nlm.nih.gov/pubmed/40267476 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66986 %T Health Care Professionals’ Concerns About Medical AI and Psychological Barriers and Strategies for Successful Implementation: Scoping Review %A Arvai,Nora %A Katonai,Gellért %A Mesko,Bertalan %+ , Kálmán Laki Doctoral School of Biomedical and Clinical Sciences, University of Debrecen, Egyetem tér 1. Főépület fszt.15/A, Debrecen, 4032, Hungary, 36 52 258 010, noraarvai.endoblog@gmail.com %K artificial intelligence %K attitudes %K health care professionals %K digital health %K fear %K anxiety %K reluctance %K resistance %K skepticism %D 2025 %7 23.4.2025 %9 Review %J J Med Internet Res %G English %X Background: The rapid progress in the development of artificial intelligence (AI) is having a substantial impact on health care (HC) delivery and the physician-patient interaction. Objective: This scoping review aims to offer a thorough analysis of the current status of integrating AI into medical practice as well as the apprehensions expressed by HC professionals (HCPs) over its application. Methods: This scoping review used the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines to examine articles that investigated the apprehensions of HCPs about medical AI. Following the application of inclusion and exclusion criteria, 32 of an initial 217 studies (14.7%) were selected for the final analysis. We aimed to develop an attitude range that accurately captured the unfavorable emotions of HCPs toward medical AI. We achieved this by selecting attitudes and ranking them on a scale that represented the degree of aversion, ranging from mild skepticism to intense fear. The ultimate depiction of the scale was as follows: skepticism, reluctance, anxiety, resistance, and fear. 
Results: In total, 3 themes were identified through the process of thematic analysis. National surveys performed among HCPs aimed to comprehensively analyze their current emotions, worries, and attitudes regarding the integration of AI in the medical industry. Research on technostress primarily focused on the psychological dimensions of adopting AI, examining the emotional reactions, fears, and difficulties experienced by HCPs when they encountered AI-powered technology. The high-level perspective category included studies that took a broad and comprehensive approach to evaluating overarching themes, trends, and implications related to the integration of AI technology in HC. We discovered 15 sources of attitudes, which we classified into 2 distinct groups: intrinsic and extrinsic. The intrinsic group focused on HCPs’ inherent professional identity, encompassing their tasks and capacities. Conversely, the extrinsic group pertained to their patients and the influence of AI on patient care. Next, we examined the shared themes and made suggestions to potentially tackle the problems discovered. Ultimately, we analyzed the results in relation to the attitude scale, assessing the degree to which each attitude was portrayed. Conclusions: The solution to addressing resistance toward medical AI appears to be centered on comprehensive education, the implementation of suitable legislation, and the delineation of roles. Addressing these issues may foster acceptance and optimize AI integration, enhancing HC delivery while maintaining ethical standards. Due to the current prominence and extensive research on regulation, we suggest that further research could be dedicated to education. 
%M 40267462 %R 10.2196/66986 %U https://www.jmir.org/2025/1/e66986 %U https://doi.org/10.2196/66986 %U http://www.ncbi.nlm.nih.gov/pubmed/40267462 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e69813 %T Interoperability Framework of the European Health Data Space for the Secondary Use of Data: Interactive European Interoperability Framework–Based Standards Compliance Toolkit for AI-Driven Projects %A Hussein,Rada %A Gyrard,Amelie %A Abedian,Somayeh %A Gribbon,Philip %A Martínez,Sara Alabart %+ Ludwig Boltzmann Institute for Digital Health and Prevention, Lindhofstraße 22, Salzburg, 5020, Austria, 43 57 255 82713, rada.hussein@dhp.lbg.ac.at %K artificial intelligence %K European Health Data Space %K European interoperability framework %K healthcare standards interoperability %K secondary use of health data %D 2025 %7 23.4.2025 %9 Viewpoint %J J Med Internet Res %G English %X The successful implementation of the European Health Data Space (EHDS) for the secondary use of data (known as EHDS2) hinges on overcoming significant challenges, including the proper implementation of interoperability standards, harmonization of diverse national approaches to data governance, and the integration of rapidly evolving AI technologies. 
This work addresses these challenges by developing an interactive toolkit that leverages insights from 7 leading cancer research projects (Integration of Heterogeneous Data and Evidence towards Regulatory and HTA Acceptance [IDERHA], European Federation for Cancer Images [EUCAIM], Artificial intelligence Supporting Cancer Patients across Europe [ASCAPE], Personalised Health Monitoring and Decision Support Based On Artificial Intelligence and Holistic Health Records [iHelp], Central repository for digital pathology [Bigpicture], Piloting an infrastructure for the secondary use of health data [HealthData@EU] pilot, and improving cancer diagnosis and prediction with AI and big data [INCISIVE]) to help shape the EHDS2 interoperability framework. Building upon the foundations laid by the Towards the European Health Data Space (TEHDAS) joint action (JA) and the new European Interoperability Framework (EIF), the toolkit incorporates several key innovative features. First, it provides interactive and user-friendly entry modules to support European projects in creating their own interoperability frameworks aligned with the evolving EHDS2 technical and governance requirements. Second, it guides projects in navigating the complex landscape of health data standards, emphasizing the need for a balanced approach to implementing the EHDS2 recommended standards for data discoverability and sharing. Third, the toolkit fosters collaboration and knowledge sharing among projects by enabling them to share their experiences and best practices in implementing standards and addressing interoperability challenges. Finally, the toolkit recognizes the dynamic nature of the EHDS2 and the evolving regulatory landscape, including the impact of AI regulations and related standards. This allows for continuous adaptation and improvement, ensuring the toolkit remains relevant and useful for future projects. 
In collaboration with HSbooster.eu, the toolkit will be disseminated to a wider audience of projects and experts, facilitating broader feedback and continuous improvement. This collaborative approach will foster harmonized standards implementation across projects, ultimately contributing to the development of a common EHDS2 interoperability framework. %M 40266673 %R 10.2196/69813 %U https://www.jmir.org/2025/1/e69813 %U https://doi.org/10.2196/69813 %U http://www.ncbi.nlm.nih.gov/pubmed/40266673 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e69293 %T Development and Validation of a Dynamic Real-Time Risk Prediction Model for Intensive Care Units Patients Based on Longitudinal Irregular Data: Multicenter Retrospective Study %A Zheng,Zhuo %A Luo,Jiawei %A Zhu,Yingchao %A Du,Lei %A Lan,Lan %A Zhou,Xiaobo %A Yang,Xiaoyan %A Huang,Shixin %+ , Department of Scientific Research, The People’s Hospital of Yubei District of Chongqing City, 23 Central Park North Road, Yubei District, Chongqing, 401120, China, 86 15803659045, d200101011@stu.cqupt.edu.cn %K intensive care units %K machine learning %K in-hospital mortality %K continuous prediction %K model interpretability %D 2025 %7 23.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Timely and accurate prediction of short-term mortality is critical in intensive care units (ICUs), where patients’ conditions change rapidly. Traditional scoring systems, such as the Simplified Acute Physiology Score and Acute Physiology and Chronic Health Evaluation, rely on static variables collected within the first 24 hours of admission and do not account for continuously evolving clinical states. These systems lack real-time adaptability, interpretability, and generalizability. 
With the increasing availability of high-frequency electronic medical record (EMR) data, machine learning (ML) approaches have emerged as powerful tools to model complex temporal patterns and support dynamic clinical decision-making. However, existing models are often limited by their inability to handle irregular sampling and missing values, and many lack rigorous external validation across institutions. Objective: We aimed to develop a real-time, interpretable risk prediction model that continuously assesses ICU patient mortality using irregular, longitudinal EMR data, with improved performance and generalizability over traditional static scoring systems. Methods: A time-aware bidirectional attention-based long short-term memory (TBAL) model was developed using EMR data from the MIMIC-IV (Medical Information Mart for Intensive Care) and eICU Collaborative Research Database (eICU-CRD) databases, comprising 176,344 ICU stays. The model incorporated dynamic variables, including vital signs, laboratory results, and medication data, updated hourly, to perform static and continuous mortality risk assessments. External cross-validation and subgroup sensitivity analyses were conducted to evaluate robustness and fairness. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), accuracy, and F1-score. Interpretability was enhanced using integrated gradients to identify key predictors. Results: For the static 12-hour to 1-day mortality prediction task, the TBAL model achieved AUROCs of 95.9 (95% CI 94.2-97.5) and 93.3 (95% CI 91.5-95.3) and AUPRCs of 48.5 and 21.6 in MIMIC-IV and eICU-CRD, respectively. Accuracy and F1-scores reached 94.1 and 46.7 in MIMIC-IV and 92.2 and 28.1 in eICU-CRD. In dynamic prediction tasks, AUROCs reached 93.6 (95% CI 93.2-93.9) and 91.9 (95% CI 91.6-92.1), with AUPRCs of 41.3 and 50, respectively. 
The model maintained high recall for positive cases (82.6% and 79.1% in MIMIC-IV and eICU-CRD). Cross-database validation yielded AUROCs of 81.3 and 76.1, confirming generalizability. Subgroup analysis showed stable performance across age, sex, and severity strata, with top predictors including lactate, vasopressor use, and Glasgow Coma Scale score. Conclusions: The TBAL model offers a robust, interpretable, and generalizable solution for dynamic real-time mortality risk prediction in ICU patients. Its ability to adapt to irregular temporal patterns and to provide hourly updated predictions positions it as a promising decision-support tool. Future work should validate its utility in prospective clinical trials and investigate its integration into real-world ICU workflows to enhance patient outcomes. %M 40266658 %R 10.2196/69293 %U https://www.jmir.org/2025/1/e69293 %U https://doi.org/10.2196/69293 %U http://www.ncbi.nlm.nih.gov/pubmed/40266658 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e68960 %T Trust, Anxious Attachment, and Conversational AI Adoption Intentions in Digital Counseling: A Preliminary Cross-Sectional Questionnaire Study %A Wu,Xiaoli %A Liew,Kongmeng %A Dorahy,Martin J %+ , School of Psychology, Speech and Hearing, University of Canterbury, Private Bag 4800, Christchurch, 8140, New Zealand, 64 02059037078, xwu40@uclive.ac.nz %K attachment style %K conversational artificial intelligence %K CAI %K perceived trust %K adoption intentions %K CAI counseling %K mobile phone %D 2025 %7 22.4.2025 %9 Original Paper %J JMIR AI %G English %X Background: Conversational artificial intelligence (CAI) is increasingly used in various counseling settings to deliver psychotherapy, provide psychoeducational content, and offer support like companionship or emotional aid. Research has shown that CAI has the potential to effectively address mental health issues when its associated risks are handled with great caution. 
It can provide mental health support to a wider population than conventional face-to-face therapy, with faster responses and at a more affordable cost. Despite CAI’s many advantages in mental health support, potential users may differ in their willingness to adopt and engage with CAI to support their own mental health. Objective: This study focused specifically on dispositional trust in AI and attachment styles, and examined how they are associated with individuals’ intentions to adopt CAI for mental health support. Methods: A cross-sectional survey of 239 American adults was conducted. Participants were first assessed on their attachment style, then presented with a vignette about CAI use, after which their dispositional trust and subsequent adoption intentions toward CAI counseling were surveyed. Participants had not previously used CAI for digital counseling for mental health support. Results: Dispositional trust in artificial intelligence emerged as a critical predictor of CAI adoption intentions (P<.001), while attachment anxiety (P=.04), rather than avoidance (P=.09), was found to be positively associated with the intention to adopt CAI counseling after controlling for age and gender. Conclusions: These findings indicated that higher dispositional trust might lead to stronger adoption intention, and that higher attachment anxiety might also be associated with greater CAI counseling adoption. Further research into users’ attachment styles and dispositional trust is needed to understand individual differences in CAI counseling adoption, enhance the safety and effectiveness of CAI-driven counseling services, and tailor interventions. 
Trial Registration: Open Science Framework; https://osf.io/c2xqd %M 40262137 %R 10.2196/68960 %U https://ai.jmir.org/2025/1/e68960 %U https://doi.org/10.2196/68960 %U http://www.ncbi.nlm.nih.gov/pubmed/40262137 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66530 %T Diagnosis Test Accuracy of Artificial Intelligence for Endometrial Cancer: Systematic Review and Meta-Analysis %A Wang,Longyun %A Wang,Zeyu %A Zhao,Bowei %A Wang,Kai %A Zheng,Jingying %A Zhao,Lijing %+ Department of Gynecology and Obstetrics, The Second Hospital of Jilin University, No.4026, Yatai Street, Changchun, 130000, China, 86 15704313636, zheng_jy@jlu.edu.cn %K artificial intelligence %K endometrial cancer %K diagnostic test accuracy %K systematic review %K meta-analysis %K machine learning %K deep learning %D 2025 %7 18.4.2025 %9 Review %J J Med Internet Res %G English %X Background: Endometrial cancer is one of the most common gynecological tumors, and early screening and diagnosis are crucial for its treatment. Research on the application of artificial intelligence (AI) in the diagnosis of endometrial cancer is increasing, but there is currently no comprehensive meta-analysis to evaluate the diagnostic accuracy of AI in screening for endometrial cancer. Objective: This paper presents a systematic review of AI-based endometrial cancer screening, which is needed to clarify its diagnostic accuracy and provide evidence for the application of AI technology in screening for endometrial cancer. Methods: A search was conducted across PubMed, Embase, Cochrane Library, Web of Science, and Scopus databases to include studies published in English, which evaluated the performance of AI in endometrial cancer screening. A total of 2 independent reviewers screened the titles and abstracts, and the quality of the selected studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies—2 (QUADAS-2) tool. 
The certainty of the diagnostic test evidence was evaluated using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) system. Results: A total of 13 studies were included, and the hierarchical summary receiver operating characteristic model used for the meta-analysis showed that the overall sensitivity of AI-based endometrial cancer screening was 86% (95% CI 79%-90%) and specificity was 92% (95% CI 87%-95%). Subgroup analysis revealed similar results across AI type, study region, publication year, and study type, but the overall quality of evidence was low. Conclusions: AI-based endometrial cancer screening can effectively detect patients with endometrial cancer, but large-scale population studies are needed in the future to further clarify the diagnostic accuracy of AI in screening for endometrial cancer. Trial Registration: PROSPERO CRD42024519835; https://www.crd.york.ac.uk/PROSPERO/view/CRD42024519835 %M 40249940 %R 10.2196/66530 %U https://www.jmir.org/2025/1/e66530 %U https://doi.org/10.2196/66530 %U http://www.ncbi.nlm.nih.gov/pubmed/40249940 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66491 %T Artificial Intelligence Models for Pediatric Lung Sound Analysis: Systematic Review and Meta-Analysis %A Park,Ji Soo %A Park,Sa-Yoon %A Moon,Jae Won %A Kim,Kwangsoo %A Suh,Dong In %+ Department of Pediatrics, Seoul National University College of Medicine, 101, Daehak-Ro Jongno-Gu, Seoul, 03080, Republic of Korea, 82 2 2072 362, dongins0@snu.ac.kr %K machine learning %K respiratory disease classification %K wheeze detection %K auscultation %K mel-spectrogram %K abnormal lung sound detection %K artificial intelligence %K pediatric %K lung sound analysis %K systematic review %K asthma %K pneumonia %K children %K morbidity %K mortality %K diagnostic %K respiratory pathology %D 2025 %7 18.4.2025 %9 Review %J J Med Internet Res %G English %X Background: Pediatric respiratory diseases, including asthma and pneumonia, are major 
causes of morbidity and mortality in children. Auscultation of lung sounds is a key diagnostic tool but is prone to subjective variability. The integration of artificial intelligence (AI) and machine learning (ML) with electronic stethoscopes offers a promising approach for automated and objective lung sound analysis. Objective: This systematic review and meta-analysis assesses the performance of ML models in pediatric lung sound analysis. The study evaluates the methodologies, model performance, and database characteristics while identifying limitations and future directions for clinical implementation. Methods: A systematic search was conducted in Medline via PubMed, Embase, Web of Science, OVID, and IEEE Xplore for studies published between January 1, 1990, and December 16, 2024. Inclusion criteria were as follows: studies developing ML models for pediatric lung sound classification with a defined database, physician-labeled reference standard, and reported performance metrics. Exclusion criteria were as follows: studies focusing on adults, cardiac auscultation, validation of existing models, or lacking performance metrics. Risk of bias was assessed using a modified Quality Assessment of Diagnostic Accuracy Studies (version 2) framework. Data were extracted on study design, dataset, ML methods, feature extraction, and classification tasks. Bivariate meta-analysis was performed for binary classification tasks, including wheezing and abnormal lung sound detection. Results: A total of 41 studies met the inclusion criteria. The most common classification task was binary detection of abnormal lung sounds, particularly wheezing. Pooled sensitivity and specificity for wheeze detection were 0.902 (95% CI 0.726-0.970) and 0.955 (95% CI 0.762-0.993), respectively. For abnormal lung sound detection, pooled sensitivity was 0.907 (95% CI 0.816-0.956) and specificity was 0.877 (95% CI 0.813-0.921). 
The most frequently used feature extraction methods were Mel-spectrogram, Mel-frequency cepstral coefficients, and short-time Fourier transform. Convolutional neural networks were the predominant ML model, often combined with recurrent neural networks or residual network architectures. However, high heterogeneity in dataset size, annotation methods, and evaluation criteria was observed. Most studies relied on small, single-center datasets, limiting generalizability. Conclusions: ML models show high accuracy in pediatric lung sound analysis, but face limitations due to dataset heterogeneity, lack of standard guidelines, and limited external validation. Future research should focus on standardized protocols and the development of large-scale, multicenter datasets to improve model robustness and clinical implementation. %M 40249944 %R 10.2196/66491 %U https://www.jmir.org/2025/1/e66491 %U https://doi.org/10.2196/66491 %U http://www.ncbi.nlm.nih.gov/pubmed/40249944 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e64902 %T Effect of Uncertainty-Aware AI Models on Pharmacists’ Reaction Time and Decision-Making in a Web-Based Mock Medication Verification Task: Randomized Controlled Trial %A Lester,Corey %A Rowell,Brigid %A Zheng,Yifan %A Co,Zoe %A Marshall,Vincent %A Kim,Jin Yong %A Chen,Qiyuan %A Kontar,Raed %A Yang,X Jessie %K artificial intelligence %K AI %K human-computer interaction %K decision-making %K human factors %K randomized controlled trial %K clinical decision support %K prediction %K pharmacist %K verification %K drug development %K drug %K diagnosis %K clinical decision support systems %D 2025 %7 18.4.2025 %9 %J JMIR Med Inform %G English %X Background: Artificial intelligence (AI)–based clinical decision support systems are increasingly used in health care. Uncertainty-aware AI presents the model’s confidence in its decision alongside its prediction, whereas black-box AI only provides a prediction. 
Little is known about how this type of AI affects health care providers’ work performance and reaction time. Objective: This study aimed to determine the effects of black-box and uncertainty-aware AI advice on pharmacist decision-making and reaction time. Methods: Recruitment emails were sent to pharmacists through professional listservs describing a web-based, crossover, randomized controlled trial. Participants were randomized to the black-box AI or uncertainty-aware AI condition in a 1:1 manner. Participants completed 100 mock verification tasks with AI help and 100 without AI help. The order of no help and AI help was randomized. Participants were exposed to correct and incorrect prescription fills, where the correct decision was to “accept” or “reject,” respectively. AI help provided correct (79%) or incorrect (21%) advice. Reaction times, participant decisions, AI advice, and AI help type were recorded for each verification. Likelihood ratio tests compared means across the three categories of AI type for each level of AI correctness. Results: A total of 30 participants provided complete datasets. An equal number of participants were in each AI condition. Participants’ decision-making performance and reaction times differed across the 3 conditions. Accurate AI recommendations resulted in the rejection of the incorrect drug 96.1% and 91.8% of the time for uncertainty-aware AI and black-box AI respectively, compared with 81.2% without AI help. Correctly dispensed medications were accepted at rates of 99.2% with black-box help, 94.1% with uncertainty-aware AI help, and 94.6% without AI help. Uncertainty-aware AI protected against bad AI advice to approve an incorrectly filled medication compared with black-box AI (83.3% vs 76.7%). When the AI recommended rejecting a correctly filled medication, pharmacists without AI help had a higher rate of correctly accepting the medication (94.6%) compared with uncertainty-aware AI help (86.2%) and black-box AI help (81.2%). 
Uncertainty-aware AI resulted in shorter reaction times than black-box AI and no AI help except in the scenario where “AI rejects the correct drug.” Black-box AI did not lead to reduced reaction times compared with pharmacists acting alone. Conclusions: Pharmacists’ performance and reaction times varied by AI type and AI accuracy. Overall, uncertainty-aware AI resulted in faster decision-making and acted as a safeguard against bad AI advice to approve a misfilled medication. Conversely, black-box AI had the longest reaction times, and user performance degraded in the presence of bad AI advice. Uncertainty-aware AI could result in unnecessary double-checks, but this is preferable to false-negative advice, where patients receive the wrong medication. These results highlight the importance of well-designed AI that addresses users’ needs, enhances performance, and avoids overreliance on AI. Trial Registration: ClinicalTrials.gov NCT06795477; https://clinicaltrials.gov/study/NCT06795477 %R 10.2196/64902 %U https://medinform.jmir.org/2025/1/e64902 %U https://doi.org/10.2196/64902 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e63130 %T Generating Artificial Patients With Reliable Clinical Characteristics Using a Geometry-Based Variational Autoencoder: Proof-of-Concept Feasibility Study %A Ferré,Fabrice %A Allassonnière,Stéphanie %A Chadebec,Clément %A Minville,Vincent %+ Department of Anesthesia, Intensive Care and Perioperative Medicine, Purpan University Hospital, Place du Dr Baylac, Toulouse, 31300, France, 33 561779988, fabriceferre31@gmail.com %K digital health %K artificial data %K variational autoencoder %K data science %K artificial intelligence %K health monitoring %K deep learning %K medical imaging %K imaging %K magnetic resonance imaging %K Alzheimer disease %K anesthesia %K prediction %K data augmentation %D 2025 %7 17.4.2025 %9 Short Paper %J J Med Internet Res %G English %X Background: Artificial patient technology could transform 
health care by accelerating diagnosis and treatment and mapping clinical pathways. Deep learning methods for generating artificial data in health care include data augmentation by variational autoencoder (VAE) technology. Objective: We aimed to test the feasibility of generating artificial patients with reliable clinical characteristics by using a geometry-based VAE applied, for the first time, to high-dimension, low-sample-size tabular data. Methods: Clinical tabular data were extracted from 521 real patients of the “MAX” digital conversational agent (BOTdesign) created for preparing patients for anesthesia. A 3-stage methodological approach was implemented to generate up to 10,000 artificial patients: training the model and generating artificial data, assessing the consistency and confidentiality of artificial data, and validating the plausibility of the newly created artificial patients. Results: We demonstrated the feasibility of applying the VAE technique to tabular data to generate large artificial patient cohorts with high consistency (fidelity scores>94%). Moreover, artificial patients could not be matched with real patients (filter similarity scores>99%, κ coefficients of agreement<0.2), thus guaranteeing the essential ethical concern of confidentiality. Conclusions: This proof-of-concept study has demonstrated our ability to augment real tabular data to generate artificial patients. These promising results make it possible to envisage in silico trials carried out on large cohorts of artificial patients, thereby overcoming the pitfalls usually encountered in in vivo trials. Further studies integrating longitudinal dynamics are needed to map patient trajectories. 
%M 40245392 %R 10.2196/63130 %U https://www.jmir.org/2025/1/e63130 %U https://doi.org/10.2196/63130 %U http://www.ncbi.nlm.nih.gov/pubmed/40245392 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e70535 %T Unveiling the Potential of Large Language Models in Transforming Chronic Disease Management: Mixed Methods Systematic Review %A Li,Caixia %A Zhao,Yina %A Bai,Yang %A Zhao,Baoquan %A Tola,Yetunde Oluwafunmilayo %A Chan,Carmen WH %A Zhang,Meifen %A Fu,Xia %+ The Department of Nursing, The Eighth Affiliated Hospital, Sun Yat-sen University, No. 3025, Shennan Middle Road, Room 501, The Administrative Building, Shenzhen, 518033, China, 86 13829706026, fuxia5@mail.sysu.edu.cn %K artificial intelligence %K chronic disease %K health management %K large language model %K systematic review %D 2025 %7 16.4.2025 %9 Review %J J Med Internet Res %G English %X Background: Chronic diseases are a major global health burden, accounting for nearly three-quarters of the deaths worldwide. Large language models (LLMs) are advanced artificial intelligence systems with transformative potential to optimize chronic disease management; however, robust evidence is lacking. Objective: This review aims to synthesize evidence on the feasibility, opportunities, and challenges of LLMs across the disease management spectrum, from prevention to screening, diagnosis, treatment, and long-term care. Methods: Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) guidelines, 11 databases (Cochrane Central Register of Controlled Trials, CINAHL, Embase, IEEE Xplore, MEDLINE via Ovid, ProQuest Health & Medicine Collection, ScienceDirect, Scopus, Web of Science Core Collection, China National Knowledge Internet, and SinoMed) were searched on April 17, 2024. Intervention and simulation studies that examined LLMs in the management of chronic diseases were included. 
The methodological quality of the included studies was evaluated using a rating rubric designed for simulation-based research and the Risk of Bias in Nonrandomized Studies of Interventions tool for quasi-experimental studies. Narrative analysis with descriptive figures was used to synthesize the study findings. Random-effects meta-analyses were conducted to assess the pooled effect estimates of the feasibility of LLMs in chronic disease management. Results: A total of 20 studies examined general-purpose (n=17) and retrieval-augmented generation-enhanced LLMs (n=3) for the management of chronic diseases, including cancer, cardiovascular diseases, and metabolic disorders. LLMs demonstrated feasibility across the chronic disease management spectrum by generating relevant, comprehensible, and accurate health recommendations (pooled accuracy rate 71%, 95% CI 0.59-0.83; I2=88.32%), with retrieval-augmented generation-enhanced LLMs having higher accuracy rates compared to general-purpose LLMs (odds ratio 2.89, 95% CI 1.83-4.58; I2=54.45%). LLMs facilitated equitable information access; increased patient awareness regarding ailments, preventive measures, and treatment options; and promoted self-management behaviors in lifestyle modification and symptom coping. Additionally, LLMs facilitated compassionate emotional support, social connections, and access to health care resources to improve the health outcomes of chronic diseases. However, LLMs face challenges in addressing privacy, language, and cultural issues; undertaking advanced tasks, including diagnosis, medication, and comorbidity management; and generating personalized regimens with real-time adjustments and multiple modalities. Conclusions: LLMs have demonstrated the potential to transform chronic disease management at the individual, social, and health care levels; however, their direct application in clinical settings is still in its infancy. 
A multifaceted approach that incorporates robust data security, domain-specific model fine-tuning, multimodal data integration, and wearables is crucial for the evolution of LLMs into invaluable adjuncts for health care professionals to transform chronic disease management. Trial Registration: PROSPERO CRD42024545412; https://www.crd.york.ac.uk/PROSPERO/view/CRD42024545412 %M 40239198 %R 10.2196/70535 %U https://www.jmir.org/2025/1/e70535 %U https://doi.org/10.2196/70535 %U http://www.ncbi.nlm.nih.gov/pubmed/40239198 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67772 %T Acoustic Features for Identifying Suicide Risk in Crisis Hotline Callers: Machine Learning Approach %A Su,Zhengyuan %A Jiang,Huadong %A Yang,Ying %A Hou,Xiangqing %A Su,Yanli %A Yang,Li %+ , Laboratory of Suicidal Behavior Research, Tianjin University, 135 Yaguan Road, Jinnan District, Tianjin, 300354, China, 86 13752183496, yangli@tju.edu.cn %K suicide %K crisis hotline %K acoustic feature %K machine learning %K acoustics %K suicide risk %K artificial intelligence %K feasibility %K prediction models %K hotline callers %K voice %D 2025 %7 14.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Crisis hotlines serve as a crucial avenue for the early identification of suicide risk, which is of paramount importance for suicide prevention and intervention. However, assessing the risk of callers in the crisis hotline context is constrained by factors such as lack of nonverbal communication cues, anonymity, time limits, and single-occasion intervention. Therefore, it is necessary to develop approaches, including acoustic features, for identifying the suicide risk among hotline callers early and quickly. Given the complicated features of sound, adopting artificial intelligence models to analyze callers’ acoustic features is promising. 
Objective: In this study, we investigated the feasibility of using acoustic features to predict suicide risk in crisis hotline callers. We also adopted a machine learning approach to analyze the complex acoustic features of hotline callers, with the aim of developing suicide risk prediction models. Methods: We collected 525 suicide-related calls from the records of a psychological assistance hotline in a province in northwest China. Callers were categorized as low or high risk based on suicidal ideation, suicidal plans, and history of suicide attempts, with risk assessments verified by a team of 18 clinical psychology raters. A total of 164 clearly categorized risk recordings were analyzed, including 102 low-risk and 62 high-risk calls. We extracted 273 audio segments, each exceeding 2 seconds in duration, which were labeled by raters as containing suicide-related expressions for subsequent model training and evaluation. Basic acoustic features (eg, Mel-frequency cepstral coefficients, formant frequencies, jitter, shimmer) and high-level statistical function (HSF) features (using OpenSMILE [Open-Source Speech and Music Interpretation by Large-Space Extraction] with the ComParE 2016 configuration) were extracted. Four supervised machine learning algorithms (logistic regression, support vector machine, random forest, and extreme gradient boosting) were trained and evaluated using grouped 5-fold cross-validation and a test set, with performance metrics including accuracy, F1-score, recall, and false negative rate. Results: Machine learning models developed with HSF acoustic features showed better recognition performance than models based solely on basic acoustic features. The random forest classifier developed with HSFs achieved the best performance in detecting suicide risk among the models evaluated (accuracy=0.75, F1-score=0.70, recall=0.76, false negative rate=0.24). 
Conclusions: The results of our study demonstrate the potential of developing artificial intelligence–based early warning systems using acoustic features for identifying the suicide risk among crisis hotline callers. Our work also has implications for employing acoustic features to identify suicide risk in salient voice contexts. %M 40228243 %R 10.2196/67772 %U https://www.jmir.org/2025/1/e67772 %U https://doi.org/10.2196/67772 %U http://www.ncbi.nlm.nih.gov/pubmed/40228243 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 14 %N %P e63017 %T Health Care Social Robots in the Age of Generative AI: Protocol for a Scoping Review %A Lempe,Paul Notger %A Guinemer,Camille %A Fürstenau,Daniel %A Dressler,Corinna %A Balzer,Felix %A Schaaf,Thorsten %+ Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin & Humboldt–Universität zu Berlin, Charitéplatz 1, Berlin, Germany, 49 30 450 570 425, thorsten.schaaf@charite.de %K robotics %K social robots %K artificial intelligence %K generative AI %K human-robot interaction %K health care sector %K PRISMA %D 2025 %7 14.4.2025 %9 Protocol %J JMIR Res Protoc %G English %X Background: Social robots (SR), sensorimotor machines designed to interact with humans, can help to respond to the increasing demands in the health care sector. To ensure the successful use of this technology, acceptance is paramount. Generative artificial intelligence (AI) is an emerging technology with the potential to enhance the functionality of SR and promote user acceptance by further improving human-robot interaction. Objective: We present a protocol for a scoping review of the literature on the implementation of generative AI in SR in the health care sector. 
The aim of this scoping review is to map out the intersection of SR and generative AI in the health care sector; to explore whether generative AI is applied in SR in the health care sector; to outline which models of generative AI and SR are used for these implementations; and to explore whether user acceptance is reported as an outcome following these implementations. This scoping review supports future research by providing an overview of the state of connectedness of 2 emerging technologies and by mapping out research gaps. Methods: We follow the methodological framework developed by Arksey and O'Malley and the recommendations by the Joanna Briggs Institute. Our protocol was drafted using the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-analyses extension for Scoping Reviews). We will conduct a systematic literature search of the online databases MEDLINE, Embase, CINAHL (Cumulative Index to Nursing and Allied Health Literature), Web of Science, and IEEE Xplore, aiming to retrieve relevant data items via tabular data charting from references meeting specific inclusion criteria: studies published from 2010 onwards, set in the health care sector, and focusing on SR with physical bodies and implemented generative AI. There are no restrictions on study types. Results will be categorized, clustered, and summarized using tables, graphs, visual representations, and narratives. Results: After conducting a preliminary search and deduplication in the second quarter of 2024, we retrieved 3176 preliminary results. The scoping review will proceed with the next methodological steps, including importing the results into a reference management tool and screening titles, abstracts, and full texts against the inclusion criteria. The completion of these steps is scheduled for the second quarter of 2025. 
Limitations based on the heterogeneity of the included studies and the general breadth of a scoping review compared to a systematic review are to be expected. To reduce bias, we adopted a system of dual reviews and thorough documentation of the study selection. Conclusions: The conducted preliminary search implies that there are a sufficient number of heterogeneous references to complete this scoping review. To our knowledge, this is the first scoping review on generative AI in health care SR. International Registered Report Identifier (IRRID): PRR1-10.2196/63017 %M 40227846 %R 10.2196/63017 %U https://www.researchprotocols.org/2025/1/e63017 %U https://doi.org/10.2196/63017 %U http://www.ncbi.nlm.nih.gov/pubmed/40227846 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e59002 %T Text-Based Depression Prediction on Social Media Using Machine Learning: Systematic Review and Meta-Analysis %A Phiri,Doreen %A Makowa,Frank %A Amelia,Vivi Leona %A Phiri,Yohane Vincent Abero %A Dlamini,Lindelwa Portia %A Chung,Min-Huey %+ School of Nursing, College of Nursing, Taipei Medical University, 250 Wu-Xing Street, Taipei, 110, Taiwan, 886 227361661 ext 6317, minhuey300@tmu.edu.tw %K depression %K social media %K machine learning %K meta-analysis %K text-based %K depression prediction %D 2025 %7 11.4.2025 %9 Review %J J Med Internet Res %G English %X Background: Depression affects more than 350 million people globally. Traditional diagnostic methods have limitations. Analyzing textual data from social media provides new insights into predicting depression using machine learning. However, there is a lack of comprehensive reviews in this area, which necessitates further research. Objective: This review aims to assess the effectiveness of user-generated social media texts in predicting depression and evaluate the influence of demographic, language, social media activity, and temporal features on predicting depression on social media texts through machine learning. 
Methods: We searched studies from 11 databases (CINAHL [through EBSCOhost], PubMed, Scopus, Ovid MEDLINE, Embase, PubPsych, Cochrane Library, Web of Science, ProQuest, IEEE Xplore, and ACM Digital Library) from January 2008 to August 2023. We included studies that used social media texts and machine learning and reported area under the curve, Pearson r, and specificity and sensitivity (or data used for their calculation) to predict depression. Protocol papers and studies not written in English were excluded. We extracted study characteristics, population characteristics, outcome measures, and prediction factors from each study. A random-effects model was used to extract the effect sizes with 95% CIs. Study heterogeneity was evaluated using forest plots and P values in the Cochran Q test. Moderator analysis was performed to identify the sources of heterogeneity. Results: A total of 36 studies were included. We observed a significant overall correlation between social media texts and depression, with a large effect size (r=0.630, 95% CI 0.565-0.686). We noted the same correlation and large effect size for demographic (largest effect size; r=0.642, 95% CI 0.489-0.757), social media activity (r=0.552, 95% CI 0.418-0.663), language (r=0.545, 95% CI 0.441-0.649), and temporal features (r=0.531, 95% CI 0.320-0.693). The social media platform type (public or private; P<.001), machine learning approach (shallow or deep; P=.048), and use of outcome measures (yes or no; P<.001) were significant moderators. Sensitivity analysis revealed no change in the results, indicating result stability. The Begg-Mazumdar rank correlation (Kendall τb=0.22063; P=.058) and the Egger test (2-tailed t34=1.28696; P=.207) confirmed the absence of publication bias. Conclusions: Social media textual content can be a useful tool for predicting depression. Demographics, language, social media activity, and temporal features should be considered to maximize the accuracy of depression prediction models. 
Additionally, the effects of social media platform type, machine learning approach, and use of outcome measures in depression prediction models need attention. Analyzing social media texts for depression prediction is challenging, and findings may not apply to a broader population. Nevertheless, our findings offer valuable insights for future research. Trial Registration: PROSPERO CRD42023427707; https://www.crd.york.ac.uk/PROSPERO/view/CRD42023427707 %M 40215481 %R 10.2196/59002 %U https://www.jmir.org/2025/1/e59002 %U https://doi.org/10.2196/59002 %U http://www.ncbi.nlm.nih.gov/pubmed/40215481 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 8 %N %P e64473 %T Artificial Intelligence-Driven Biological Age Prediction Model Using Comprehensive Health Checkup Data: Development and Validation Study %A Jeong,Chang-Uk %A Leiby,Jacob S %A Kim,Dokyoon %A Choe,Eun Kyung %K biological age %K aging clock %K mortality %K artificial intelligence %K machine learning %K record %K history %K health checkup %K clinical relevance %K gerontology %K geriatric %K older %K elderly %K aging %K prediction %K predictive %K life expectancy %K AI %D 2025 %7 11.4.2025 %9 %J JMIR Aging %G English %X Background: The global increase in life expectancy has not shown a similar rise in healthy life expectancy. Accurate assessment of biological aging is crucial for mitigating diseases and socioeconomic burdens associated with aging. Current biological age prediction models are limited by their reliance on conventional statistical methods and constrained clinical information. Objective: This study aimed to develop and validate an aging clock model using artificial intelligence, based on comprehensive health check-up data, to predict biological age and assess its clinical relevance. Methods: We used data from Koreans who underwent health checkups at the Seoul National University Hospital Gangnam Center as well as from the Korean Genome and Epidemiology Study. 
Our model incorporated 27 clinical factors and employed machine learning algorithms, including linear regression, least absolute shrinkage and selection operator, ridge regression, elastic net, random forest, support vector machine, gradient boosting, and K-nearest neighbors. Model performance was evaluated using adjusted R2 and mean squared error (MSE) values. SHapley Additive exPlanations (SHAP) analysis was conducted to interpret the model’s predictions. Results: The gradient boosting model achieved the best performance, with a mean (SE) MSE of 4.219 (0.14) and a mean (SE) R2 of 0.967 (0.001). SHAP analysis identified significant predictors of biological age, including kidney function markers, gender, glycated hemoglobin level, liver function markers, and anthropometric measurements. After adjusting for chronological age, the predicted biological age showed strong associations with multiple clinical factors, such as metabolic status, body composition, fatty liver, smoking status, and pulmonary function. Conclusions: Our aging clock model demonstrates high predictive accuracy and clinical relevance, offering a valuable tool for personalized health monitoring and intervention. The model’s applicability in routine health checkups could enhance health management and promote regular health evaluations. 
%R 10.2196/64473 %U https://aging.jmir.org/2025/1/e64473 %U https://doi.org/10.2196/64473 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 8 %N %P e69504 %T Identifying Deprescribing Opportunities With Large Language Models in Older Adults: Retrospective Cohort Study %A Socrates,Vimig %A Wright,Donald S %A Huang,Thomas %A Fereydooni,Soraya %A Dien,Christine %A Chi,Ling %A Albano,Jesse %A Patterson,Brian %A Sasidhar Kanaparthy,Naga %A Wright,Catherine X %A Loza,Andrew %A Chartash,David %A Iscoe,Mark %A Taylor,Richard Andrew %+ Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, 464 Congress Avenue, Suite 260, New Haven, CT, 06510, United States, 1 2037854058, richard.taylor@yale.edu %K deprescribing %K large language models %K geriatrics %K potentially inappropriate medication list %K emergency medicine %K natural language processing %K calibration %D 2025 %7 11.4.2025 %9 Original Paper %J JMIR Aging %G English %X Background: Polypharmacy, the concurrent use of multiple medications, is prevalent among older adults and associated with increased risks for adverse drug events including falls. Deprescribing, the systematic process of discontinuing potentially inappropriate medications, aims to mitigate these risks. However, the practical application of deprescribing criteria in emergency settings remains limited due to time constraints and criteria complexity. Objective: This study aims to evaluate the performance of a large language model (LLM)–based pipeline in identifying deprescribing opportunities for older emergency department (ED) patients with polypharmacy, using 3 different sets of criteria: Beers, Screening Tool of Older People’s Prescriptions, and Geriatric Emergency Medication Safety Recommendations. The study further evaluates LLM confidence calibration and its ability to improve recommendation performance. 
Methods: We conducted a retrospective cohort study of older adults presenting to an ED in a large academic medical center in the Northeast United States from January 2022 to March 2022. A random sample of 100 patients (712 total oral medications) was selected for detailed analysis. The LLM pipeline consisted of two steps: (1) filtering high-yield deprescribing criteria based on patients’ medication lists, and (2) applying these criteria using both structured and unstructured patient data to recommend deprescribing. Model performance was assessed by comparing model recommendations to those of trained medical students, with discrepancies adjudicated by board-certified ED physicians. Selective prediction, a method that allows a model to abstain from low-confidence predictions to improve overall reliability, was applied to assess the model’s confidence and decision-making thresholds. Results: The LLM was significantly more effective in identifying deprescribing criteria (positive predictive value: 0.83; negative predictive value: 0.93; McNemar test for paired proportions: χ21=5.985; P=.02) relative to medical students, but showed limitations in making specific deprescribing recommendations (positive predictive value=0.47; negative predictive value=0.93). Adjudication revealed that while the model excelled at identifying when there was a deprescribing criterion related to one of the patient’s medications, it often struggled with determining whether that criterion applied to the specific case due to complex inclusion and exclusion criteria (54.5% of errors) and ambiguous clinical contexts (eg, missing information; 39.3% of errors). Selective prediction only marginally improved LLM performance due to poorly calibrated confidence estimates. Conclusions: This study highlights the potential of LLMs to support deprescribing decisions in the ED by effectively filtering relevant criteria. 
However, challenges remain in applying these criteria to complex clinical scenarios, as the LLM demonstrated poor performance on more intricate decision-making tasks, with its reported confidence often failing to align with its actual success in these cases. The findings underscore the need for clearer deprescribing guidelines, improved LLM calibration for real-world use, and better integration of human–artificial intelligence workflows to balance artificial intelligence recommendations with clinician judgment. %M 40215480 %R 10.2196/69504 %U https://aging.jmir.org/2025/1/e69504 %U https://doi.org/10.2196/69504 %U http://www.ncbi.nlm.nih.gov/pubmed/40215480 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e63700 %T Cost-Effectiveness Analysis of a Machine Learning–Based eHealth System to Predict and Reduce Emergency Department Visits and Unscheduled Hospitalizations of Older People Living at Home: Retrospective Study %A Havreng-Théry,Charlotte %A Fouchard,Arnaud %A Denis,Fabrice %A Veyron,Jacques-Henri %A Belmin,Joël %+ PRESAGE, 112-114 rue la Boétie, Paris, 75008, France, 33 622152004, jhveyron@presage.care %K monitoring %K older adult %K predictive tool %K home care aide %K emergency department visit %K cost-effectiveness %K artificial intelligence %K electronic health %K eHealth %K emergency department %K unscheduled hospitalization %K aging %K retrospective study %K medico-economic %K living at home %K nursing home %K emergency visit %K Brittany %K France %K machine learning %K remote monitoring %K digital health %K health informatics %D 2025 %7 11.4.2025 %9 Original Paper %J JMIR Form Res %G English %X Background: Dependent older people or those losing their autonomy are at risk of emergency hospitalization. Digital systems that monitor health remotely could be useful in reducing these visits by detecting worsening health conditions earlier. 
However, few studies have assessed the medico-economic impact of these systems, particularly for older people. Objective: The objective of this study was to compare the real-life clinical and economic impacts of an eHealth device with those of the usual monitoring of older people living at home. Methods: This was a comparative, retrospective, controlled study of data collected between May 31, 2021, and May 31, 2022, in one health care and home nursing center located in Brittany, France. Participants had to be aged >75 years, living at home, and receiving assistance from the home care service for at least 1 month. In the intervention group, we implemented an eHealth system that produces an alert for a high risk of emergency department visits or hospitalizations. After each home visit, the home care aides completed a questionnaire on participants’ functional status using a smartphone app, and the information was processed in real time by a previously developed machine learning algorithm that identifies patients at risk of an emergency visit within 7 to 14 days. In the case of predicted risk, the eHealth system alerted a coordinating nurse who could then inform the family carer and the patient’s nurses or general practitioner. Results: A total of 120 patients were included in the study, with 60 in the control group and 60 in the intervention group. Among the 726 visits from the intervention group that were not followed by an alert, only 4 (0.6%) resulted in hospitalizations (P<.001), confirming the relevance of the system’s alerts. Over the course of the study, 37 hospitalizations were recorded for 25 (20.8%) of the 120 patients. Additionally, of the 120 patients, 9 (7.5%) were admitted to a nursing home, and 7 (5.8%) died. Patients in the intervention group (56/60, 93%) remained at home significantly more often than those in the control group (48/60, 80%; P=.03). 
The total cost of primary care and hospitalization during the study was €167,000 (€1=US $1.09), with €108,000 (64.81%) attributed to the intervention group (P=.20). Conclusions: This study presents encouraging results on the impact of a remote medical monitoring system for older adults, demonstrating a reduction in both emergency department visits and hospitalization costs. Trial Registration: ClinicalTrials.gov NCT05221697; https://clinicaltrials.gov/study/NCT05221697 %M 40215100 %R 10.2196/63700 %U https://formative.jmir.org/2025/1/e63700 %U https://doi.org/10.2196/63700 %U http://www.ncbi.nlm.nih.gov/pubmed/40215100 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e67767 %T Web-Based Explainable Machine Learning-Based Drug Surveillance for Predicting Sunitinib- and Sorafenib-Associated Thyroid Dysfunction: Model Development and Validation Study %A Chan,Fan-Ying %A Ku,Yi-En %A Lie,Wen-Nung %A Chen,Hsiang-Yin %K thyroid dysfunction %K machine learning %K cancer %K sunitinib %K sorafenib %K TKI %K tyrosine kinase inhibitor %D 2025 %7 10.4.2025 %9 %J JMIR Form Res %G English %X Background: Unlike one-snap data collection methods that only identify high-risk patients, machine learning models using time-series data can predict adverse events and aid in the timely management of cancer. Objective: This study aimed to develop and validate machine learning models for sunitinib- and sorafenib-associated thyroid dysfunction using a time-series data collection approach. Methods: Time series data of patients first prescribed sunitinib or sorafenib were collected from a deidentified clinical research database. Logistic regression, random forest, adaptive Boosting, Light Gradient-Boosting Machine, and Gradient Boosting Decision Tree were used to develop the models. Prediction performances were compared using the accuracy, precision, recall, F1-score, area under the receiver operating characteristic curve, and area under the precision-recall curve. 
The optimal threshold for the best-performing model was selected based on the maximum F1-score. SHapley Additive exPlanations analysis was conducted to assess feature importance and contributions at both the cohort and patient levels. Results: The training cohort included 609 patients, while the temporal validation cohort had 198 patients. The Gradient Boosting Decision Tree model without resampling outperformed other models, with area under the precision-recall curve of 0.600, area under the receiver operating characteristic curve of 0.876, and F1-score of 0.583 after adjusting the threshold. The SHapley Additive exPlanations analysis identified higher cholesterol levels, longer summed days of medication use, and clear cell adenocarcinoma histology as the most important features. The final model was further integrated into a web-based application. Conclusions: This model can serve as an explainable adverse drug reaction surveillance system for predicting sunitinib- and sorafenib-associated thyroid dysfunction. 
%R 10.2196/67767 %U https://formative.jmir.org/2025/1/e67767 %U https://doi.org/10.2196/67767 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67883 %T Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study %A Wei,Bin %A Yao,Lili %A Hu,Xin %A Hu,Yuxiang %A Rao,Jie %A Ji,Yu %A Dong,Zhuoer %A Duan,Yichong %A Wu,Xiaorong %+ Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, No.17 Yongwai Zheng Street, Donghu District, Jiangxi Province, Nanchang, 330000, China, 86 136117093259, wxr98021@126.com %K LLM %K large language models %K ocular myasthenia gravis %K patient education %K China %K effectiveness %K deep learning %K artificial intelligence %K health care %K accuracy %K applicability %K neuromuscular disorder %K extraocular muscles %K ptosis %K diplopia %K ophthalmology %K ChatGPT %K clinical practice %K digital health %D 2025 %7 10.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Ocular myasthenia gravis (OMG) is a neuromuscular disorder primarily affecting the extraocular muscles, leading to ptosis and diplopia. Effective patient education is crucial for disease management; however, in China, limited health care resources often restrict patients’ access to personalized medical guidance. Large language models (LLMs) have emerged as potential tools to bridge this gap by providing instant, AI-driven health information. However, their accuracy and readability in educating patients with OMG remain uncertain. Objective: The purpose of this study was to systematically evaluate the effectiveness of multiple LLMs in the education of Chinese patients with OMG. Specifically, the validity of these models in answering OMG-related patient questions was assessed through accuracy, completeness, readability, usefulness, and safety, and patients’ ratings of their usability and readability were analyzed. 
Methods: The study was conducted in two phases: 130 multiple-choice ophthalmology examination questions were input into 5 different LLMs. Their performance was compared with that of undergraduates, master’s students, and ophthalmology residents. In addition, 23 common OMG-related patient questions were posed to 4 LLMs, and their responses were evaluated by ophthalmologists across 5 domains. In the second phase, 20 patients with OMG interacted with the 2 LLMs from the first phase, each asking 3 questions. Patients assessed the responses for satisfaction and readability, while ophthalmologists evaluated the responses again using the 5 domains. Results: ChatGPT o1-preview achieved the highest accuracy rate of 73% on the 130 ophthalmology examination questions, outperforming other LLMs and professional groups like undergraduates and master’s students. For the 23 common OMG-related patient questions, ChatGPT o1-preview scored highest in correctness (4.44), completeness (4.44), helpfulness (4.47), and safety (4.6). GEMINI (Google DeepMind) provided the easiest-to-understand responses in readability assessments, while GPT-4o had the most complex responses, suitable for readers with higher education levels. In the second phase with 20 patients with OMG, ChatGPT o1-preview received higher satisfaction scores than Ernie 3.5 (Baidu; 4.40 vs 3.89, P=.002), although Ernie 3.5’s responses were slightly more readable (4.31 vs 4.03, P=.01). Conclusions: LLMs such as ChatGPT o1-preview may have the potential to enhance patient education. Addressing challenges such as misinformation risk, readability issues, and ethical considerations is crucial for their effective and safe integration into clinical practice. 
%M 40209226 %R 10.2196/67883 %U https://www.jmir.org/2025/1/e67883 %U https://doi.org/10.2196/67883 %U http://www.ncbi.nlm.nih.gov/pubmed/40209226 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67318 %T Bridging Data Gaps in Emergency Care: The NIGHTINGALE Project and the Future of AI in Mass Casualty Management %A , %A Caviglia,Marta %+ Center for Research and Training in Disaster Medicine, Humanitarian Aid and Global Health (CRIMEDIM), Università del Piemonte Orientale, Via Lanino 1, Novara, 28100, Italy, 39 0321 660 620, marta.caviglia@med.uniupo.it %K AI %K technology %K mass casualty incident %K incident management %K artificial intelligence %K emergency care %K MCI %K data gaps %K tool %D 2025 %7 10.4.2025 %9 Viewpoint %J J Med Internet Res %G English %X In the context of mass casualty incident (MCI) management, artificial intelligence (AI) represents a promising future, offering potential improvements in processes such as triage, decision support, and resource optimization. However, the effectiveness of AI is heavily reliant on the availability of quality data. Currently, MCI data are scarce and difficult to obtain, as critical information regarding patient demographics, vital signs, and treatment responses is often missing or incomplete, particularly in the prehospital setting. Although the NIGHTINGALE (Novel Integrated Toolkit for Enhanced Pre-Hospital Life Support and Triage in Challenging and Large Emergencies) project is actively addressing these challenges by developing a comprehensive toolkit designed to support first responders and enhance data collection during MCIs, significant work remains to ensure the tools are fully operational and can effectively integrate continuous monitoring and data management. 
To further advance these efforts, we provide a series of recommendations, advocating for increased European Union funding to facilitate the generation of diverse and high-quality datasets essential for training AI models, including the application of transfer learning and the development of tools supporting data collection during MCIs, while fostering continuous collaboration between end users and technical developers. By securing these resources, we can enhance the efficiency and adaptability of AI applications in emergency care, bridging the current data gaps and ultimately improving outcomes during critical situations. %M 40209223 %R 10.2196/67318 %U https://www.jmir.org/2025/1/e67318 %U https://doi.org/10.2196/67318 %U http://www.ncbi.nlm.nih.gov/pubmed/40209223 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 13 %N %P e53133 %T Development of a Mobile Intervention for Procrastination Augmented With a Semigenerative Chatbot for University Students: Pilot Randomized Controlled Trial %A Lee,Seonmi %A Jeong,Jaehyun %A Kim,Myungsung %A Lee,Sangil %A Kim,Sung-Phil %A Jung,Dooyoung %+ Graduate School of Health Science and Technology, Ulsan National Institute of Science and Technology, UNIST-gil 50, Ulsan, 44919, Republic of Korea, 82 522174010, dooyoung@unist.ac.kr %K procrastination %K chatbot %K generative model %K semigenerative model %K time management %K cognitive behavioral therapy %K psychological assessment %K intervention engagement %K emotional support %K user experience %K mobile intervention %K artificial intelligence %K AI %D 2025 %7 10.4.2025 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: Procrastination negatively affects university students’ academics and mental health. Traditional time management apps lack therapeutic strategies like cognitive behavioral therapy to address procrastination’s psychological aspects. Therefore, we developed and integrated a semigenerative chatbot named Moa into a to-do app. 
Objective: We aimed to determine the benefits of the Moa-integrated to-do app over the app without Moa by verifying behavioral and cognitive changes, analyzing the influence of engagement patterns on the changes, and exploring the user experience. Methods: The developed chatbot Moa guided users over 30 days in terms of self-observation, strategy establishment, and reflection. The architecture comprised response-generating and procrastination factor–detection algorithms. A pilot randomized controlled trial was conducted with 85 participants (n=37, 44% female; n=48, 56% male) from a university in South Korea. The control group used a to-do app without Moa, whereas the treatment group used a fully automated Moa-integrated app. The Irrational Procrastination Scale, Pure Procrastination Scale, Time Management Behavior Scale, and the Perceived Stress Scale were examined using linear mixed models with repeated measurements obtained before (T0) and after (T1) 1-month use and after 2-month use (T2) to assess the changes in irrational procrastination, pure procrastination, time management behavior, academic self-regulation, and stress. Intervention engagement, divided into “high,” “middle,” and “low” clusters, was quantified using app access and use of the to-do list and grouped using k-means clustering. In addition, changes in the psychological scale scores between the control and treatment groups were analyzed within each cluster. User experience was quantified based on the usability, feasibility, and acceptability of and satisfaction with the app, whereas thematic analysis explored the users’ subjective responses to app use. Results: In total, 75 participants completed the study. The interaction of time × procrastination was significant during the required use period (P=.01). The post hoc test indicated a significant improvement from T0 to T1 in the Time Management Behavior Scale and Perceived Stress Scale scores only in the treatment group (P<.001 and P=.009). 
The changes in Pure Procrastination Scale score after the required use period were significant in all clusters except for the low cluster of the control group. The high cluster in the treatment group exhibited a significant change in the Irrational Procrastination Scale after Bonferroni correction (P=.046). Usability was determined to be good in the treatment group (mean score 72.8, SD 16.0), and acceptability was higher than in the control group (P=.03). Evaluation of user experience indicated that only the participants in the treatment group achieved self-reflection and experienced an alliance with the app. Conclusions: The chatbot-integrated app demonstrated greater efficacy in influencing user behavior and providing psychological support. It may serve as a valuable tool for managing procrastination and stress together. Trial Registration: Clinical Research Information Service (CRIS) KCT0009056; https://tinyurl.com/yc84tedk %M 40208664 %R 10.2196/53133 %U https://mhealth.jmir.org/2025/1/e53133 %U https://doi.org/10.2196/53133 %U http://www.ncbi.nlm.nih.gov/pubmed/40208664 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67744 %T Young Adult Perspectives on Artificial Intelligence–Based Medication Counseling in China: Discrete Choice Experiment %A Zhang,Jia %A Wang,Jing %A Zhang,JingBo %A Xia,XiaoQian %A Zhou,ZiYun %A Zhou,XiaoMing %A Wu,YiBo %+ Department of Research, Shandong Provincial Hospital Affiliated to Shandong First Medical University, No.324 Jingwu Road, Huaiyi District, Jinan, 250021, China, 86 15168887283, sdslyy@yeah.net %K artificial intelligence %K medication counseling services %K discrete choice experiment %K willingness to pay %D 2025 %7 9.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: As artificial intelligence (AI) permeates society, the younger generation is becoming increasingly accustomed to using digital solutions. 
AI-based medication counseling services may help people take medications more accurately and reduce adverse events. However, it is not known which AI-based medication counseling service will be preferred by young people. Objective: This study aims to assess young people’s preferences for AI-based medication counseling services. Methods: A discrete choice experiment (DCE) approach was the main analysis method applied in this study, involving 6 attributes: granularity, linguistic comprehensibility, symptom-specific results, access platforms, content model, and costs. The participants in this study were screened and recruited through web-based registration and investigator visits, and the questionnaire was filled out online, with the questionnaire platform provided by Questionnaire Star. The sample population in this study consisted of young adults aged 18-44 years. A mixed logit model was used to estimate attribute preference coefficients and to estimate the willingness to pay (WTP) and relative importance (RI) scores. Subgroups were also analyzed to check for heterogeneity in preferences. Results: In this analysis, 340 participants were included, generating 8160 DCE observations. Participants exhibited a strong preference for receiving 100% symptom-specific results (β=3.18, 95% CI 2.54-3.81; P<.001), and the RI of the attributes (RI=36.99%) was consistent with this. Next, they showed a preference for video as the content model (β=0.86, 95% CI 0.51-1.22; P<.001) and easy-to-understand language (β=0.81, 95% CI 0.46-1.16; P<.001), and when considering the granularity, refined content was preferred over general information (β=0.51, 95% CI 0.21-0.8; P<.001). Finally, participants exhibited a notable preference for accessing information through WeChat applets rather than websites (β=0.66, 95% CI 0.27-1.05; P<.001). 
The WTP for AI-based medication counseling services ranked from highest to lowest as follows: symptom-specific results, easy-to-understand language, video content, the WeChat applet platform, and refined medication counseling. Among these, the WTP for 100% symptom-specific results was the highest (¥24.01, 95% CI 20.16-28.77; US $1=¥7.09). High-income participants exhibited significantly higher WTP for highly accurate results (¥45.32) compared to low-income participants (¥20.65). Similarly, participants with higher education levels showed greater preferences for easy-to-understand language (¥5.93) and video content (¥12.53). Conclusions: We conducted an in-depth investigation of the preferences of young people for AI-based medication counseling services. When developing such services, providers should prioritize symptom-specific results, support more convenient access platforms, use easy-to-understand language, enrich content models with multiple digital media interactions, and offer more refined medication counseling. %M 40203305 %R 10.2196/67744 %U https://www.jmir.org/2025/1/e67744 %U https://doi.org/10.2196/67744 %U http://www.ncbi.nlm.nih.gov/pubmed/40203305 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e67706 %T Extracting Pulmonary Embolism Diagnoses From Radiology Impressions Using GPT-4o: Large Language Model Evaluation Study %A Mahyoub,Mohammed %A Dougherty,Kacie %A Shukla,Ajit %+ Virtua Health, 301 Lippincott Drive, 3rd Fl., Marlton, NJ, 08053, United States, 1 8888478823, mmahyoub@virtua.org %K pulmonary embolism %K large language models %K LLMs %K natural language processing %K GPT-4o %K Clinical Longformer %K text classification %K radiology reports %D 2025 %7 9.4.2025 %9 Original Paper %J JMIR Med Inform %G English %X Background: Pulmonary embolism (PE) is a critical condition requiring rapid diagnosis to reduce mortality. 
Extracting PE diagnoses from radiology reports manually is time-consuming, highlighting the need for automated solutions. Advances in natural language processing, especially transformer models like GPT-4o, offer promising tools to improve diagnostic accuracy and workflow efficiency in clinical settings. Objective: This study aimed to develop an automatic extraction system using GPT-4o to extract PE diagnoses from radiology report impressions, enhancing clinical decision-making and workflow efficiency. Methods: In total, 2 approaches were developed and evaluated: a fine-tuned Clinical Longformer as a baseline model and a GPT-4o-based extractor. Clinical Longformer, an encoder-only model, was chosen for its robustness in text classification tasks, particularly on smaller scales. GPT-4o, a decoder-only instruction-following LLM, was selected for its advanced language understanding capabilities. The study aimed to evaluate GPT-4o’s ability to perform text classification compared to the baseline Clinical Longformer. The Clinical Longformer was trained on a dataset of 1000 radiology report impressions and validated on a separate set of 200 samples, while the GPT-4o extractor was validated using the same 200-sample set. Postdeployment performance was further assessed on an additional 200 operational records to evaluate model efficacy in a real-world setting. Results: GPT-4o outperformed the Clinical Longformer in 2 of the metrics, achieving a sensitivity of 1.0 (95% CI 1.0-1.0; Wilcoxon test, P<.001) and an F1-score of 0.975 (95% CI 0.9495-0.9947; Wilcoxon test, P<.001) across the validation dataset. Postdeployment evaluations also showed strong performance of the deployed GPT-4o model with a sensitivity of 1.0 (95% CI 1.0-1.0), a specificity of 0.94 (95% CI 0.8913-0.9804), and an F1-score of 0.97 (95% CI 0.9479-0.9908). This high level of accuracy supports a reduction in manual review, streamlining clinical workflows and improving diagnostic precision. 
Conclusions: The GPT-4o model provides an effective solution for the automatic extraction of PE diagnoses from radiology reports, offering a reliable tool that aids timely and accurate clinical decision-making. This approach has the potential to significantly improve patient outcomes by expediting diagnosis and treatment pathways for critical conditions like PE. %M 40203306 %R 10.2196/67706 %U https://medinform.jmir.org/2025/1/e67706 %U https://doi.org/10.2196/67706 %U http://www.ncbi.nlm.nih.gov/pubmed/40203306 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e62853 %T A Risk Prediction Model (CMC-AKIX) for Postoperative Acute Kidney Injury Using Machine Learning: Algorithm Development and Validation %A Min,Ji Won %A Min,Jae-Hong %A Chang,Se-Hyun %A Chung,Byung Ha %A Koh,Eun Sil %A Kim,Young Soo %A Kim,Hyung Wook %A Ban,Tae Hyun %A Shin,Seok Joon %A Choi,In Young %A Yoon,Hye Eun %+ Department of Internal Medicine, Incheon St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, 56, Dongsu-ro, Bupyeong-gu, Incheon, Seoul, 21431, Republic of Korea, 82 032 280 7370, berrynana@catholic.ac.kr %K acute kidney injury %K general surgery %K deep neural networks %K machine learning %K prediction model %K postoperative care %K surgery %K anesthesia %K mortality %K morbidity %K retrospective study %K cohort analysis %K hospital %K South Korea %K logistic regression %K user-friendly %K patient care %K risk management %K artificial intelligence %K digital health %D 2025 %7 9.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Postoperative acute kidney injury (AKI) is a significant risk associated with surgeries under general anesthesia, often leading to increased mortality and morbidity. Existing predictive models for postoperative AKI are usually limited to specific surgical areas or require external validation. 
Objective: We aimed to build a prediction model for postoperative AKI using several machine learning methods. Methods: We conducted a retrospective cohort analysis of noncardiac surgeries from 2009 to 2019 at seven university hospitals in South Korea. We evaluated six machine learning models: deep neural network, logistic regression, decision tree, random forest, light gradient boosting machine, and naïve Bayes for predicting postoperative AKI, defined as a significant increase in serum creatinine or the initiation of renal replacement therapy within 30 days after surgery. The performance of the models was analyzed using the area under the curve (AUC) of the receiver operating characteristic curve, accuracy, precision, sensitivity (recall), specificity, and F1-score. Results: Among the 239,267 surgeries analyzed, 7935 cases of postoperative AKI were identified. The models, using 38 preoperative predictors, showed that deep neural network (AUC=0.832), light gradient boosting machine (AUC=0.836), and logistic regression (AUC=0.825) demonstrated superior performance in predicting AKI risk. The deep neural network model was then developed into a user-friendly website for clinical use. Conclusions: Our study introduces a robust, high-performance AKI risk prediction system that is applicable in clinical settings using preoperative data. This model’s integration into a user-friendly website enhances its clinical utility, offering a significant step forward in personalized patient care and risk management. 
%M 40203303 %R 10.2196/62853 %U https://www.jmir.org/2025/1/e62853 %U https://doi.org/10.2196/62853 %U http://www.ncbi.nlm.nih.gov/pubmed/40203303 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66366 %T Developing a Machine Learning Model for Predicting 30-Day Major Adverse Cardiac and Cerebrovascular Events in Patients Undergoing Noncardiac Surgery: Retrospective Study %A Kwun,Ju-Seung %A Ahn,Houng-Beom %A Kang,Si-Hyuck %A Yoo,Sooyoung %A Kim,Seok %A Song,Wongeun %A Hyun,Junho %A Oh,Ji Seon %A Baek,Gakyoung %A Suh,Jung-Won %+ Cardiovascular Center, Department of Internal Medicine, Seoul National University Bundang Hospital, 82 Gumi-ro, 173 Beon-gil, Bundang-gu, Gyeonggi-do, Seongnam-si, 13620, Republic of Korea, 82 01076615931, suhjw1@gmail.com %K perioperative risk evaluation %K noncardiac surgery %K prediction models %K machine learning %K common data model %K ML %K predictive modeling %K cerebrovascular %K electronic health records %K EHR %K clinical practice %K risk %K noncardiac surgeries %K perioperative %D 2025 %7 9.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Considering that most patients with low or no significant risk factors can safely undergo noncardiac surgery without additional cardiac evaluation, and given the excessive evaluations often performed in patients undergoing intermediate or higher risk noncardiac surgeries, practical preoperative risk assessment tools are essential to reduce unnecessary delays for urgent outpatient services and manage medical costs more efficiently. Objective: This study aimed to use the Observational Medical Outcomes Partnership Common Data Model to develop a predictive model by applying machine learning algorithms that can effectively predict major adverse cardiac and cerebrovascular events (MACCE) in patients undergoing noncardiac surgery. 
Methods: This retrospective observational network study collected data by converting electronic health records into a standardized Observational Medical Outcomes Partnership Common Data Model format. The study was conducted in 2 tertiary hospitals. Data included demographic information, diagnoses, laboratory results, medications, surgical types, and clinical outcomes. A total of 46,225 patients were recruited from Seoul National University Bundang Hospital and 396,424 from Asan Medical Center. We selected patients aged 65 years and older undergoing noncardiac surgeries, excluding cardiac or emergency surgeries, and those with less than 30 days of observation. Using these observational health care data, we developed machine learning–based prediction models using the observational health data sciences and informatics open-source patient-level prediction package in R (version 4.1.0; R Foundation for Statistical Computing). A total of 5 machine learning algorithms, including random forest, were developed and validated internally and externally, with performance assessed through the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve, and calibration plots. Results: All machine learning prediction models surpassed the Revised Cardiac Risk Index in MACCE prediction performance (AUROC=0.704). Random forest showed the best results, achieving AUROC values of 0.897 (95% CI 0.883-0.911) internally and 0.817 (95% CI 0.815-0.819) externally, with an area under the precision-recall curve of 0.095. Among 46,225 patients of the Seoul National University Bundang Hospital, MACCE occurred in 4.9% (2256/46,225), including myocardial infarction (907/46,225, 2%) and stroke (799/46,225, 1.7%), while in-hospital mortality was 0.9% (419/46,225). For Asan Medical Center, 6.3% (24,861/396,424) of patients experienced MACCE, with 1.5% (6017/396,424) stroke and 3% (11,875/396,424) in-hospital mortality. 
Furthermore, the significance of predictors linked to previous diagnoses and laboratory measurements underscored their critical role in effectively predicting perioperative risk. Conclusions: Our prediction models outperformed the widely used Revised Cardiac Risk Index in predicting MACCE within 30 days after noncardiac surgery, demonstrating superior calibration and generalizability across institutions. Its use can optimize preoperative evaluations, minimize unnecessary testing, and streamline perioperative care, significantly improving patient outcomes and resource use. We anticipate that applying this model to actual electronic health records will benefit clinical practice. %M 40203300 %R 10.2196/66366 %U https://www.jmir.org/2025/1/e66366 %U https://doi.org/10.2196/66366 %U http://www.ncbi.nlm.nih.gov/pubmed/40203300 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 8 %N %P e71535 %T Clinical, Operational, and Economic Benefits of a Digitally Enabled Wound Care Program in Home Health: Quasi-Experimental, Pre-Post Comparative Study %A Mohammed,Heba Tallah %A Corcoran,Kathleen %A Lavergne,Kyle %A Graham,Angela %A Gill,Daniel %A Jones,Kwame %A Singal,Shivika %A Krishnamoorthy,Malini %A Cassata,Amy %A Mannion,David %A Fraser,Robert D J %+ Swift Medical Inc, 1 King St W, Suite 4800-355, Toronto, ON, M5H1A1, Canada, 1 226 444 5073, heba@swiftmedical.com %K home health care %K artificial intelligence %K AI %K digital wound care %K wound assessment %K operational efficiency %K clinical outcomes %K healing time %K cost saving %K skilled nursing visits %D 2025 %7 8.4.2025 %9 Original Paper %J JMIR Nursing %G English %X Background: The demand for home health care and nursing visits has steadily increased, requiring significant allocation of resources for wound care. Many home health agencies operate below capacity due to clinician shortages, meeting only 61% to 70% of demand and frequently declining wound care referrals. 
Implementing artificial intelligence–powered digital wound care solutions (DWCSs) offers an opportunity to enhance wound care programs by improving scalability and effectiveness through better monitoring and risk identification. Objective: This study assessed clinical and operational outcomes across 14 home health branches that adopted a DWCS, comparing pre- and postadoption data and outcomes with 27 control branches without the technology. Methods: This pre-post comparative study analyzed clinical outcomes, including average days to wound healing, and operational outcomes, such as skilled nursing (SN) visits per episode (VPE) and in-home visit durations, during two 7-month intervals (from November to May in 2020-2021 and 2021-2022). Data were extracted from 14,278 patients who received wound care across adoption and control branches. Projected cost savings were also calculated based on reductions in SN visits. Results: The adoption branches showed a 4.3% reduction in SN VPE and a 2.5% reduction in visit duration, saving approximately 309 staff days. In contrast, control branches experienced a 4.5% increase in SN VPE and a 2.2% rise in visit duration, adding 42 days. Healing times improved significantly in the adoption branches, with a reduction of 4.3 days on average per wound compared to 1.6 days in control branches (P<.001); pressure injuries, venous ulcers, and surgical wounds showed the most substantial improvements. Conclusions: Integrating digital wound management technology enhances clinical outcomes, operational efficiencies, and cost savings in home health settings. A reduction of 0.3 SN VPE could generate annual savings of up to US $958,201 across the organization. The adoption branches avoided 1187 additional visits during the study period. If control branches had implemented the DWCS and achieved similar outcomes, they would have saved 18,546 healing days. 
These findings emphasize the importance of incorporating DWCSs into wound care programs to address increasing demands, clinician shortages, and rising health care costs while maintaining positive clinical outcomes. %M 40198913 %R 10.2196/71535 %U https://nursing.jmir.org/2025/1/e71535 %U https://doi.org/10.2196/71535 %U http://www.ncbi.nlm.nih.gov/pubmed/40198913 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e59632 %T AI Applications for Chronic Condition Self-Management: Scoping Review %A Hwang,Misun %A Zheng,Yaguang %A Cho,Youmin %A Jiang,Yun %+ , School of Nursing, University of Michigan, 400 North Ingalls Street, Ann Arbor, MI, 48109, United States, 1 7347633705, jiangyu@umich.edu %K artificial intelligence %K chronic disease %K self-management %K generative AI %K emotional self-management %D 2025 %7 8.4.2025 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) has potential in promoting and supporting self-management in patients with chronic conditions. However, the development and application of current AI technologies to meet patients’ needs and improve their performance in chronic condition self-management tasks remain poorly understood. It is crucial to gather comprehensive information to guide the development and selection of effective AI solutions tailored for self-management in patients with chronic conditions. Objective: This scoping review aimed to provide a comprehensive overview of AI applications for chronic condition self-management based on 3 essential self-management tasks, medical, behavioral, and emotional self-management, and to identify the current developmental stages and knowledge gaps of AI applications for chronic condition self-management. Methods: A literature review was conducted for studies published in English between January 2011 and October 2024. 
In total, 4 databases, including PubMed, Web of Science, CINAHL, and PsycINFO, were searched using combined terms related to self-management and AI. The inclusion criteria included studies focused on the adult population with any type of chronic condition and AI technologies supporting self-management. This review was conducted following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. Results: Of the 1873 articles retrieved from the search, 66 (3.5%) were eligible and included in this review. The most studied chronic condition was diabetes (20/66, 30%). Regarding self-management tasks, most studies aimed to support medical (45/66, 68%) or behavioral self-management (27/66, 41%), and fewer studies focused on emotional self-management (14/66, 21%). Conversational AI (21/66, 32%) and multiple machine learning algorithms (16/66, 24%) were the most used AI technologies. However, most AI technologies remained in the algorithm development (25/66, 38%) or early feasibility testing stages (25/66, 38%). Conclusions: A variety of AI technologies have been developed and applied in chronic condition self-management, primarily for medication, symptoms, and lifestyle self-management. Fewer AI technologies were developed for emotional self-management tasks, and most AIs remained in the early developmental stages. More research is needed to generate evidence for integrating AI into chronic condition self-management to obtain optimal health outcomes. 
%M 40198108 %R 10.2196/59632 %U https://www.jmir.org/2025/1/e59632 %U https://doi.org/10.2196/59632 %U http://www.ncbi.nlm.nih.gov/pubmed/40198108 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e62732 %T Investigating Clinicians’ Intentions and Influencing Factors for Using an Intelligence-Enabled Diagnostic Clinical Decision Support System in Health Care Systems: Cross-Sectional Survey %A Zheng,Rui %A Jiang,Xiao %A Shen,Li %A He,Tianrui %A Ji,Mengting %A Li,Xingyi %A Yu,Guangjun %+ , Shanghai Children's Hospital, No 355 Luding Road, Shanghai, 200062, China, 86 18917762998, gjyu@shchildren.com.cn %K artificial intelligence %K clinical decision support systems %K task-technology fit %K technology acceptance model %K perceived risk %K performance expectations %K intention to use %D 2025 %7 7.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: An intelligence-enabled clinical decision support system (CDSS) is a computerized system that integrates medical knowledge, patient data, and clinical guidelines to assist health care providers make clinical decisions. Research studies have shown that CDSS utilization rates have not met expectations. Clinicians’ intentions and their attitudes determine the use and promotion of CDSS in clinical practice. Objective: The aim of this study was to enhance the successful utilization of CDSS by analyzing the pivotal factors that influence clinicians’ intentions to adopt it and by putting forward targeted management recommendations. Methods: This study proposed a research model grounded in the task-technology fit model and the technology acceptance model, which was then tested through a cross-sectional survey. The measurement instrument comprised demographic characteristics, multi-item scales, and an open-ended query regarding areas where clinicians perceived the system required improvement. 
We leveraged structural equation modeling to assess the direct and indirect effects of “task-technology fit” and “perceived ease of use” on clinicians’ intentions to use the CDSS when mediated by “performance expectation” and “perceived risk.” We collated and analyzed the responses to the open-ended question. Results: We collected a total of 247 questionnaires. The model explained 65.8% of the variance in use intention. Performance expectations (β=0.228; P<.001) and perceived risk (β=–0.579; P<.001) were both significant predictors of use intention. Task-technology fit (β=–0.281; P<.001) and perceived ease of use (β=–0.377; P<.001) negatively affected perceived risk. Perceived risk (β=–0.308; P<.001) negatively affected performance expectations. Task-technology fit positively affected perceived ease of use (β=0.692; P<.001) and performance expectations (β=0.508; P<.001). Task characteristics (β=0.168; P<.001) and technology characteristics (β=0.749; P<.001) positively affected task-technology fit. Contrary to expectations, perceived ease of use (β=0.108; P=.07) did not have a significant impact on use intention. From the open-ended question, 3 main themes emerged regarding clinicians’ perceived deficiencies in CDSS: system security risks, personalized interaction, and seamless integration. Conclusions: Perceived risk and performance expectations were direct determinants of clinicians’ adoption of CDSS, significantly influenced by task-technology fit and perceived ease of use. In the future, increasing transparency within CDSS and fostering trust between clinicians and technology should be prioritized. Furthermore, focusing on personalized interactions and ensuring seamless integration into clinical workflows are crucial steps moving forward. 
%M 40194276 %R 10.2196/62732 %U https://www.jmir.org/2025/1/e62732 %U https://doi.org/10.2196/62732 %U http://www.ncbi.nlm.nih.gov/pubmed/40194276 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e68544 %T Exploring Engagement With and Effectiveness of Digital Mental Health Interventions in Young People of Different Ethnicities: Systematic Review %A Bakhti,Rinad %A Daler,Harmani %A Ogunro,Hephzibah %A Hope,Steven %A Hargreaves,Dougal %A Nicholls,Dasha %+ Department of Brain Sciences, Division of Psychiatry, Imperial College London, Du Cane Road, London, W12 0NN, United Kingdom, 44 020 7594 1069, r.bakhti@imperial.ac.uk %K digital mental health interventions %K young people %K ethnicity %K engagement %K effectiveness %K artificial intelligence %K AI %D 2025 %7 7.4.2025 %9 Review %J J Med Internet Res %G English %X Background: The prevalence of mental health difficulties among young people has risen in recent years, with 75% of mental disorders emerging before the age of 24 years. The identification and treatment of mental health issues earlier in life improves later-life outcomes. The COVID-19 pandemic spurred the growth of digital mental health interventions (DMHIs), which offer accessible support. However, young people of different ethnicities face barriers to DMHIs, such as socioeconomic disadvantage and cultural stigma. Objective: This review aimed to summarize and evaluate the engagement with and effectiveness of DMHIs among young people of different ethnicities. Methods: A systematic search was conducted in MEDLINE, Embase, and PsycINFO for studies published between January 2019 and May 2024, with an update in September 2024. The inclusion criteria were participants aged <25 years using DMHIs from various ethnic backgrounds. Three reviewers independently screened and selected the studies. Data on engagement (eg, use and uptake) and effectiveness (eg, clinical outcomes and symptom improvement) were extracted and synthesized to compare findings. 
Studies were assessed for quality using the Mixed Methods Appraisal Tool. Results: The final search yielded 67 studies, of which 7 (10%) met inclusion criteria. There were 1853 participants across the 7 studies, all from high-income countries. Participants were predominantly aged 12 to 25 years, with representation of diverse ethnic identities, including Black, Asian, Hispanic, mixed race, and Aboriginal individuals. Engagement outcomes varied, with culturally relatable, low-cost interventions showing higher retention and user satisfaction. Linguistic barriers and country of origin impeded the effectiveness of some interventions, while near-peer mentorship, coproduction, and tailored content improved the effectiveness of DMHIs. While initial results are promising, small sample sizes, heterogeneity in outcome assessments, and a paucity of longitudinal data impeded robust comparisons and generalizability. Conclusions: DMHIs show potential as engaging and effective mental health promotional tools for young people of different ethnicities, especially when coproduced and culturally relatable. Initial data suggest that interventions facilitating near-peer mentoring, linguistic adaptation, low cost, and cultural relatability have improved engagement and effectiveness. Future research should focus on developing a consensus definition of DMHIs, exploring DMHIs in children aged <12 years, and conducting detailed qualitative and quantitative research on use factors and treatment efficacy of DMHIs for young people of different ethnicities. 
Trial Registration: PROSPERO CRD42024544364; https://tinyurl.com/yk5jt8yk %M 40194267 %R 10.2196/68544 %U https://www.jmir.org/2025/1/e68544 %U https://doi.org/10.2196/68544 %U http://www.ncbi.nlm.nih.gov/pubmed/40194267 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 14 %N %P e66232 %T Assessing Patient-Reported Satisfaction With Care and Documentation Time in Primary Care Through AI-Driven Automatic Clinical Note Generation: Protocol for a Proof-of-Concept Study %A Vidal-Alaball,Josep %A Alonso,Carlos %A Heinisch,Daniel Hugo %A Castaño,Alberto %A Sánchez-Freire,Encarna %A Benito Serrano,María Luisa %A Ferrer Pascual,Carla %A Menacho,Ignacio %A Acosta-Rojas,Ruthy %A Cardona Gubert,Odda %A Farrés Creus,Rosa %A Armengol Alegre,Joan %A Martínez Querol,Carles %A Moreno-Martinez,Marina %A Gonfaus Font,Mercè %A Narejos,Silvia %A Gomez-Fernandez,Anna %+ Research and Innovation Unit, Gerència d'Atenció Primària i a la Comunitat de la Catalunya Central, Institut Català de la Salut, Carrer de Soler i March, 6, Manresa, 08242, Spain, 34 6930040, jvidal.cc.ics@gencat.cat %K primary health care %K patient satisfaction %K artificial intelligence %K medical records systems %K computerized %K patient-centered care %D 2025 %7 7.4.2025 %9 Protocol %J JMIR Res Protoc %G English %X Background: Relisten is an artificial intelligence (AI)–based software developed by Recog Analytics that improves patient care by facilitating more natural interactions between health care professionals and patients. This tool extracts relevant information from recorded conversations, structuring it in the medical record, and sending it to the Health Information System after the professional’s approval. This approach allows professionals to focus on the patient without the need to perform clinical documentation tasks. 
Objective: This study aims to evaluate patient-reported satisfaction and perceived quality of care, assess health care professionals’ satisfaction with the care provided, and measure the time spent on entering records into the electronic medical record using this AI-powered solution. Methods: This proof-of-concept (PoC) study is conducted as a multicenter trial with the participation of several health care professionals (nurses and physicians) in primary care centers (CAPs). The key outcome measures include (1) patient-reported quality of care (evaluated through anonymous surveys), (2) health care professionals’ satisfaction with the care provided (assessed through surveys and structured interviews), and (3) time saved on clinical documentation (determined by comparing the time spent manually writing notes versus reviewing and correcting AI-generated notes). Statistical analyses will be performed for each objective, using independent sample comparison tests according to normality evaluated with the Kolmogorov-Smirnov test and Lilliefors correction. Stratified statistical tests will also be performed to consider the variance between professionals. Results: The protocol has been developed using the SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) checklist. Recruitment began in July 2024, and as of November 2024, a total of 318 patients have been enrolled. Recruitment is expected to be completed by March 2025. Data analysis will take place between April and May 2025, with results expected to be published in June 2025. Conclusions: We expect an improvement in the perceived quality of care reported by patients and a significant reduction in the time spent taking clinical notes, with a saving of at least 30 seconds per visit. Although a high quality of the notes generated is expected, it is uncertain whether a significant improvement over the control group, which is already expected to have high-quality notes, will be demonstrated. 
Trial Registration: ClinicalTrials.gov NCT06618092; https://clinicaltrials.gov/study/NCT06618092 International Registered Report Identifier (IRRID): DERR1-10.2196/66232 %M 40193189 %R 10.2196/66232 %U https://www.researchprotocols.org/2025/1/e66232 %U https://doi.org/10.2196/66232 %U http://www.ncbi.nlm.nih.gov/pubmed/40193189 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e69881 %T The Role of AI in Nursing Education and Practice: Umbrella Review %A El Arab,Rabie Adel %A Al Moosa,Omayma Abdulaziz %A Abuadas,Fuad H %A Somerville,Joel %+ , Almoosa College of Health Sciences, Ain Najm Road, Al Ahsa, 36422, Saudi Arabia, 966 508948967, r.adel@almoosacollege.edu.sa %K artificial intelligence %K nursing practice %K nursing education %K ethical implications %K social implications %K AI integration %K AI literacy %K ethical frameworks %D 2025 %7 4.4.2025 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) is rapidly transforming health care, offering substantial advancements in patient care, clinical workflows, and nursing education. Objective: This umbrella review aims to evaluate the integration of AI into nursing practice and education, with a focus on ethical and social implications, and to propose evidence-based recommendations to support the responsible and effective adoption of AI technologies in nursing. Methods: We included systematic reviews, scoping reviews, rapid reviews, narrative reviews, literature reviews, and meta-analyses focusing on AI integration in nursing, published up to October 2024. A new search was conducted in January 2025 to identify any potentially eligible reviews published thereafter. However, no new reviews were found. Eligibility was guided by the Sample, Phenomenon of Interest, Design, Evaluation, Research type framework; databases (PubMed or MEDLINE, CINAHL, Web of Science, Embase, and IEEE Xplore) were searched using comprehensive keywords. 
Two reviewers independently screened records and extracted data. Risk of bias was assessed with Risk of Bias in Systematic Reviews (ROBIS) and A Measurement Tool to Assess Systematic Reviews, version 2 (AMSTAR 2), which we adapted for systematic and nonsystematic review types. A thematic synthesis approach, conducted independently by 2 reviewers, identified recurring patterns across the included reviews. Results: The search strategy yielded 18 eligible studies after screening 274 records. These studies encompassed diverse methodologies and focused on nursing professionals, students, educators, and researchers. First, ethical and social implications were consistently highlighted, with studies emphasizing concerns about data privacy, algorithmic bias, transparency, accountability, and the necessity for equitable access to AI technologies. Second, the transformation of nursing education emerged as a critical area, with an urgent need to update curricula by integrating AI-driven educational tools and fostering both technical competencies and ethical decision-making skills among nursing students and professionals. Third, strategies for integration were identified as essential for effective implementation, calling for scalable models, robust ethical frameworks, and interdisciplinary collaboration, while also addressing key barriers such as resistance to AI adoption, lack of standardized AI education, and disparities in technology access. Conclusions: AI holds substantial promise for revolutionizing nursing practice and education. However, realizing this potential necessitates a strategic approach that addresses ethical concerns, integrates AI literacy into nursing curricula, and ensures equitable access to AI technologies. Limitations of this review include the heterogeneity of included studies and potential publication bias. 
Our findings underscore the need for comprehensive ethical frameworks and regulatory guidelines tailored to nursing applications, updated nursing curricula to include AI literacy and ethical training, and investments in infrastructure to promote equitable AI access. Future research should focus on developing standardized implementation strategies and evaluating the long-term impacts of AI integration on nursing practice and patient outcomes. %M 40072926 %R 10.2196/69881 %U https://www.jmir.org/2025/1/e69881 %U https://doi.org/10.2196/69881 %U http://www.ncbi.nlm.nih.gov/pubmed/40072926 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e58660 %T Psychological Factors Influencing Appropriate Reliance on AI-enabled Clinical Decision Support Systems: Experimental Web-Based Study Among Dermatologists %A Küper,Alisa %A Lodde,Georg Christian %A Livingstone,Elisabeth %A Schadendorf,Dirk %A Krämer,Nicole %+ , Social Psychology: Media and Communication, University of Duisburg-Essen, Bismarckstraße 120, Duisburg, 47057, Germany, 49 203 379 6027, alisa.kueper@uni-due.de %K AI reliance %K psychological factors %K clinical decision support systems %K medical decision-making %K artificial intelligence %K AI %D 2025 %7 4.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI)–enabled decision support systems are critical tools in medical practice; however, their reliability is not absolute, necessitating human oversight for final decision-making. Human reliance on such systems can vary, influenced by factors such as individual psychological factors and physician experience. Objective: This study aimed to explore the psychological factors influencing subjective trust and reliance on medical AI’s advice, specifically examining relative AI reliance and relative self-reliance to assess the appropriateness of reliance. 
Methods: A survey was conducted with 223 dermatologists, which included lesion image classification tasks and validated questionnaires assessing subjective trust, propensity to trust technology, affinity for technology interaction, control beliefs, need for cognition, as well as queries on medical experience and decision confidence. Results: A 2-tailed t test revealed that participants’ accuracy improved significantly with AI support (t222=−3.3; P<.001; Cohen d=4.5), but only by an average of 1% (1/100). Reliance on AI was stronger for correct advice than for incorrect advice (t222=4.2; P<.001; Cohen d=0.1). Notably, participants demonstrated a mean relative AI reliance of 10.04% (139/1384) and a relative self-reliance of 85.6% (487/569), indicating a high level of self-reliance but a low level of AI reliance. Propensity to trust technology influenced AI reliance, mediated by trust (indirect effect=0.024, 95% CI 0.008-0.042; P<.001), and medical experience negatively predicted AI reliance (indirect effect=–0.001, 95% CI –0.002 to −0.001; P<.001). Conclusions: The findings highlight the need to design AI support systems in a way that assists less experienced users with a high propensity to trust technology to identify potential AI errors, while encouraging experienced physicians to actively engage with system recommendations and potentially reassess initial decisions. 
%M 40184614 %R 10.2196/58660 %U https://www.jmir.org/2025/1/e58660 %U https://doi.org/10.2196/58660 %U http://www.ncbi.nlm.nih.gov/pubmed/40184614 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e68809 %T Insights on the Side Effects of Female Contraceptive Products From Online Drug Reviews: Natural Language Processing–Based Content Analysis %A Groene,Nicole %A Nickel,Audrey %A Rohn,Amanda E %+ , Department for Health and Social Sciences, FOM University of Applied Sciences for Economics and Management, Leimkugelstr 6, Essen, 45141, Germany, 49 201 81004, Nicole.groene@fom-net.de %K contraception %K side effects %K natural language processing %K NLP %K informed choices %K online reviews %K women %K well-being %D 2025 %7 3.4.2025 %9 Original Paper %J JMIR AI %G English %X Background: Most online and social media discussions about birth control methods for women center on side effects, highlighting a demand for shared experiences with these products. Online user reviews and ratings of birth control products offer a largely untapped supplementary resource that could assist women and their partners in making informed contraception choices. Objective: This study sought to analyze women’s online ratings and reviews of various birth control methods, focusing on side effects linked to low product ratings. Methods: Using natural language processing (NLP) for topic modeling and descriptive statistics, this study analyzes 19,506 unique reviews of female contraceptive products posted on the website Drugs.com. Results: Ratings vary widely across contraception types. Hormonal contraceptives with high systemic absorption, such as progestin-only pills and extended-cycle pills, received more unfavorable reviews than other methods, and women frequently described menstrual irregularities, continuous bleeding, and weight gain associated with their administration. 
Intrauterine devices were generally rated more positively, although about 1 in 10 users reported severe cramps and pain, which were linked to very poor ratings. Conclusions: While exploratory, this study highlights the potential of NLP in analyzing extensive online reviews to reveal insights into women’s experiences with contraceptives and the impact of side effects on their overall well-being. In addition to results from clinical studies, NLP-derived insights from online reviews can provide complementary information for women and health care providers, despite possible biases in online reviews. The findings suggest a need for further research to validate links between specific side effects, contraceptive methods, and women’s overall well-being. %M 40179373 %R 10.2196/68809 %U https://ai.jmir.org/2025/1/e68809 %U https://doi.org/10.2196/68809 %U http://www.ncbi.nlm.nih.gov/pubmed/40179373 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 11 %N %P e72998 %T Citation Accuracy Challenges Posed by Large Language Models %A Zhang,Manlin %A Zhao,Tianyu %K chatGPT %K medical education %K Saudi Arabia %K perceptions %K knowledge %K medical students %K faculty %K chatbot %K qualitative study %K artificial intelligence %K AI %K AI-based tools %K universities %K thematic analysis %K learning %K satisfaction %K LLM %K large language model %D 2025 %7 2.4.2025 %9 %J JMIR Med Educ %G English %X %R 10.2196/72998 %U https://mededu.jmir.org/2025/1/e72998 %U https://doi.org/10.2196/72998 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 11 %N %P e73698 %T Authors’ Reply: Citation Accuracy Challenges Posed by Large Language Models %A Temsah,Mohamad-Hani %A Al-Eyadhy,Ayman %A Jamal,Amr %A Alhasan,Khalid %A Malki,Khalid H %K ChatGPT %K Gemini %K DeepSeek %K medical education %K AI %K artificial intelligence %K Saudi Arabia %K perceptions %K medical students %K faculty %K LLM %K chatbot %K qualitative study %K thematic analysis %K satisfaction %K RAG retrieval-augmented 
generation %D 2025 %7 2.4.2025 %9 %J JMIR Med Educ %G English %X %R 10.2196/73698 %U https://mededu.jmir.org/2025/1/e73698 %U https://doi.org/10.2196/73698 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e59591 %T Public Awareness of and Attitudes Toward the Use of AI in Pathology Research and Practice: Mixed Methods Study %A Lewis,Claire %A Groarke,Jenny %A Graham-Wisener,Lisa %A James,Jacqueline %+ School of Medicine Dentistry and Biomedical Sciences, Queen's University Belfast, University Road, Belfast, BT7 1NN, United Kingdom, 44 2890972804, claire.lewis@qub.ac.uk %K artificial intelligence %K AI %K public opinion %K pathology %K health care %K public awareness %K survey %D 2025 %7 2.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The last decade has witnessed major advances in the development of artificial intelligence (AI) technologies for use in health care. One of the most promising areas of research that has potential clinical utility is the use of AI in pathology to aid cancer diagnosis and management. While the value of using AI to improve the efficiency and accuracy of diagnosis cannot be underestimated, there are challenges in the development and implementation of such technologies. Notably, questions remain about public support for the use of AI to assist in pathological diagnosis and for the use of health care data, including data obtained from tissue samples, to train algorithms. Objective: This study aimed to investigate public awareness of and attitudes toward AI in pathology research and practice. Methods: A nationally representative, cross-sectional, web-based mixed methods survey (N=1518) was conducted to assess the UK public’s awareness of and views on the use of AI in pathology research and practice. Respondents were recruited via Prolific, an online research platform. 
To be eligible for the study, participants had to be aged >18 years, be UK residents, and have the capacity to express their own opinion. Respondents answered 30 closed-ended questions and 2 open-ended questions. Sociodemographic information and previous experience with cancer were collected. Descriptive and inferential statistics were used to analyze quantitative data; qualitative data were analyzed thematically. Results: Awareness was low, with only 23.19% (352/1518) of the respondents somewhat or moderately aware of AI being developed for use in pathology. Most did not support a diagnosis of cancer (908/1518, 59.82%) or a diagnosis based on biomarkers (694/1518, 45.72%) being made using AI only. However, most (1478/1518, 97.36%) supported diagnoses made by pathologists with AI assistance. The adjusted odds ratio (aOR) for supporting AI in cancer diagnosis and management was higher for men (aOR 1.34, 95% CI 1.02-1.75). Greater awareness (aOR 1.25, 95% CI 1.10-1.42), greater trust in data security and privacy protocols (aOR 1.04, 95% CI 1.01-1.07), and more positive beliefs (aOR 1.27, 95% CI 1.20-1.36) also increased support, whereas identifying more risks reduced the likelihood of support (aOR 0.80, 95% CI 0.73-0.89). In total, 3 main themes emerged from the qualitative data: bringing the public along, the human in the loop, and more hard evidence needed, indicating conditional support for AI in pathology with human decision-making oversight, robust measures for data handling and protection, and evidence for AI benefit and effectiveness. Conclusions: Awareness of AI’s potential use in pathology was low, but attitudes were positive, with high but conditional support. Challenges remain, particularly among women, regarding AI use in cancer diagnosis and management. Apprehension persists about the access to and use of health care data by private organizations. 
%M 40173441 %R 10.2196/59591 %U https://www.jmir.org/2025/1/e59591 %U https://doi.org/10.2196/59591 %U http://www.ncbi.nlm.nih.gov/pubmed/40173441 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e59520 %T Development and Validation of a Machine Learning Model for Early Prediction of Delirium in Intensive Care Units Using Continuous Physiological Data: Retrospective Study %A Park,Chanmin %A Han,Changho %A Jang,Su Kyeong %A Kim,Hyungjun %A Kim,Sora %A Kang,Byung Hee %A Jung,Kyoungwon %A Yoon,Dukyong %+ Department of Biomedical Systems Informatics, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea, 82 31 5189 8450, dukyong.yoon@yonsei.ac.kr %K delirium %K intensive care unit %K machine learning %K prediction model %K early prediction %D 2025 %7 2.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Delirium in intensive care unit (ICU) patients poses a significant challenge, affecting patient outcomes and health care efficiency. Developing an accurate, real-time prediction model for delirium represents an advancement in critical care, addressing needs for timely intervention and resource optimization in ICUs. Objective: We aimed to create a novel machine learning model for delirium prediction in ICU patients using only continuous physiological data. Methods: We developed models integrating routinely available clinical data, such as age, sex, and patient monitoring device outputs, to ensure practicality and adaptability in diverse clinical settings. To confirm the reliability of delirium determination records, we prospectively collected results of Confusion Assessment Method for the ICU (CAM-ICU) evaluations performed by qualified investigators from May 17, 2021, to December 23, 2022, determining Cohen κ coefficients. 
Participants were included in the study if they were aged ≥18 years at ICU admission, had delirium evaluations using the CAM-ICU, and had data collected for at least 4 hours before delirium diagnosis or nondiagnosis. The development cohort from Yongin Severance Hospital (March 1, 2020, to January 12, 2022) comprised 5478 records: 5129 (93.62%) records from 651 patients for training and 349 (6.37%) records from 163 patients for internal validation. For temporal validation, we used 4438 records from the same hospital (January 28, 2022, to December 31, 2022) to reflect potential seasonal variations. External validation was performed using data from 670 patients at Ajou University Hospital (March 2022 to September 2022). We evaluated machine learning algorithms (random forest [RF], extra-trees classifier, and light gradient boosting machine) and selected the RF model as the final model based on its performance. To confirm clinical utility, a decision curve analysis and temporal pattern for model prediction during the ICU stay were performed. Results: The κ coefficient between labels generated by ICU nurses and prospectively verified by qualified researchers was 0.81, indicating reliable CAM-ICU results. Our final model showed robust performance in internal validation (area under the receiver operating characteristic curve [AUROC]: 0.82; area under the precision-recall curve [AUPRC]: 0.62) and maintained its accuracy in temporal validation (AUROC: 0.73; AUPRC: 0.85). External validation supported its effectiveness (AUROC: 0.84; AUPRC: 0.77). Decision curve analysis showed a positive net benefit at all thresholds, and the temporal pattern analysis showed a gradual increase in the model scores as the actual delirium diagnosis time approached. Conclusions: We developed a machine learning model for delirium prediction in ICU patients using routinely measured variables, including physiological waveforms. 
Our study demonstrates the potential of the RF model in predicting delirium, with consistent performance across various validation scenarios. The model uses noninvasive variables, making it applicable to a wide range of ICU patients, with minimal additional risk. %M 40173433 %R 10.2196/59520 %U https://www.jmir.org/2025/1/e59520 %U https://doi.org/10.2196/59520 %U http://www.ncbi.nlm.nih.gov/pubmed/40173433 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 8 %N %P e71768 %T The Importance of Comparing New Technologies (AI) to Existing Tools for Patient Education on Common Dermatologic Conditions: A Commentary %A Juels,Parker %K artificial intelligence %K ChatGPT %K atopic dermatitis %K acne vulgaris %K actinic keratosis %K rosacea %K AI %K diagnosis %K treatment %K prognosis %K dermatological diagnoses %K chatbots %K patients %K dermatologist %D 2025 %7 1.4.2025 %9 %J JMIR Dermatol %G English %X %R 10.2196/71768 %U https://derma.jmir.org/2025/1/e71768 %U https://doi.org/10.2196/71768 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 8 %N %P e72540 %T Authors’ Reply: The Importance of Comparing New Technologies (AI) to Existing Tools for Patient Education on Common Dermatologic Conditions: A Commentary %A Chau,Courtney %A Feng,Hao %A Cobos,Gabriela %A Park,Joyce %K artificial intelligence %K ChatGPT %K atopic dermatitis %K acne vulgaris %K actinic keratosis %K rosacea %K AI %K diagnosis %K treatment %K prognosis %K dermatological diagnoses %K chatbots %K patients %K dermatologist %D 2025 %7 1.4.2025 %9 %J JMIR Dermatol %G English %X %R 10.2196/72540 %U https://derma.jmir.org/2025/1/e72540 %U https://doi.org/10.2196/72540 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e70789 %T Adoption of Large Language Model AI Tools in Everyday Tasks: Multisite Cross-Sectional Qualitative Study of Chinese Hospital Administrators %A Chen,Jun %A Liu,Yu %A Liu,Peng %A Zhao,Yiming %A Zuo,Yan %A Duan,Hui %+ , School of Public Administration and Policy, 
Renmin University of China, #59 Zhongguancun Street, Haidian District, Beijing, 100872, China, 86 1062511122, rucduanhui@ruc.edu.cn %K large language model %K artificial intelligence %K health care administration %K technology adoption %K hospital administrator %K qualitative study %K barriers to adoption %D 2025 %7 1.4.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Large language model (LLM) artificial intelligence (AI) tools have the potential to streamline health care administration by enhancing efficiency in document drafting, resource allocation, and communication tasks. Despite this potential, the adoption of such tools among hospital administrators remains understudied, particularly at the individual level. Objective: This study aims to explore factors influencing the adoption and use of LLM AI tools among hospital administrators in China, focusing on enablers, barriers, and practical applications in daily administrative tasks. Methods: A multicenter, cross-sectional, descriptive qualitative design was used. Data were collected through semistructured face-to-face interviews with 31 hospital administrators across 3 tertiary hospitals in Beijing, Shenzhen, and Chengdu from June 2024 to August 2024. The Colaizzi method was used for thematic analysis to identify patterns in participants’ experiences and perspectives. Results: Adoption of LLM AI tools was generally low, with significant site-specific variations. Participants with higher technological familiarity and positive early experiences reported more frequent use, while barriers such as mistrust in tool accuracy, limited prompting skills, and insufficient training hindered broader adoption. Tools were primarily used for document drafting, with limited exploration of advanced functionalities. Participants strongly emphasized the need for structured training programs and institutional support to enhance usability and confidence. 
Conclusions: Familiarity with technology, positive early experiences, and openness to innovation may facilitate adoption, while barriers such as limited knowledge, mistrust in tool accuracy, and insufficient prompting skills can hinder broader use. LLM AI tools are now primarily used for basic tasks such as document drafting, with limited application to more advanced functionalities due to a lack of training and confidence. Structured tutorials and institutional support are needed to enhance usability and integration. Targeted training programs, combined with organizational strategies to build trust and improve accessibility, could enhance adoption rates and broaden tool use. Future quantitative investigations should validate the adoption rate and influencing factors. %M 40116330 %R 10.2196/70789 %U https://www.jmir.org/2025/1/e70789 %U https://doi.org/10.2196/70789 %U http://www.ncbi.nlm.nih.gov/pubmed/40116330 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e53567 %T Artificial Intelligence Performance in Image-Based Cancer Identification: Umbrella Review of Systematic Reviews %A Xu,He-Li %A Gong,Ting-Ting %A Song,Xin-Jian %A Chen,Qian %A Bao,Qi %A Yao,Wei %A Xie,Meng-Meng %A Li,Chen %A Grzegorzek,Marcin %A Shi,Yu %A Sun,Hong-Zan %A Li,Xiao-Han %A Zhao,Yu-Hong %A Gao,Song %A Wu,Qi-Jun %+ Department of Clinical Epidemiology, Shengjing Hospital of China Medical University, No. 36, San Hao Street, Shenyang, Liaoning, 110004, China, 86 024 96615 13652, wuqj@sj-hospital.org %K artificial intelligence %K biomedical imaging %K cancer diagnosis %K meta-analysis %K systematic review %K umbrella review %D 2025 %7 1.4.2025 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) has the potential to transform cancer diagnosis, ultimately leading to better patient outcomes. Objective: We performed an umbrella review to summarize and critically evaluate the evidence for the AI-based imaging diagnosis of cancers. 
Methods: PubMed, Embase, Web of Science, Cochrane, and IEEE databases were searched for relevant systematic reviews from inception to June 19, 2024. Two independent investigators abstracted data and assessed the quality of evidence, using the Joanna Briggs Institute (JBI) Critical Appraisal Checklist for Systematic Reviews and Research Syntheses. We further assessed the quality of evidence in each meta-analysis by applying the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) criteria. Diagnostic performance data were synthesized narratively. Results: In a comprehensive analysis of 158 included studies evaluating the performance of AI algorithms in the noninvasive imaging diagnosis of cancers across 8 major organ systems, the accuracy of the classifiers for central nervous system cancers varied widely (ranging from 48% to 100%). Similarities were observed in the diagnostic performance for cancers of the head and neck, respiratory system, digestive system, urinary system, female-related systems, skin, and other sites. Most meta-analyses demonstrated positive summary performance. For instance, 9 reviews meta-analyzed sensitivity and specificity for esophageal cancer, showing ranges of 90%-95% and 80%-93.8%, respectively. In the case of breast cancer detection, 8 reviews calculated the pooled sensitivity and specificity within the ranges of 75.4%-92% and 83%-90.6%, respectively. Four meta-analyses reported the ranges of sensitivity and specificity in ovarian cancer, and both were 75%-94%. Notably, in lung cancer, the pooled specificity was relatively low, primarily distributed between 65% and 80%. Furthermore, 80.4% (127/158) of the included studies were of high quality according to the JBI Critical Appraisal Checklist, with the remaining studies classified as medium quality. The GRADE assessment indicated that the overall quality of the evidence was moderate to low. 
Conclusions: Although AI shows great potential for achieving accelerated, accurate, and more objective diagnoses of multiple cancers, there are still hurdles to overcome before its implementation in clinical settings. The present findings highlight that a concerted effort from the research community, clinicians, and policymakers is required to overcome existing hurdles and translate this potential into improved patient outcomes and health care delivery. Trial Registration: PROSPERO CRD42022364278; https://www.crd.york.ac.uk/PROSPERO/view/CRD42022364278 %M 40167239 %R 10.2196/53567 %U https://www.jmir.org/2025/1/e53567 %U https://doi.org/10.2196/53567 %U http://www.ncbi.nlm.nih.gov/pubmed/40167239 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e68560 %T Online Health Information–Seeking in the Era of Large Language Models: Cross-Sectional Web-Based Survey Study %A Yun,Hye Sun %A Bickmore,Timothy %+ Khoury College of Computer Sciences, Northeastern University, 360 Huntington Avenue, Boston, MA, 02115, United States, 1 6173732000, yun.hy@northeastern.edu %K online health information–seeking %K large language models %K eHealth %K internet %K consumer health information %D 2025 %7 31.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: As large language model (LLM)–based chatbots such as ChatGPT (OpenAI) grow in popularity, it is essential to understand their role in delivering online health information compared to other resources. These chatbots often generate inaccurate content, posing potential safety risks. This motivates the need to examine how users perceive and act on health information provided by LLM-based chatbots. Objective: This study investigates the patterns, perceptions, and actions of users seeking health information online, including LLM-based chatbots. The relationships between online health information–seeking behaviors and important sociodemographic characteristics are examined as well. 
Methods: A web-based survey of crowd workers was conducted via Prolific. The questionnaire covered sociodemographic information, trust in health care providers, eHealth literacy, artificial intelligence (AI) attitudes, chronic health condition status, online health information source types, perceptions, and actions, such as cross-checking or adherence. Quantitative and qualitative analyses were applied. Results: Most participants consulted search engines (291/297, 98%) and health-related websites (203/297, 68.4%) for their health information, while 21.2% (63/297) used LLM-based chatbots, with ChatGPT and Microsoft Copilot being the most popular. Most participants (268/297, 90.2%) sought information on health conditions, with fewer seeking advice on medication (179/297, 60.3%), treatments (137/297, 46.1%), and self-diagnosis (62/297, 23.2%). Perceived information quality and trust varied little across source types. The preferred source for validating information from the internet was consulting health care professionals (40/132, 30.3%), while only a very small percentage of participants (5/214, 2.3%) consulted AI tools to cross-check information from search engines and health-related websites. For information obtained from LLM-based chatbots, 19.4% (12/63) of participants cross-checked the information, while 48.4% (30/63) of participants followed the advice. Both of these rates were lower than information from search engines, health-related websites, forums, or social media. Furthermore, use of LLM-based chatbots for health information was negatively correlated with age (ρ=–0.16, P=.006). In contrast, attitudes surrounding AI for medicine had significant positive correlations with the number of source types consulted for health advice (ρ=0.14, P=.01), use of LLM-based chatbots for health information (ρ=0.31, P<.001), and number of health topics searched (ρ=0.19, P<.001). 
Conclusions: Although traditional online sources remain dominant, LLM-based chatbots are emerging as a resource for health information for some users, specifically those who are younger and have a higher trust in AI. The perceived quality and trustworthiness of health information varied little across source types. However, the adherence to health information from LLM-based chatbots seemed more cautious compared to search engines or health-related websites. As LLMs continue to evolve, enhancing their accuracy and transparency will be essential in mitigating any potential risks by supporting responsible information-seeking while maximizing the potential of AI in health contexts. %M 40163112 %R 10.2196/68560 %U https://www.jmir.org/2025/1/e68560 %U https://doi.org/10.2196/68560 %U http://www.ncbi.nlm.nih.gov/pubmed/40163112 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e60887 %T Automatic Human Embryo Volume Measurement in First Trimester Ultrasound From the Rotterdam Periconception Cohort: Quantitative and Qualitative Evaluation of Artificial Intelligence %A Bastiaansen,Wietske A P %A Klein,Stefan %A Hojeij,Batoul %A Rubini,Eleonora %A Koning,Anton H J %A Niessen,Wiro %A Steegers-Theunissen,Régine P M %A Rousian,Melek %+ Department of Obstetrics and Gynecology, Erasmus MC, University Medical Center, Dr Molewaterplein 40, Rotterdam, 3000 CA, The Netherlands, 31 10 703 82 5, w.bastiaansen@erasmusmc.nl %K first trimester, artificial intelligence, embryo, ultrasound, biometry %K US %K Rotterdam %K The Netherlands %K Cohort %K quantitative %K qualitative %K evaluation %K noninvasive %K pregnancy %K embryonic growth %K algorithm %K embryonic volume %K monitoring %K development %D 2025 %7 31.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Noninvasive volumetric measurements during the first trimester of pregnancy provide unique insight into human embryonic growth and development. 
However, current methods, such as semiautomatic (eg, virtual reality [VR]) or manual segmentation (eg, VOCAL), are not used in routine care due to their time-consuming nature, requirement for specialized training, and introduction of inter- and intrarater variability. Objective: To address the challenges of manual and semiautomatic measurements, this study aimed to develop an automatic artificial intelligence (AI) algorithm to segment the region of interest and measure embryonic volume (EV) and head volume (HV) during the first trimester of pregnancy. Methods: We used 3D ultrasound datasets from the Rotterdam Periconception Cohort, collected between 7 and 11 weeks of gestational age. We measured the EV in gestational weeks 7, 9, and 11, and the HV in weeks 9 and 11. To develop the AI algorithms for measuring EV and HV, we used nnU-Net, a publicly available, state-of-the-art segmentation algorithm. We tested the algorithms on 164 (EV) and 92 (HV) datasets, all acquired before 2020. The AI algorithms’ generalization to newly acquired data was evaluated by testing on 116 (EV) and 58 (HV) datasets from 2020. The performance of the model was assessed using the intraclass correlation coefficient (ICC) between the volumes obtained using AI and using VR. In addition, 2 experts qualitatively rated both VR and AI segmentations for the EV and HV. Results: Segmentation of both the EV and HV using AI took around 1 minute, and rating took another minute; hence, in total, volume measurement took 2 minutes per ultrasound dataset, while experienced raters needed 5-10 minutes using a VR tool. For both the EV and HV, we found an ICC of 0.998 on the test set acquired before 2020 and an ICC of 0.996 (EV) and 0.997 (HV) for data acquired in 2020. 
During qualitative rating for the EV, a comparable proportion of segmentations (AI: 42%, VR: 38%) were rated as excellent; however, we found that major errors were more common with the AI algorithm, as it more frequently missed limbs. For the HV, the AI segmentations were rated as excellent in 79% of cases, compared with only 17% for VR. Conclusions: We developed 2 fully automatic AI algorithms to accurately measure the EV and HV on first-trimester 3D ultrasound data. In-depth qualitative analysis revealed that the quality of the AI and VR measurements was similar. Since automatic volumetric assessment now takes only a couple of minutes, using these measurements to monitor growth and development during this crucial period of pregnancy becomes feasible, which may lead to better screening, diagnostics, and treatment of developmental disorders in pregnancy. %M 40163035 %R 10.2196/60887 %U https://www.jmir.org/2025/1/e60887 %U https://doi.org/10.2196/60887 %U http://www.ncbi.nlm.nih.gov/pubmed/40163035 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 11 %N %P e65984 %T Large Language Model Applications for Health Information Extraction in Oncology: Scoping Review %A Chen,David %A Alnassar,Saif Addeen %A Avison,Kate Elizabeth %A Huang,Ryan S %A Raman,Srinivas %K artificial intelligence %K chatbot %K data extraction %K AI %K conversational agent %K health information %K oncology %K scoping review %K natural language processing %K NLP %K large language model %K LLM %K digital health %K health technology %K electronic health record %D 2025 %7 28.3.2025 %9 %J JMIR Cancer %G English %X Background: Natural language processing systems for data extraction from unstructured clinical text require expert-driven input for labeled annotations and model training. 
The natural language processing competency of large language models (LLMs) can enable automated data extraction of important patient characteristics from electronic health records, which is useful for accelerating cancer clinical research and informing oncology care. Objective: This scoping review aims to map the current landscape, including definitions, frameworks, and future directions of LLMs applied to data extraction from clinical text in oncology. Methods: We queried Ovid MEDLINE on June 2, 2024, for primary, peer-reviewed research studies published since 2000, using oncology- and LLM-related keywords. This scoping review included studies that evaluated the performance of an LLM applied to data extraction from clinical text in oncology contexts. Study attributes and main outcomes were extracted to outline key trends of research in LLM-based data extraction. Results: The literature search yielded 24 studies for inclusion. The majority of studies assessed original and fine-tuned variants of the BERT LLM (n=18, 75%), followed by the ChatGPT conversational LLM (n=6, 25%). LLMs for data extraction were commonly applied in pan-cancer clinical settings (n=11, 46%), followed by breast (n=4, 17%) and lung (n=4, 17%) cancer contexts, and were evaluated using multi-institution datasets (n=18, 75%). Comparing the studies published in 2022‐2024 versus 2019‐2021, both the total number of studies (18 vs 6) and the proportion of studies using prompt engineering increased (5/18, 28% vs 0/6, 0%), while the proportion using fine-tuning decreased (8/18, 44% vs 6/6, 100%). Advantages of LLMs included positive data extraction performance and reduced manual workload. Conclusions: LLMs applied to data extraction in oncology can serve as useful automated tools to reduce the administrative burden of reviewing patient health records and increase time for patient-facing care. 
Recent advances in prompt engineering, fine-tuning, and multimodal data extraction present promising directions for future research. Further studies are needed to evaluate the performance of LLM-enabled data extraction in clinical domains beyond the training dataset and to assess the scope and integration of LLMs into real-world clinical environments. %R 10.2196/65984 %U https://cancer.jmir.org/2025/1/e65984 %U https://doi.org/10.2196/65984 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e68618 %T Automated Radiology Report Labeling in Chest X-Ray Pathologies: Development and Evaluation of a Large Language Model Framework %A Abdullah,Abdullah %A Kim,Seong Tae %K large language model %K generative pre-trained transformers %K radiology report %K labeling %K BERT %K thoracic pathologies %K LLM %K GPT %D 2025 %7 28.3.2025 %9 %J JMIR Med Inform %G English %X Background: Labeling unstructured radiology reports is crucial for creating structured datasets that facilitate downstream tasks, such as training large-scale medical imaging models. Current approaches typically rely on Bidirectional Encoder Representations from Transformers (BERT)-based methods or manual expert annotations, which have limitations in terms of scalability and performance. Objective: This study aimed to evaluate the effectiveness of a generative pretrained transformer (GPT)-based large language model (LLM) in labeling radiology reports, comparing it with 2 existing methods, CheXbert and CheXpert, on a large chest X-ray dataset (MIMIC Chest X-ray [MIMIC-CXR]). Methods: In this study, we introduce an LLM-based approach fine-tuned on expert-labeled radiology reports. Our model’s performance was evaluated on 687 radiologist-labeled chest X-ray reports, comparing F1 scores across 14 thoracic pathologies. The performance of our LLM model was compared with the CheXbert and CheXpert models across positive, negative, and uncertainty extraction tasks. 
Paired t tests and Wilcoxon signed-rank tests were performed to evaluate the statistical significance of differences between model performances. Results: The GPT-based LLM model achieved an average F1 score of 0.9014 across all certainty levels, outperforming CheXpert (0.8864) and approaching CheXbert’s performance (0.9047). For positive and negative certainty levels, our model scored 0.8708, surpassing CheXpert (0.8525) and closely matching CheXbert (0.8733). Statistically, paired t tests indicated no significant difference between our model and CheXbert (P=.35) but a significant improvement over CheXpert (P=.01). Wilcoxon signed-rank tests corroborated these findings, showing no significant difference between our model and CheXbert (P=.14) but confirming a significant difference with CheXpert (P=.005). The LLM also demonstrated superior performance for pathologies with longer and more complex descriptions, leveraging its extended context length. Conclusions: The GPT-based LLM model demonstrates competitive performance compared with CheXbert and outperforms CheXpert in radiology report labeling. These findings suggest that LLMs are a promising alternative to traditional BERT-based architectures for this task, offering enhanced context understanding and eliminating the need for extensive feature engineering. Furthermore, with their larger context length, LLM-based models are better suited for this task than BERT-based models, which have a smaller context length. 
%R 10.2196/68618 %U https://medinform.jmir.org/2025/1/e68618 %U https://doi.org/10.2196/68618 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e64617 %T An Interpretable Model With Probabilistic Integrated Scoring for Mental Health Treatment Prediction: Design Study %A Kelly,Anthony %A Jensen,Esben Kjems %A Grua,Eoin Martino %A Mathiasen,Kim %A Van de Ven,Pepijn %+ Department of Electronic and Computer Engineering, University of Limerick, Castletroy, Limerick, V94 T9PX, Ireland, 353 087545973, anthony.kelly@ul.ie %K machine learning %K mental health %K Monte Carlo dropout %K explainability %K explainable AI %K XAI %K artificial intelligence %K AI %D 2025 %7 26.3.2025 %9 Original Paper %J JMIR Med Inform %G English %X Background: Machine learning (ML) systems in health care have the potential to enhance decision-making but often fail to address critical issues such as prediction explainability, confidence, and robustness in a context-based and easily interpretable manner. Objective: This study aimed to design and evaluate an ML model for a future decision support system for clinical psychopathological treatment assessments. The novel ML model is inherently interpretable and transparent. It aims to enhance clinical explainability and trust through a transparent, hierarchical model structure that progresses from questions to scores to classification predictions. The model confidence and robustness were addressed by applying Monte Carlo dropout, a probabilistic method that reveals model uncertainty and confidence. Methods: A model for clinical psychopathological treatment assessments was developed, incorporating a novel ML model structure. The model aimed at enhancing the graphical interpretation of the model outputs and addressing issues of prediction explainability, confidence, and robustness. The proposed ML model was trained and validated using patient questionnaire answers and demographics from a web-based treatment service in Denmark (N=1088). 
Results: The balanced accuracy score on the test set was 0.79. The precision was ≥0.71 for all 4 prediction classes (depression, panic, social phobia, and specific phobia). The area under the curve for the 4 classes was 0.93, 0.92, 0.91, and 0.98, respectively. Conclusions: We have demonstrated a mental health treatment ML model that supported a graphical interpretation of prediction class probability distributions. Their spread and overlap can inform clinicians of competing treatment possibilities for patients and uncertainty in treatment predictions. With the ML model achieving 79% balanced accuracy, we expect that the model will be clinically useful in both screening new patients and informing clinical interviews. %M 40138679 %R 10.2196/64617 %U https://medinform.jmir.org/2025/1/e64617 %U https://doi.org/10.2196/64617 %U http://www.ncbi.nlm.nih.gov/pubmed/40138679 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e64266 %T Perceived Trust and Professional Identity Threat in AI-Based Clinical Decision Support Systems: Scenario-Based Experimental Study on AI Process Design Features %A Ackerhans,Sophia %A Wehkamp,Kai %A Petzina,Rainer %A Dumitrescu,Daniel %A Schultz,Carsten %+ , Kiel Institute of Responsible Innovation, University of Kiel, Westring 425, Kiel, 24118, Germany, 49 431880479, ackerhans@bwl.uni-kiel.de %K artificial intelligence %K clinical decision support systems %K explainable artificial intelligence %K professional identity threat %K health care %K physicians %K perceptions %K professional identity %D 2025 %7 26.3.2025 %9 Original Paper %J JMIR Form Res %G English %X Background: Artificial intelligence (AI)–based systems in medicine like clinical decision support systems (CDSSs) have shown promising results in health care, sometimes outperforming human specialists. 
However, the integration of AI may challenge medical professionals’ identities and lead to limited trust in the technology, resulting in health care professionals rejecting AI-based systems. Objective: This study aims to explore the impact of AI process design features on physicians’ trust in the AI solution and on perceived threats to their professional identity. These design features involve the explainability of AI-based CDSS decision outcomes, the integration depth of the AI-generated advice into the clinical workflow, and the physician’s accountability for AI system–induced medical decisions. Methods: We conducted a 3-factorial, web-based, between-subject, scenario-based experiment with 292 participants (medical students in their medical training and experienced physicians across different specialties). The participants were presented with an AI-based CDSS for sepsis prediction and prevention for use in a hospital. Each participant was given a scenario in which the 3 design features of the AI-based CDSS were manipulated in a 2×2×2 factorial design. The SPSS PROCESS macro (IBM Corp) was used for hypothesis testing. Results: The results suggest that the explainability of the AI-based CDSS was positively associated with both trust in the AI system (β=.508; P<.001) and professional identity threat perceptions (β=.351; P=.02). Trust in the AI system was found to be negatively related to professional identity threat perceptions (β=–.138; P=.047), indicating a partially mediated effect on professional identity threat through trust. Deep integration of AI-generated advice into the clinical workflow was positively associated with trust in the system (β=.262; P=.009). Accountability for the AI-based decisions (ie, the system required a signature) was found to be positively associated with professional identity threat perceptions among the respondents (β=.339; P=.004). 
Conclusions: Our research highlights the role of process design features of AI systems used in medicine in shaping professional identity perceptions, mediated through increased trust in AI. An explainable AI-based CDSS and an AI-generated system advice, which is deeply integrated into the clinical workflow, reinforce trust, thereby mitigating perceived professional identity threats. However, explainable AI and individual accountability of the system directly exacerbate threat perceptions. Our findings illustrate the complex nature of the behavioral patterns of AI in health care and have broader implications for supporting the implementation of AI-based CDSSs in a context where AI systems may impact professional identity. %M 40138691 %R 10.2196/64266 %U https://formative.jmir.org/2025/1/e64266 %U https://doi.org/10.2196/64266 %U http://www.ncbi.nlm.nih.gov/pubmed/40138691 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e62774 %T Convolutional Neural Network Models for Visual Classification of Pressure Ulcer Stages: Cross-Sectional Study %A Lei,Changbin %A Jiang,Yan %A Xu,Ke %A Liu,Shanshan %A Cao,Hua %A Wang,Cong %K pressure ulcer %K deep learning %K artificial intelligence %K neural network %K CNN %K machine learning %K image %K imaging %K classification %K ulcer %K sore %K pressure %K wound %K skin %D 2025 %7 25.3.2025 %9 %J JMIR Med Inform %G English %X Background: Pressure injuries (PIs) pose a negative health impact and a substantial economic burden on patients and society. Accurate staging is crucial for treating PIs. Owing to the diversity in the clinical manifestations of PIs and the lack of objective biochemical and pathological examinations, accurate staging of PIs is a major challenge. The deep learning algorithm, which uses convolutional neural networks (CNNs), has demonstrated exceptional classification performance in the intricate domain of skin diseases and wounds and has the potential to improve the staging accuracy of PIs. 
Objective: We explored the potential of applying AlexNet, VGGNet16, ResNet18, and DenseNet121 to PI staging, aiming to provide an effective tool to assist in staging. Methods: PI images from patients—including those with stage I, stage II, stage III, stage IV, unstageable, and suspected deep tissue injury (SDTI)—were collected at a tertiary hospital in China. Additionally, we augmented the PI data by cropping and flipping the PI images 9 times. The collected images were then divided into training, validation, and test sets at a ratio of 8:1:1. We subsequently trained AlexNet, VGGNet16, ResNet18, and DenseNet121 on these images to develop staging models. Results: We collected 853 raw PI images with the following distributions across stages: stage I (n=148), stage II (n=121), stage III (n=216), stage IV (n=110), unstageable (n=128), and SDTI (n=130). A total of 7677 images were obtained after data augmentation. Among all the CNN models, DenseNet121 demonstrated the highest overall accuracy of 93.71%. The classification performances of AlexNet, VGGNet16, and ResNet18 exhibited overall accuracies of 87.74%, 82.42%, and 92.42%, respectively. Conclusions: The CNN-based models demonstrated strong classification ability for PI images, which might promote highly efficient, intelligent PI staging methods. In the future, the models can be compared with nurses with different levels of experience to further verify the clinical application effect. 
%R 10.2196/62774 %U https://medinform.jmir.org/2025/1/e62774 %U https://doi.org/10.2196/62774 %0 Journal Article %@ 2291-9279 %I JMIR Publications %V 13 %N %P e65498 %T Feasibility and Usability of an Artificial Intelligence—Powered Gamification Intervention for Enhancing Physical Activity Among College Students: Quasi-Experimental Study %A Gao,Yanan %A Zhang,Jinxi %A He,Zhonghui %A Zhou,Zhixiong %K physical activity %K gamification %K artificial intelligence %K digital health %K digital intervention %K feasibility study %D 2025 %7 24.3.2025 %9 %J JMIR Serious Games %G English %X Background: Physical activity (PA) is vital for physical and mental health, but many college students fail to meet recommended levels. Artificial intelligence (AI)-powered gamification interventions delivered through a mobile app have the potential to improve PA levels among Chinese college students. Objective: This study aimed to assess the feasibility and usability of an AI-powered gamification intervention. Methods: A quasi-experimental study spanning 2 months was conducted on a sample of college students aged 18 to 25 years from 18 universities in Beijing. PA data were recorded using the ShouTi Fitness app, and participant engagement was evaluated through surveys. User satisfaction was gauged through the System Usability Scale, while the intervention’s feasibility was assessed through Spearman rank correlation analysis, Mann-Whitney tests, and additional descriptive analyses. Results: As of July 2023, we enrolled 456 college students. In total, 18,073 PA sessions were recorded, with men completing 8068 sessions and women completing 10,055 sessions. The average PA intensity was 7 metabolic equivalents (METs) per session. Most participants preferred afternoon sessions and favored short-duration sessions, with men averaging 66 seconds per session and women 42 seconds. The System Usability Scale score for the app-based intervention was 65.2. 
Users responded positively to the integration of AI and gamification elements, including personalized recommendations, action recognition, smart grouping, dynamic management, collaboration, and competition. Specifically, 341 users (75%) found the AI features very interesting, 365 (80%) were motivated by the gamification elements, 364 (80%) reported that the intervention supported their fitness goals, and 365 (80%) considered the intervention reliable. A significant positive correlation was observed between the duration of individual PA and intervention duration for men (ρ=0.510, P<.001), although the correlation was weaker for women (ρ=0.258, P=.046). However, the frequency of PA declined after 35 days. Conclusions: This study provides pioneering evidence of the feasibility and usability of an AI-powered gamification intervention. While adherence was successfully demonstrated, further studies or interventions are needed to directly assess the impact on PA levels, optimize long-term adherence strategies, and evaluate health outcomes. 
%R 10.2196/65498 %U https://games.jmir.org/2025/1/e65498 %U https://doi.org/10.2196/65498 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e63937 %T Explainable AI for Intraoperative Motor-Evoked Potential Muscle Classification in Neurosurgery: Bicentric Retrospective Study %A Parduzi,Qendresa %A Wermelinger,Jonathan %A Koller,Simon Domingo %A Sariyar,Murat %A Schneider,Ulf %A Raabe,Andreas %A Seidel,Kathleen %+ , Department of Neurosurgery, Lucerne Cantonal Hospital, Spitalstrasse, Lucerne, 6000, Switzerland, 41 412056631, qendresa.parduzi@students.unibe.ch %K intraoperative neuromonitoring %K motor evoked potential %K artificial intelligence %K machine learning %K deep learning %K random forest %K convolutional neural network %K explainability %K medical informatics %K personalized medicine %K neurophysiological %K monitoring %K orthopedic %K motor %K neurosurgery %D 2025 %7 24.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Intraoperative neurophysiological monitoring (IONM) guides the surgeon in ensuring motor pathway integrity during high-risk neurosurgical and orthopedic procedures. Although motor-evoked potentials (MEPs) are valuable for predicting motor outcomes, the key features of predictive signals are not well understood, and standardized warning criteria are lacking. Developing a muscle identification prediction model could increase patient safety while allowing the exploration of relevant features for the task. Objective: The aim of this study is to expand the development of machine learning (ML) methods for muscle classification and evaluate them in a bicentric setup. Further, we aim to identify key features of MEP signals that contribute to accurate muscle classification using explainable artificial intelligence (XAI) techniques. 
Methods: This study used ML and deep learning models, specifically random forest (RF) classifiers and convolutional neural networks (CNNs), to classify MEP signals from routine supratentorial neurosurgical procedures at 2 medical centers according to the identity of 4 muscles (extensor digitorum, abductor pollicis brevis, tibialis anterior, and abductor hallucis). The algorithms were trained and validated on a total of 36,992 MEPs from 151 surgeries at one center and tested on 24,298 MEPs from 58 surgeries at the other center. Depending on the algorithm, time-series, feature-engineered, and time-frequency representations of the MEP data were used. XAI techniques, specifically Shapley additive explanations (SHAP) values and gradient-weighted class activation mapping (Grad-CAM), were implemented to identify important signal features. Results: High classification accuracy was achieved with the RF classifier, reaching 87.9% accuracy on the validation set and 80% accuracy on the test set. The 1D- and 2D-CNNs demonstrated comparably strong performance. Our XAI findings indicate that frequency components and peak latencies are crucial for accurate MEP classification, providing insights that could inform intraoperative warning criteria. Conclusions: This study demonstrates the effectiveness of ML techniques and the importance of XAI in enhancing trust in and reliability of artificial intelligence–driven IONM applications. Further, it may help to identify new intrinsic features of MEP signals so far overlooked in conventional warning criteria. By reducing the risk of muscle mislabeling and by providing the basis for possible new warning criteria, this study may help to increase patient safety during surgical procedures. 
%M 40127441 %R 10.2196/63937 %U https://www.jmir.org/2025/1/e63937 %U https://doi.org/10.2196/63937 %U http://www.ncbi.nlm.nih.gov/pubmed/40127441 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 8 %N %P e63923 %T Exploring the Views of Dermatologists, General Practitioners, and Melanographers on the Use of AI Tools in the Context of Good Decision-Making When Detecting Melanoma: Qualitative Interview Study %A Partridge,Brad %A Gillespie,Nicole %A Soyer,H Peter %A Mar,Victoria %A Janda,Monika %+ Centre for Health Services Research, University of Queensland, Level 2, Building 33, Princess Alexandra Hospital, Brisbane, 4102, Australia, 61 7 3176 5530, b.partridge@uq.edu.au %K artificial intelligence %K melanoma %K skin cancer %K decision-making %K decision support %K qualitative %K attitudes %K dermatologists %K general practitioners %K melanographers %K Australia %K New Zealand %D 2025 %7 24.3.2025 %9 Original Paper %J JMIR Dermatol %G English %X Background: Evidence that artificial intelligence (AI) may improve melanoma detection has led to calls for increased human-AI collaboration in clinical workflows. However, AI-based support may entail a wide range of specific functions for AI. To appropriately integrate AI into decision-making processes, it is crucial to understand the precise role that clinicians see AI playing within their clinical deliberations. Objective: This study aims to provide an in-depth understanding of how a range of clinicians involved in melanoma screening and diagnosis conceptualize the role of AI within their decision-making and what these conceptualizations mean for good decision-making. Methods: This qualitative exploration used in-depth individual interviews with 30 clinicians, predominantly from Australia and New Zealand (n=26, 87%), who engaged in melanoma detection (n=17, 57% dermatologists; n=6, 20% general practitioners with an interest in skin cancer; and n=7, 23% melanographers). 
The vast majority of the sample (n=25, 83%) had interacted with or used 2D or 3D skin imaging technologies with AI tools for screening or diagnosis of melanoma, either as part of testing through clinical AI reader studies or within their clinical work. Results: We constructed the following 5 themes to describe how participants conceptualized the role of AI within decision-making when it comes to melanoma detection: theme 1 (integrative theme)—the importance of good clinical judgment; theme 2—AI as just one tool among many; theme 3—AI as an adjunct after a clinician’s decision; theme 4—AI as a second opinion for unresolved decisions; theme 5—AI as an expert guide before decision-making. Participants articulated a major conundrum—AI may benefit inexperienced clinicians when conceptualized as an “expert guide,” but overreliance, deskilling, and a failure to recognize AI errors may mean only experienced clinicians should use AI “as a tool.” However, experienced clinicians typically relied on their own clinical judgment, and some could be wary of allowing AI to “influence” their deliberations. The benefit of AI was often to reassure decisions once they had been reached by conceptualizing AI as a kind of “checker,” “validator,” or in a small number of equivocal cases, as a genuine “second opinion.” This raised questions about the extent to which experienced clinicians truly seek to “collaborate” with AI or use it to inform decisions. Conclusions: Clinicians conceptualized AI support in an array of disparate ways that have implications for how AI should be incorporated into clinical workflows. A priority for clinicians is the conservation of good clinical acumen, and our study encourages a more focused engagement with users about the precise way to incorporate AI into the clinical decision-making process for melanoma detection. 
%M 40127437 %R 10.2196/63923 %U https://derma.jmir.org/2025/1/e63923 %U https://doi.org/10.2196/63923 %U http://www.ncbi.nlm.nih.gov/pubmed/40127437 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67922 %T AI-Derived Blood Biomarkers for Ovarian Cancer Diagnosis: Systematic Review and Meta-Analysis %A Xu,He-Li %A Li,Xiao-Ying %A Jia,Ming-Qian %A Ma,Qi-Peng %A Zhang,Ying-Hua %A Liu,Fang-Hua %A Qin,Ying %A Chen,Yu-Han %A Li,Yu %A Chen,Xi-Yang %A Xu,Yi-Lin %A Li,Dong-Run %A Wang,Dong-Dong %A Huang,Dong-Hui %A Xiao,Qian %A Zhao,Yu-Hong %A Gao,Song %A Qin,Xue %A Tao,Tao %A Gong,Ting-Ting %A Wu,Qi-Jun %+ Department of Clinical Epidemiology, Shengjing Hospital of China Medical University, No. 36, San Hao Street, ShenYang, 110004, China, 86 024 96615 13652, wuqj@sj-hospital.org %K artificial intelligence %K AI %K blood biomarker %K ovarian cancer %K diagnosis %K PRISMA %D 2025 %7 24.3.2025 %9 Review %J J Med Internet Res %G English %X Background: Emerging evidence underscores the potential application of artificial intelligence (AI) in discovering noninvasive blood biomarkers. However, the diagnostic value of AI-derived blood biomarkers for ovarian cancer (OC) remains inconsistent. Objective: We aimed to evaluate the research quality and the validity of AI-based blood biomarkers in OC diagnosis. Methods: A systematic search was performed in the MEDLINE, Embase, IEEE Xplore, PubMed, Web of Science, and the Cochrane Library databases. Studies examining the diagnostic accuracy of AI in discovering OC blood biomarkers were identified. The risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies–AI tool. Pooled sensitivity, specificity, and area under the curve (AUC) were estimated using a bivariate model for the diagnostic meta-analysis. Results: A total of 40 studies were ultimately included. Most (n=31, 78%) included studies were evaluated as low risk of bias. 
Overall, the pooled sensitivity, specificity, and AUC were 85% (95% CI 83%-87%), 91% (95% CI 90%-92%), and 0.95 (95% CI 0.92-0.96), respectively. For contingency tables with the highest accuracy, the pooled sensitivity, specificity, and AUC were 95% (95% CI 90%-97%), 97% (95% CI 95%-98%), and 0.99 (95% CI 0.98-1.00), respectively. Stratification by AI algorithms revealed higher sensitivity and specificity in studies using machine learning (sensitivity=85% and specificity=92%) compared to those using deep learning (sensitivity=77% and specificity=85%). In addition, studies using serum reported substantially higher sensitivity (94%) and specificity (96%) than those using plasma (sensitivity=83% and specificity=91%). Stratification by external validation demonstrated significantly higher specificity in studies with external validation (specificity=94%) compared to those without external validation (specificity=89%), while the reverse was observed for sensitivity (74% vs 90%). No publication bias was detected in this meta-analysis. Conclusions: AI algorithms demonstrate satisfactory performance in the diagnosis of OC using blood biomarkers and are anticipated to become an effective diagnostic modality in the future, potentially avoiding unnecessary surgeries. Future research is warranted to incorporate external validation into AI diagnostic models, as well as to prioritize the adoption of deep learning methodologies. 
Trial Registration: PROSPERO CRD42023481232; https://www.crd.york.ac.uk/PROSPERO/view/CRD42023481232 %M 40126546 %R 10.2196/67922 %U https://www.jmir.org/2025/1/e67922 %U https://doi.org/10.2196/67922 %U http://www.ncbi.nlm.nih.gov/pubmed/40126546 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67967 %T Large Language Model–Based Assessment of Clinical Reasoning Documentation in the Electronic Health Record Across Two Institutions: Development and Validation Study %A Schaye,Verity %A DiTullio,David %A Guzman,Benedict Vincent %A Vennemeyer,Scott %A Shih,Hanniel %A Reinstein,Ilan %A Weber,Danielle E %A Goodman,Abbie %A Wu,Danny T Y %A Sartori,Daniel J %A Santen,Sally A %A Gruppen,Larry %A Aphinyanaphongs,Yindalon %A Burk-Rafel,Jesse %+ Institute for Innovations in Medical Education, NYU Grossman School of Medicine, 550 First Avenue, MS G 61, New York, NY, 10016, United States, 1 212 263 3006, verity.schaye@nyulangone.org %K large language models %K artificial intelligence %K clinical reasoning %K documentation %K assessment %K feedback %K electronic health record %D 2025 %7 21.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Clinical reasoning (CR) is an essential skill; yet, physicians often receive limited feedback. Artificial intelligence holds promise to fill this gap. Objective: We report the development of named entity recognition (NER), logic-based and large language model (LLM)–based assessments of CR documentation in the electronic health record across 2 institutions (New York University Grossman School of Medicine [NYU] and University of Cincinnati College of Medicine [UC]). Methods: The note corpus consisted of internal medicine resident admission notes (retrospective set: July 2020-December 2021, n=700 NYU and 450 UC notes and prospective validation set: July 2023-December 2023, n=155 NYU and 92 UC notes). 
Clinicians rated CR documentation quality in each note using a previously validated tool (Revised-IDEA), on 3-point scales across 2 domains: differential diagnosis (D0, D1, and D2) and explanation of reasoning (EA0, EA1, and EA2). At NYU, the retrospective set was annotated for NER for 5 entities (diagnosis, diagnostic category, prioritization of diagnosis language, data, and linkage terms). Models were developed using different artificial intelligence approaches: an NER, logic-based model (a large word vector model [scispaCy en_core_sci_lg] with model weights adjusted with backpropagation from annotations), developed at NYU with external validation at UC; NYUTron LLM (an NYU-internal 110 million parameter LLM pretrained on 7.25 million clinical notes), validated only at NYU; and GatorTron LLM (an open-source 345 million parameter LLM pretrained on 82 billion words of clinical text), fine-tuned on the NYU retrospective sets, then externally validated and further fine-tuned at UC. Model performance was assessed in the prospective sets with F1-scores for the NER, logic-based model and with area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC) for the LLMs. Results: At NYU, the NYUTron LLM performed best: the D0 and D2 models had AUROC/AUPRC 0.87/0.79 and 0.89/0.86, respectively. The D1, EA0, and EA1 models had insufficient performance for implementation (AUROC range 0.57-0.80, AUPRC range 0.33-0.63). For the D1 classification, a stepwise approach was adopted that takes advantage of the more performant D0 and D2 models. For the EA model, the approach pivoted to a binary EA2 model (ie, EA2 vs not EA2) with excellent performance (AUROC/AUPRC 0.85/0.80). At UC, the NER, D-logic–based model was the best-performing D model (F1-scores 0.80, 0.74, and 0.80 for D0, D1, and D2, respectively). The GatorTron LLM performed best for EA2 scores (AUROC/AUPRC 0.75/0.69). 
Conclusions: This is the first multi-institutional study to apply LLMs for assessing CR documentation in the electronic health record. Such tools can enhance feedback on CR. Lessons learned by implementing these models at distinct institutions support the generalizability of this approach. %M 40117575 %R 10.2196/67967 %U https://www.jmir.org/2025/1/e67967 %U https://doi.org/10.2196/67967 %U http://www.ncbi.nlm.nih.gov/pubmed/40117575 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e60148 %T Public Disclosure of Results From Artificial Intelligence/Machine Learning Research in Health Care: Comprehensive Analysis of ClinicalTrials.gov, PubMed, and Scopus Data (2010-2023) %A Maru,Shoko %A Kuwatsuru,Ryohei %A Matthias,Michael D %A Simpson Jr,Ross J %+ Real‑World Evidence and Data Assessment (READS), Graduate School of Medicine, Juntendo University, 2-1-1 Hongo, Bunkyo‑ku, Tokyo, 113-8421, Japan, 81 338133111, shoko.maru@alumni.griffithuni.edu.au %K machine learning %K ML %K artificial intelligence %K AI %K algorithm %K model %K analytics %K deep learning %K health care %K health disparities %K disparity %K social disparity %K social inequality %K social inequity %K data-source disparities %K ClinicalTrials.gov %K clinical trial %K database %K PubMed %K Scopus %K public disclosure of results %K public disclosure %K dissemination %D 2025 %7 21.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Despite the rapid growth of research in artificial intelligence/machine learning (AI/ML), little is known about how often study results are disclosed years after study completion. Objective: We aimed to estimate the proportion of AI/ML research that reported results through ClinicalTrials.gov or peer-reviewed publications indexed in PubMed or Scopus. 
Methods: Using data from the Clinical Trials Transformation Initiative Aggregate Analysis of ClinicalTrials.gov, we identified studies initiated and completed between January 2010 and December 2023 that contained AI/ML-specific terms in the official title, brief summary, interventions, conditions, detailed descriptions, primary outcomes, or keywords. For 842 completed studies, we searched PubMed and Scopus for publications containing study identifiers and AI/ML-specific terms in relevant fields, such as the title, abstract, and keywords. We calculated disclosure rates within 3 years of study completion and median times to disclosure—from the “primary completion date” to the “results first posted date” on ClinicalTrials.gov or the earliest date of journal publication. Results: When restricted to studies completed before 2021, ensuring at least 3 years of follow-up in which to report results, 7.0% (22/316) disclosed results on ClinicalTrials.gov, 16.5% (52/316) in journal publications, and 20.6% (65/316) through either route within 3 years of completion. Higher disclosure rates were observed for trials: 11.0% (15/136) on ClinicalTrials.gov, 25.0% (34/136) in journal publications, and 30.1% (41/136) through either route. Randomized controlled trials had even higher disclosure rates: 12.2% (9/74) on ClinicalTrials.gov, 31.1% (23/74) in journal publications, and 36.5% (27/74) through either route. Nevertheless, most study findings (79.4%; 251/316) remained undisclosed 3 years after study completion. Trials using randomization (vs nonrandomized) or masking (vs open label) had higher disclosure rates and shorter times to disclosure. Most trials (85%; 305/357) had sample sizes of ≤1000, yet larger trials (n>1000) had higher publication rates (30.8%; 16/52) than smaller trials (n≤1000) (17.4%; 53/305). Hospitals (12.4%; 42/340), academia (15.1%; 39/259), and industry (13.7%; 20/146) published the most. 
High-income countries accounted for 82.4% (89/108) of all published studies. Of studies with disclosed results, the median times to report through ClinicalTrials.gov and in journal publications were 505 days (IQR 399-676) and 407 days (IQR 257-674), respectively. Open-label trials were common (60%; 214/357). Single-center designs were prevalent in both trials (83.3%; 290/348) and observational studies (82.3%; 377/458). Conclusions: For nearly 80% of completed studies, findings remained undisclosed within 3 years of follow-up, raising questions about the representativeness of publicly available evidence. While methodological rigor was generally associated with higher publication rates, the predominance of single-center designs and high-income countries may limit the generalizability of the results currently accessible. %R 10.2196/60148 %U https://www.jmir.org/2025/1/e60148 %U https://doi.org/10.2196/60148 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 11 %N %P e58375 %T Performance of Plug-In Augmented ChatGPT and Its Ability to Quantify Uncertainty: Simulation Study on the German Medical Board Examination %A Madrid,Julian %A Diehl,Philipp %A Selig,Mischa %A Rolauffs,Bernd %A Hans,Felix Patricius %A Busch,Hans-Jörg %A Scheef,Tobias %A Benning,Leo %K medical education %K artificial intelligence %K generative AI %K large language model %K LLM %K ChatGPT %K GPT-4 %K board licensing examination %K professional education %K examination %K student %K experimental %K bootstrapping %K confidence interval %D 2025 %7 21.3.2025 %9 %J JMIR Med Educ %G English %X Background: GPT-4 is a large language model (LLM) trained and fine-tuned on an extensive dataset. After the public release of its predecessor in November 2022, the use of LLMs has seen a significant spike in interest, and a multitude of potential use cases have been proposed. In parallel, however, important limitations have been outlined. 
In particular, current LLMs encounter limitations in symbolic representation and in accessing contemporary data. The recent version of GPT-4, alongside newly released plugin features, has been introduced to mitigate some of these limitations. Objective: Against this background, this work aims to investigate the performance of GPT-3.5, GPT-4, GPT-4 with plugins, and GPT-4 with plugins using pretranslated English text on the German medical board examination. Recognizing the critical importance of quantifying uncertainty for LLM applications in medicine, we furthermore assess this ability and develop a new metric termed “confidence accuracy” to evaluate it. Methods: We used GPT-3.5, GPT-4, GPT-4 with plugins, and GPT-4 with plugins and translation to answer questions from the German medical board examination. Additionally, we conducted an analysis to assess how the models justify their answers, the accuracy of their responses, and the error structure of their answers. Bootstrapping and CIs were used to evaluate the statistical significance of our findings. Results: This study demonstrated that the available GPT models, as examples of LLMs, exceeded the minimum competency threshold established by the German medical board for medical students to obtain board certification to practice medicine. Moreover, the models could assess the uncertainty in their responses, albeit exhibiting overconfidence. Additionally, this work unraveled certain justification and reasoning structures that emerge when GPT generates answers. Conclusions: The high performance of GPT models in answering medical questions positions them well for applications in academia and, potentially, clinical practice. Their capability to quantify uncertainty in answers suggests they could be valuable artificial intelligence agents within the clinical decision-making loop. Nevertheless, significant challenges must be addressed before artificial intelligence agents can be robustly and safely implemented in the medical domain. 
%R 10.2196/58375 %U https://mededu.jmir.org/2025/1/e58375 %U https://doi.org/10.2196/58375 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 8 %N %P e67299 %T Assessing the Diagnostic Accuracy of ChatGPT-4 in Identifying Diverse Skin Lesions Against Squamous and Basal Cell Carcinoma %A Chetla,Nitin %A Chen,Matthew %A Chang,Joseph %A Smith,Aaron %A Hage,Tamer Rajai %A Patel,Romil %A Gardner,Alana %A Bryer,Bridget %K chatbot %K ChatGPT %K ChatGPT-4 %K squamous cell carcinoma %K basal cell carcinoma %K skin cancer %K skin cancer detection %K dermatoscopic image analysis %K skin lesion differentiation %K dermatologist %K machine learning %K ML %K artificial intelligence %K AI %K AI in dermatology %K algorithm %K model %K analytics %K diagnostic accuracy %D 2025 %7 21.3.2025 %9 %J JMIR Dermatol %G English %X Our study evaluates the diagnostic accuracy of ChatGPT-4o in classifying various skin lesions, highlighting its limitations in distinguishing squamous cell carcinoma from basal cell carcinoma using dermatoscopic images. %R 10.2196/67299 %U https://derma.jmir.org/2025/1/e67299 %U https://doi.org/10.2196/67299 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e70222 %T Using AI to Translate and Simplify Spanish Orthopedic Medical Text: Instrument Validation Study %A Andalib,Saman %A Spina,Aidin %A Picton,Bryce %A Solomon,Sean S %A Scolaro,John A %A Nelson,Ariana M %K large language models %K LLM %K patient education %K translation %K bilingual evaluation understudy %K GPT-4 %K Google Translate %D 2025 %7 21.3.2025 %9 %J JMIR AI %G English %X Background: Language barriers contribute significantly to health care disparities in the United States, where a sizable proportion of patients are exclusively Spanish speakers. In orthopedic surgery, such barriers impact both patients’ comprehension of and patients’ engagement with available resources. 
Studies have explored the utility of large language models (LLMs) for medical translation but have yet to robustly evaluate artificial intelligence (AI)–driven translation and simplification of orthopedic materials for Spanish speakers. Objective: This study used the bilingual evaluation understudy (BLEU) method to assess translation quality and investigated the ability of AI to simplify patient education materials (PEMs) in Spanish. Methods: PEMs (n=78) from the American Academy of Orthopaedic Surgery were translated from English to Spanish, using 2 LLMs (GPT-4 and Google Translate). The BLEU methodology was applied to compare AI translations with professionally human-translated PEMs. The Friedman test and Dunn multiple comparisons test were used to statistically quantify differences in translation quality. A readability analysis and feature analysis were subsequently performed to evaluate text simplification success and the impact of English text features on BLEU scores. The capability of an LLM to simplify medical language written in Spanish was also assessed. Results: As measured by BLEU scores, GPT-4 showed moderate success in translating PEMs into Spanish but was less successful than Google Translate. Simplified PEMs demonstrated improved readability when compared to original versions (P<.001) but were unable to reach the targeted grade level for simplification. The feature analysis revealed that the total number of syllables and average number of syllables per sentence had the highest impact on BLEU scores. GPT-4 was able to significantly reduce the complexity of medical text written in Spanish (P<.001). Conclusions: Although Google Translate outperformed GPT-4 in translation accuracy, LLMs, such as GPT-4, may provide significant utility in translating medical texts into Spanish and simplifying such texts. 
We recommend considering a dual approach—using Google Translate for translation and GPT-4 for simplification—to improve medical information accessibility and orthopedic surgery education among Spanish-speaking patients. %R 10.2196/70222 %U https://ai.jmir.org/2025/1/e70222 %U https://doi.org/10.2196/70222 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e65567 %T Trust and Acceptance Challenges in the Adoption of AI Applications in Health Care: Quantitative Survey Analysis %A Kauttonen,Janne %A Rousi,Rebekah %A Alamäki,Ari %+ Digital Transition and AI, Haaga-Helia University of Applied Sciences, Ratapihantie 13, Helsinki, 00520, Finland, 358 400 230 404, janne.kauttonen@haaga-helia.fi %K artificial intelligence %K AI %K health care technology %K technology adoption %K predictive modeling %K user trust %K user acceptance %D 2025 %7 21.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) has potential to transform health care, but its successful implementation depends on the trust and acceptance of consumers and patients. Understanding the factors that influence attitudes toward AI is crucial for effective adoption. Despite AI’s growing integration into health care, consumer and patient acceptance remains a critical challenge. Research has largely focused on applications or attitudes, lacking a comprehensive analysis of how factors, such as demographics, personality traits, technology attitudes, and AI knowledge, affect and interact across different health care AI contexts. Objective: We aimed to investigate people’s trust in and acceptance of AI across health care use cases and determine how context and perceived risk affect individuals’ propensity to trust and accept AI in specific health care scenarios. 
Methods: We collected and analyzed web-based survey data from 1100 Finnish participants, presenting them with 8 AI use cases in health care: 5 (62%) noninvasive applications (eg, activity monitoring and mental health support) and 3 (38%) physical interventions (eg, AI-controlled robotic surgery). Respondents evaluated intention to use, trust, and willingness to trade off personal data for these use cases. Gradient boosted tree regression models were trained to predict responses based on 33 demographic-, personality-, and technology-related variables. To interpret the results of our predictive models, we used the Shapley additive explanations method, a game theory–based approach for explaining the output of machine learning models. It quantifies the contribution of each feature to individual predictions, allowing us to determine the relative importance of various demographic-, personality-, and technology-related factors and their interactions in shaping participants’ trust in and acceptance of AI in health care. Results: Consumer attitudes toward technology, technology use, and personality traits were the primary drivers of trust and intention to use AI in health care. Use cases were ranked by acceptance, with noninvasive monitors being the most preferred. However, the specific use case had less impact in general than expected. Nonlinear dependencies were observed, including an inverted U-shaped pattern in positivity toward AI based on self-reported AI knowledge. Certain personality traits, such as being more disorganized and careless, were associated with more positive attitudes toward AI in health care. Women seemed more cautious about AI applications in health care than men. Conclusions: The findings highlight the complex interplay of factors influencing trust and acceptance of AI in health care. Consumer trust and intention to use AI in health care are driven by technology attitudes and use rather than specific use cases. 
AI service providers should consider demographic factors, personality traits, and technology attitudes when designing and implementing AI systems in health care. The study demonstrates the potential of using predictive AI models as decision-making tools for implementing and interacting with clients in health care AI applications. %M 40116853 %R 10.2196/65567 %U https://www.jmir.org/2025/1/e65567 %U https://doi.org/10.2196/65567 %U http://www.ncbi.nlm.nih.gov/pubmed/40116853 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e65729 %T Utility-based Analysis of Statistical Approaches and Deep Learning Models for Synthetic Data Generation With Focus on Correlation Structures: Algorithm Development and Validation %A Miletic,Marko %A Sariyar,Murat %+ Institute for Optimisation and Data Analysis (IODA), Bern University of Applied Sciences, Höheweg 80, Biel, 2502, Switzerland, 41 32 321 64 37, murat.sariyar@bfh.ch %K synthetic data generation %K medical data synthesis %K random forests %K simulation study %K deep learning %K propensity score mean-squared error %D 2025 %7 20.3.2025 %9 Original Paper %J JMIR AI %G English %X Background: Recent advancements in Generative Adversarial Networks and large language models (LLMs) have significantly advanced the synthesis and augmentation of medical data. These and other deep learning–based methods offer promising potential for generating high-quality, realistic datasets crucial for improving machine learning applications in health care, particularly in contexts where data privacy and availability are limiting factors. However, challenges remain in accurately capturing the complex associations inherent in medical datasets. Objective: This study evaluates the effectiveness of various Synthetic Data Generation (SDG) methods in replicating the correlation structures inherent in real medical datasets. In addition, it examines their performance in downstream tasks using Random Forests (RFs) as the benchmark model. 
To provide a comprehensive analysis, alternative models such as eXtreme Gradient Boosting and Gated Additive Tree Ensembles are also considered. We compare the following SDG approaches: Synthetic Populations in R (synthpop), copula, copulagan, Conditional Tabular Generative Adversarial Network (ctgan), tabular variational autoencoder (tvae), and tabula for LLMs. Methods: We evaluated synthetic data generation methods using both real-world and simulated datasets. Simulated data consist of 10 Gaussian variables and one binary target variable with varying correlation structures, generated via Cholesky decomposition. Real-world datasets include the body performance dataset with 13,393 samples for fitness classification, the Wisconsin Breast Cancer dataset with 569 samples for tumor diagnosis, and the diabetes dataset with 768 samples for diabetes prediction. Data quality is evaluated by comparing correlation matrices, the propensity score mean-squared error (pMSE) for general utility, and F1-scores for downstream tasks as a specific utility metric, using training on synthetic data and testing on real data. Results: Our simulation study, supplemented with real-world data analyses, shows that the statistical methods copula and synthpop consistently outperform deep learning approaches across various sample sizes and correlation complexities, with synthpop being the most effective. Deep learning methods, including LLMs, show mixed performance, particularly with smaller datasets or limited training epochs. LLMs often struggle to replicate numerical dependencies effectively. In contrast, methods like tvae with 10,000 epochs perform comparably well. On the body performance dataset, copulagan achieves the best performance in terms of pMSE. The results also highlight that model utility depends more on the relative correlations between features and the target variable than on the absolute magnitude of correlation matrix differences. 
Conclusions: Statistical methods, particularly synthpop, demonstrate superior robustness and utility preservation for synthetic tabular data compared with deep learning approaches. Copula methods show potential but face limitations with integer variables. Deep learning methods underperform in this context. Overall, these findings underscore the dominance of statistical methods for synthetic data generation for tabular data, while highlighting the niche potential of deep learning approaches for highly complex datasets, provided adequate resources and tuning. %M 40112290 %R 10.2196/65729 %U https://ai.jmir.org/2025/1/e65729 %U https://doi.org/10.2196/65729 %U http://www.ncbi.nlm.nih.gov/pubmed/40112290 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 12 %N %P e57986 %T Exploring Biases of Large Language Models in the Field of Mental Health: Comparative Questionnaire Study of the Effect of Gender and Sexual Orientation in Anorexia Nervosa and Bulimia Nervosa Case Vignettes %A Schnepper,Rebekka %A Roemmel,Noa %A Schaefert,Rainer %A Lambrecht-Walzinger,Lena %A Meinlschmidt,Gunther %K anorexia nervosa %K artificial intelligence %K bulimia nervosa %K ChatGPT %K eating disorders %K LLM %K responsible AI %K transformer %K bias %K large language model %K gender %K vignette %K quality of life %K symptomatology %K questionnaire %K generative AI %K mental health %K AI %D 2025 %7 20.3.2025 %9 %J JMIR Ment Health %G English %X Background: Large language models (LLMs) are increasingly used in mental health, showing promise in assessing disorders. However, concerns exist regarding their accuracy, reliability, and fairness. Societal biases and underrepresentation of certain populations may impact LLMs. Because LLMs are already used in clinical practice, including decision support, it is important to investigate potential biases to ensure responsible use of LLMs. 
Anorexia nervosa (AN) and bulimia nervosa (BN) show a lifetime prevalence of 1%‐2%, affecting more women than men. Among men, homosexual men face a higher risk of eating disorders (EDs) than heterosexual men. However, men are underrepresented in ED research, and studies on gender, sexual orientation, and their impact on AN and BN prevalence, symptoms, and treatment outcomes remain limited. Objectives: We aimed to estimate the presence and size of bias related to gender and sexual orientation produced by a common LLM as well as a smaller LLM specifically trained for mental health analyses, exemplified in the context of ED symptomatology and health-related quality of life (HRQoL) of patients with AN or BN. Methods: We extracted 30 case vignettes (22 AN and 8 BN) from scientific papers. We adapted each vignette to create 4 versions, describing a female versus male patient living with their female versus male partner (2 × 2 design), yielding 120 vignettes. We then fed each vignette into ChatGPT-4 and to “MentaLLaMA” based on the Large Language Model Meta AI (LLaMA) architecture thrice with the instruction to evaluate them by providing responses to 2 psychometric instruments, the RAND-36 questionnaire assessing HRQoL and the eating disorder examination questionnaire. With the resulting LLM-generated scores, we calculated multilevel models with a random intercept for gender and sexual orientation (accounting for within-vignette variance), nested in vignettes (accounting for between-vignette variance). Results: In ChatGPT-4, the multilevel model with 360 observations indicated a significant association with gender for the RAND-36 mental composite summary (conditional means: 12.8 for male and 15.1 for female cases; 95% CI of the effect –6.15 to −0.35; P=.04) but neither with sexual orientation (P=.71) nor with an interaction effect (P=.37). 
We found no indications for main effects of gender (conditional means: 5.65 for male and 5.61 for female cases; 95% CI –0.10 to 0.14; P=.88), sexual orientation (conditional means: 5.63 for heterosexual and 5.62 for homosexual cases; 95% CI –0.14 to 0.09; P=.67), or for an interaction effect (P=.61, 95% CI –0.11 to 0.19) for the eating disorder examination questionnaire overall score (conditional means 5.59‐5.65; 95% CIs 5.45 to 5.7). MentaLLaMA did not yield reliable results. Conclusions: LLM-generated mental HRQoL estimates for AN and BN case vignettes may be biased by gender, with male cases scoring lower despite no real-world evidence supporting this pattern. This highlights the risk of bias in generative artificial intelligence in the field of mental health. Understanding and mitigating biases related to gender and other factors, such as ethnicity and socioeconomic status, are crucial for responsible use in diagnostics and treatment recommendations. %R 10.2196/57986 %U https://mental.jmir.org/2025/1/e57986 %U https://doi.org/10.2196/57986 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 8 %N %P e63686 %T Using Deep Learning to Perform Automatic Quantitative Measurement of Masseter and Tongue Muscles in Persons With Dementia: Cross-Sectional Study %A Imani,Mahdi %A Borda,Miguel G %A Vogrin,Sara %A Meijering,Erik %A Aarsland,Dag %A Duque,Gustavo %K artificial intelligence %K machine learning %K sarcopenia %K dementia %K masseter muscle %K tongue muscle %K deep learning %K head %K tongue %K face %K magnetic resonance imaging %K MRI %K image %K imaging %K muscle %K muscles %K neural network %K aging %K gerontology %K older adults %K geriatrics %K older adult health %D 2025 %7 19.3.2025 %9 %J JMIR Aging %G English %X Background: Sarcopenia (loss of muscle mass and strength) increases the risk of adverse outcomes and contributes to cognitive decline in older adults. 
Accurate methods to quantify muscle mass and predict adverse outcomes, particularly in older persons with dementia, are still lacking. Objective: This study’s main objective was to assess the feasibility of using deep learning techniques for segmentation and quantification of musculoskeletal tissues in magnetic resonance imaging (MRI) scans of the head in patients with neurocognitive disorders. This study aimed to pave the way for using automated techniques for opportunistic detection of sarcopenia in patients with neurocognitive disorder. Methods: In a cross-sectional analysis of 53 participants, we used 7 U-Net-like deep learning models to segment 5 different tissues in head MRI images and used the Dice similarity coefficient and average symmetric surface distance as main assessment techniques to compare results. We also analyzed the relationship between BMI and muscle and fat volumes. Results: Our framework accurately quantified masseter and subcutaneous fat on the left and right sides of the head and tongue muscle (mean Dice similarity coefficient 92.4%). A significant correlation exists between the area and volume of tongue muscle, left masseter muscle, and BMI. Conclusions: Our study demonstrates the successful application of a deep learning model to quantify muscle volumes in head MRI in patients with neurocognitive disorders. This is a promising first step toward clinically applicable artificial intelligence and deep learning methods for estimating masseter and tongue muscle and predicting adverse outcomes in this population. 
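The Dice similarity coefficient reported above (mean 92.4%) measures the voxel overlap between a predicted and a reference segmentation mask, DSC = 2|A ∩ B| / (|A| + |B|). A minimal sketch with toy 2D masks:

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary segmentation masks,
    ranging from 0 (no overlap) to 1 (identical masks)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy ground-truth and predicted masks (16 voxels each, 12 overlapping)
truth = np.zeros((8, 8), dtype=bool)
truth[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[3:7, 2:6] = True
print(dice_coefficient(pred, truth))  # 2*12 / (16+16) = 0.75
```

The average symmetric surface distance used alongside it in the study is a separate, contour-based metric and is not sketched here.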
%R 10.2196/63686 %U https://aging.jmir.org/2025/1/e63686 %U https://doi.org/10.2196/63686 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e58021 %T Machine Learning–Based Explainable Automated Nonlinear Computation Scoring System for Health Score and an Application for Prediction of Perioperative Stroke: Retrospective Study %A Oh,Mi-Young %A Kim,Hee-Soo %A Jung,Young Mi %A Lee,Hyung-Chul %A Lee,Seung-Bo %A Lee,Seung Mi %+ Department of Obstetrics and Gynecology, College of Medicine, Seoul National University, 101 Daehak‐ro, Jongno‐gu, Seoul, 03080, Republic of Korea, 82 2 2072 4857, lbsm@snu.ac.kr %K machine learning %K explainability %K score %K computation scoring system %K Nonlinear computation %K application %K perioperative stroke %K perioperative %K stroke %K efficiency %K ML-based models %K patient %K noncardiac surgery %K noncardiac %K surgery %K effectiveness %K risk tool %K risk %K tool %K real-world data %D 2025 %7 19.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Machine learning (ML) has the potential to enhance performance by capturing nonlinear interactions. However, ML-based models have some limitations in terms of interpretability. Objective: This study aimed to develop and validate a more comprehensible and efficient ML-based scoring system using SHapley Additive exPlanations (SHAP) values. Methods: We developed and validated the Explainable Automated nonlinear Computation scoring system for Health (EACH) framework score. We developed a CatBoost-based prediction model, identified key features, and automatically detected the top 5 steepest slope change points based on SHAP plots. Subsequently, we developed a scoring system (EACH) and normalized the score. Finally, the EACH score was used to predict perioperative stroke. 
We developed the EACH score using data from the Seoul National University Hospital cohort and validated it using data from the Boramae Medical Center, which was geographically and temporally different from the development set. Results: When applied for perioperative stroke prediction among 38,737 patients undergoing noncardiac surgery, the EACH score achieved an area under the curve (AUC) of 0.829 (95% CI 0.753-0.892). In the external validation, the EACH score demonstrated superior predictive performance with an AUC of 0.784 (95% CI 0.694-0.871) compared with a traditional score (AUC=0.528, 95% CI 0.457-0.619) and another ML-based scoring generator (AUC=0.564, 95% CI 0.516-0.612). Conclusions: The EACH score is a more precise, explainable ML-based risk tool, proven effective in real-world data. The EACH score outperformed a traditional scoring system and other prediction models based on different ML techniques in predicting perioperative stroke. %M 40106818 %R 10.2196/58021 %U https://www.jmir.org/2025/1/e58021 %U https://doi.org/10.2196/58021 %U http://www.ncbi.nlm.nih.gov/pubmed/40106818 %0 Journal Article %@ 2563-6316 %I JMIR Publications %V 6 %N %P e65263 %T Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance %A Mansoor,Masab %A Ibrahim,Andrew F %A Grindem,David %A Baig,Asad %K natural language processing %K NLP %K machine learning %K ML %K artificial intelligence %K language model %K large language model %K LLM %K generative pretrained transformer %K GPT %K pediatrics %D 2025 %7 19.3.2025 %9 %J JMIRx Med %G English %X Background: Rural health care providers face unique challenges such as limited specialist access and high patient volumes, making accurate diagnostic support tools essential. Large language models like GPT-3 have demonstrated potential in clinical decision support but remain understudied in pediatric differential diagnosis. 
Objective: This study aims to evaluate the diagnostic accuracy and reliability of a fine-tuned GPT-3 model compared to board-certified pediatricians in rural health care settings. Methods: This multicenter retrospective cohort study analyzed 500 pediatric encounters (ages 0‐18 years; n=261, 52.2% female) from rural health care organizations in Central Louisiana between January 2020 and December 2021. The GPT-3 model (DaVinci version) was fine-tuned using the OpenAI application programming interface and trained on 350 encounters, with 150 reserved for testing. Five board-certified pediatricians (mean experience: 12, SD 5.8 years) provided reference standard diagnoses. Model performance was assessed using accuracy, sensitivity, specificity, and subgroup analyses. Results: The GPT-3 model achieved an accuracy of 87.3% (131/150 cases), sensitivity of 85% (95% CI 82%‐88%), and specificity of 90% (95% CI 87%‐93%), comparable to pediatricians’ accuracy of 91.3% (137/150 cases; P=.47). Performance was consistent across age groups (0‐5 years: 54/62, 87%; 6‐12 years: 47/53, 89%; 13‐18 years: 30/35, 86%) and common complaints (fever: 36/39, 92%; abdominal pain: 20/23, 87%). For rare diagnoses (n=20), accuracy was slightly lower (16/20, 80%) but comparable to pediatricians (17/20, 85%; P=.62). Conclusions: This study demonstrates that a fine-tuned GPT-3 model can provide diagnostic support comparable to pediatricians, particularly for common presentations, in rural health care. Further validation in diverse populations is necessary before clinical implementation. 
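As a quick arithmetic check on the headline figure above (131/150 correct), accuracy and a 95% CI can be recomputed. The Wilson score interval used here is an assumption for illustration; the abstract does not state how its CIs were derived.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% CI at z=1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

acc = 131 / 150          # reported model accuracy: 87.3%
low, high = wilson_ci(131, 150)
print(f"accuracy {acc:.3f}, 95% CI {low:.3f}-{high:.3f}")
```

The same helper applied to the pediatricians' 137/150 gives a visibly overlapping interval, consistent with the reported nonsignificant difference (P=.47).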
%R 10.2196/65263 %U https://xmed.jmir.org/2025/1/e65263 %U https://doi.org/10.2196/65263 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 12 %N %P e67682 %T AI Chatbots for Psychological Health for Health Professionals: Scoping Review %A Baek,Gumhee %A Cha,Chiyoung %A Han,Jin-Hui %K artificial intelligence %K AI chatbot %K psychological health %K health professionals %K burnout %K scoping review %D 2025 %7 19.3.2025 %9 %J JMIR Hum Factors %G English %X Background: Health professionals face significant psychological burdens including burnout, anxiety, and depression. These can negatively impact their well-being and patient care. Traditional psychological health interventions often encounter limitations such as a lack of accessibility and privacy. Artificial intelligence (AI) chatbots are being explored as potential solutions to these challenges, offering accessible and immediate support. Therefore, it is necessary to systematically evaluate the characteristics and effectiveness of AI chatbots designed specifically for health professionals. Objective: This scoping review aims to evaluate the existing literature on the use of AI chatbots for psychological health support among health professionals. Methods: Following Arksey and O’Malley’s framework, a comprehensive literature search was conducted across eight databases, covering studies published before 2024, including backward and forward citation tracking and manual searching from the included studies. Studies were screened for relevance based on inclusion and exclusion criteria; among the 2465 studies retrieved, 10 met the criteria for review. Results: Among the 10 studies, six chatbots were delivered via mobile platforms, and four via web-based platforms, all enabling one-on-one interactions. Natural language processing algorithms were used in six studies, and cognitive behavioral therapy techniques were applied to psychological health in four studies. 
Usability was evaluated in six studies through participant feedback and engagement metrics. Improvements in anxiety, depression, and burnout were observed in four studies, although one reported an increase in depressive symptoms. Conclusions: AI chatbots show potential as tools to support the psychological health of health professionals by offering personalized and accessible interventions. Nonetheless, further research is required to establish standardized protocols and validate the effectiveness of these interventions. Future studies should focus on refining chatbot designs and assessing their impact on diverse health professionals. %R 10.2196/67682 %U https://humanfactors.jmir.org/2025/1/e67682 %U https://doi.org/10.2196/67682 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e60019 %T Application of Internet Hospitals in the Disease Management of Patients With Ulcerative Colitis: Retrospective Study %A Yu,Tianzhi %A Li,Wanyu %A Liu,Yingchun %A Jin,Chunjie %A Wang,Zimin %A Cao,Hailong %+ Department of Gastroenterology, National Key Clinical Specialty, Tianjin Medical University General Hospital, 154 Anshan Road in Heping District, Tianjin, 300052, China, 86 +86 022 6036155, caohailong@tmu.edu.cn %K inflammatory bowel disease %K ulcerative colitis %K intelligent diagnosis and treatment service %K internet hospital %K chronic disease management %D 2025 %7 18.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Ulcerative colitis (UC) is a chronic disease characterized by frequent relapses, requiring long-term management and consuming substantial medical and social resources. Effective management of UC remains challenging due to the need for sustainable remission strategies, continuity of care, and access to medical services. 
Intelligent diagnosis refers to the use of artificial intelligence–driven algorithms to analyze patient-reported symptoms, generate diagnostic probabilities, and provide treatment recommendations through interactive tools. This approach could potentially function as a method for UC management. Objective: This study aimed to analyze the diagnosis and treatment data of UC from both physical hospitals and internet hospitals, highlighting the potential benefits of the intelligent diagnosis and treatment service model offered by internet hospitals. Methods: We collected data on the visits of patients with UC to the Department of Gastroenterology at Tianjin Medical University General Hospital. A total of 852 patients with UC were included between July 1, 2020, and June 30, 2023. Statistical methods, including chi-square tests for categorical variables, t tests for continuous variables, and rank-sum tests for visit numbers, were used to evaluate the medical preferences and expenses of patients with UC. Results: We found that internet hospitals and physical hospitals presented different medical service models due to the different distribution of medical needs and patient groups. Patients who chose internet hospitals focused on disease consultation and prescription medication (3295/3528, 93.40%). Patients’ medical preferences gradually shifted to web-based services provided by internet hospitals. Over time, 58.57% (270/461) of patients chose either web-based services or a combination of web-based and offline services for UC diagnosis and treatment. The number of visits in the combination of web-based and offline service modes was the highest (mean 13.83, SD 11.07), and younger patients were inclined to visit internet hospitals (49.66% vs 34.71%). In addition, compared with physical hospitals, there was no difference in testing fees and examination fees for patients with UC in internet hospitals, but medicine fees were lower. 
Conclusions: The intelligent diagnosis and treatment model provided by internet hospitals demonstrates the potential benefits in managing UC, including feasibility, accessibility, convenience, and economics. %M 40101745 %R 10.2196/60019 %U https://www.jmir.org/2025/1/e60019 %U https://doi.org/10.2196/60019 %U http://www.ncbi.nlm.nih.gov/pubmed/40101745 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 11 %N %P e55709 %T Impact of Clinical Decision Support Systems on Medical Students’ Case-Solving Performance: Comparison Study with a Focus Group %A Montagna,Marco %A Chiabrando,Filippo %A De Lorenzo,Rebecca %A Rovere Querini,Patrizia %A , %K chatGPT %K chatbot %K machine learning %K ML %K artificial intelligence %K AI %K algorithm %K predictive model %K predictive analytics %K predictive system %K practical model %K deep learning %K large language models %K LLMs %K medical education %K medical teaching %K teaching environment %K clinical decision support systems %K CDSS %K decision support %K decision support tool %K clinical decision-making %K innovative teaching %D 2025 %7 18.3.2025 %9 %J JMIR Med Educ %G English %X Background: Health care practitioners use clinical decision support systems (CDSS) as an aid in the crucial task of clinical reasoning and decision-making. Traditional CDSS are online repositories (ORs) and clinical practice guidelines (CPG). Recently, large language models (LLMs) such as ChatGPT have emerged as potential alternatives. They have proven to be powerful, innovative tools, yet they are not devoid of worrisome risks. Objective: This study aims to explore how medical students perform in an evaluated clinical case through the use of different CDSS tools. 
Methods: The authors randomly divided medical students into 3 groups (CPG: n=6, 38%; OR: n=5, 31%; ChatGPT: n=5, 31%) and assigned each group a different type of CDSS for guidance in answering prespecified questions, assessing how students’ speed and ability at resolving the same clinical case varied accordingly. External reviewers evaluated all answers based on accuracy and completeness metrics (score: 1‐5). The authors analyzed and categorized group scores according to the skill investigated: differential diagnosis, diagnostic workup, and clinical decision-making. Results: Answering time showed a trend for the ChatGPT group to be the fastest. The mean scores for completeness were as follows: CPG 4.0, OR 3.7, and ChatGPT 3.8 (P=.49). The mean scores for accuracy were as follows: CPG 4.0, OR 3.3, and ChatGPT 3.7 (P=.02). Aggregating scores according to the 3 students’ skill domains, trends in differences among the groups emerge more clearly, with the CPG group performing best in nearly all domains and maintaining almost perfect alignment between its completeness and accuracy. Conclusions: This hands-on session provided valuable insights into the potential perks and associated pitfalls of LLMs in medical education and practice. It suggested the critical need to include teaching in medical degree courses on how to properly take advantage of LLMs, as the potential for misuse is evident and real. 
%R 10.2196/55709 %U https://mededu.jmir.org/2025/1/e55709 %U https://doi.org/10.2196/55709 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66279 %T Using Synthetic Health Care Data to Leverage Large Language Models for Named Entity Recognition: Development and Validation Study %A Šuvalov,Hendrik %A Lepson,Mihkel %A Kukk,Veronika %A Malk,Maria %A Ilves,Neeme %A Kuulmets,Hele-Andra %A Kolde,Raivo %+ Institute of Computer Science, University of Tartu, Narva mnt 28, Tartu, 51009, Estonia, 372 7375100, hendrik.suvalov@ut.ee %K natural language processing %K named entity recognition %K large language model %K synthetic data %K LLM %K NLP %K machine learning %K artificial intelligence %K language model %K NER %K medical entity %K Estonian %K health care data %K annotated data %K data annotation %K clinical decision support %K data mining %D 2025 %7 18.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Named entity recognition (NER) plays a vital role in extracting critical medical entities from health care records, facilitating applications such as clinical decision support and data mining. Developing robust NER models for low-resource languages, such as Estonian, remains a challenge due to the scarcity of annotated data and domain-specific pretrained models. Large language models (LLMs) have proven to be promising in understanding text from any language or domain. Objective: This study addresses the development of medical NER models for low-resource languages, specifically Estonian. We propose a novel approach by generating synthetic health care data and using LLMs to annotate them. These synthetic data are then used to train a high-performing NER model, which is applied to real-world medical texts, preserving patient data privacy. 
Methods: Our approach to overcoming the shortage of annotated Estonian health care texts involves a three-step pipeline: (1) synthetic health care data are generated using a locally trained GPT-2 model on Estonian medical records, (2) the synthetic data are annotated with LLMs, specifically GPT-3.5-Turbo and GPT-4, and (3) the annotated synthetic data are then used to fine-tune an NER model, which is later tested on real-world medical data. This paper compares the performance of different prompts; assesses the impact of GPT-3.5-Turbo, GPT-4, and a local LLM; and explores the relationship between the amount of annotated synthetic data and model performance. Results: The proposed methodology demonstrates significant potential in extracting named entities from real-world medical texts. Our top-performing setup achieved an F1-score of 0.69 for drug extraction and 0.38 for procedure extraction. These results indicate a strong performance in recognizing certain entity types while highlighting the complexity of extracting procedures. Conclusions: This paper demonstrates a successful approach to leveraging LLMs for training NER models using synthetic data, effectively preserving patient privacy. By avoiding reliance on human-annotated data, our method shows promise in developing models for low-resource languages, such as Estonian. Future work will focus on refining the synthetic data generation and expanding the method’s applicability to other domains and languages. 
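The F1-scores reported above (0.69 for drug extraction, 0.38 for procedures) are standard entity-level NER metrics. A minimal sketch, assuming strict exact span-and-label matching (the paper's matching criterion may differ):

```python
def ner_f1(predicted, gold):
    """Entity-level precision, recall, and F1.

    Entities are (start, end, label) tuples; a prediction counts as a
    true positive only on an exact span-and-label match.
    """
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical annotations: one entity is mislabeled (PROCEDURE vs DRUG)
gold = [(0, 9, "DRUG"), (15, 27, "PROCEDURE"), (30, 41, "DRUG")]
pred = [(0, 9, "DRUG"), (15, 27, "DRUG"), (30, 41, "DRUG")]
p, r, f1 = ner_f1(pred, gold)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.667 0.667
```

In the study's pipeline, `gold` would come from human annotations of real medical texts and `pred` from the model fine-tuned on LLM-annotated synthetic data.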
%M 40101227 %R 10.2196/66279 %U https://www.jmir.org/2025/1/e66279 %U https://doi.org/10.2196/66279 %U http://www.ncbi.nlm.nih.gov/pubmed/40101227 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e57358 %T Enhancing Patient Outcome Prediction Through Deep Learning With Sequential Diagnosis Codes From Structured Electronic Health Record Data: Systematic Review %A Hama,Tuankasfee %A Alsaleh,Mohanad M %A Allery,Freya %A Choi,Jung Won %A Tomlinson,Christopher %A Wu,Honghan %A Lai,Alvina %A Pontikos,Nikolas %A Thygesen,Johan H %+ Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, United Kingdom, 44 0207679200, tuankasfee.hama.21@ucl.ac.uk %K deep learning %K electronic health records %K EHR %K diagnosis codes %K prediction %K patient outcomes %K systematic review %D 2025 %7 18.3.2025 %9 Review %J J Med Internet Res %G English %X Background: The use of structured electronic health records in health care systems has grown rapidly. These systems collect huge amounts of patient information, including diagnosis codes representing temporal medical history. Sequential diagnostic information has proven valuable for predicting patient outcomes. However, the extent to which these types of data have been incorporated into deep learning (DL) models has not been examined. Objective: This systematic review aims to describe the use of sequential diagnostic data in DL models, specifically to understand how these data are integrated, whether sample size improves performance, and whether the identified models are generalizable. Methods: Relevant studies published up to May 15, 2023, were identified using 4 databases: PubMed, Embase, IEEE Xplore, and Web of Science. We included all studies using DL algorithms trained on sequential diagnosis codes to predict patient outcomes. We excluded review articles and non–peer-reviewed papers. 
We evaluated the following aspects in the included papers: DL techniques, characteristics of the dataset, prediction tasks, performance evaluation, generalizability, and explainability. We also assessed the risk of bias and applicability of the studies using the Prediction Model Study Risk of Bias Assessment Tool (PROBAST). We used the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist to report our findings. Results: Of the 740 identified papers, 84 (11.4%) met the eligibility criteria. Publications in this area increased yearly. Recurrent neural networks (and their derivatives; 47/84, 56%) and transformers (22/84, 26%) were the most commonly used architectures in DL-based models. Most studies (45/84, 54%) presented their input features as sequences of visit embeddings. Medications (38/84, 45%) were the most common additional feature. Of the 128 predictive outcome tasks, the most frequent was next-visit diagnosis (n=30, 23%), followed by heart failure (n=18, 14%) and mortality (n=17, 13%). Only 7 (8%) of the 84 studies evaluated their models in terms of generalizability. A positive correlation was observed between training sample size and model performance (area under the receiver operating characteristic curve; P=.02). However, 59 (70%) of the 84 studies had a high risk of bias. Conclusions: The application of DL for advanced modeling of sequential medical codes has demonstrated remarkable promise in predicting patient outcomes. The main limitation of this study was the heterogeneity of methods and outcomes. However, our analysis found that using multiple types of features, integrating time intervals, and including larger sample sizes were generally related to an improved predictive performance. This review also highlights that very few studies (7/84, 8%) reported on challenges related to generalizability and less than half (38/84, 45%) of the studies reported on challenges related to explainability. 
Addressing these shortcomings will be instrumental in unlocking the full potential of DL for enhancing health care outcomes and patient care. Trial Registration: PROSPERO CRD42018112161; https://tinyurl.com/yc6h9rwu %M 40100249 %R 10.2196/57358 %U https://www.jmir.org/2025/1/e57358 %U https://doi.org/10.2196/57358 %U http://www.ncbi.nlm.nih.gov/pubmed/40100249 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66344 %T Revealing Patient Dissatisfaction With Health Care Resource Allocation in Multiple Dimensions Using Large Language Models and the International Classification of Diseases 11th Revision: Aspect-Based Sentiment Analysis %A Li,Jiaxuan %A Yang,Yunchu %A Mao,Chao %A Pang,Patrick Cheong-Iao %A Zhu,Quanjing %A Xu,Dejian %A Wang,Yapeng %+ Faculty of Applied Sciences, Macao Polytechnic University, Rua de Luís Gonzaga Gomes, Macao, 999078, Macao, 853 85996886, mail@patrickpang.net %K ICD-11 %K International Classification of Diseases 11th Revision %K disease classification %K patient reviews %K patient satisfaction %K ChatGPT %K Sustainable Development Goals %K chain of thought %K large language model %D 2025 %7 17.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Accurately measuring the health care needs of patients with different diseases remains a public health challenge for health care management worldwide. There is a need for new computational methods to be able to assess the health care resources required by patients with different diseases to avoid wasting resources. Objective: This study aimed to assess dissatisfaction with the allocation of health care resources from the perspective of patients with different diseases, which can help optimize resource allocation and better achieve several of the Sustainable Development Goals (SDGs), such as SDG 3 (“Good Health and Well-being”). 
Our goal was to show the effectiveness and practicality of large language models (LLMs) in assessing the distribution of health care resources. Methods: We used aspect-based sentiment analysis (ABSA), which can divide textual data into several aspects for sentiment analysis. In this study, we used Chat Generative Pretrained Transformer (ChatGPT) to perform ABSA of patient reviews based on 3 aspects (patient experience, physician skills and efficiency, and infrastructure and administration), in which we embedded chain-of-thought (CoT) prompting, and compared the performance of Chinese and English LLMs on a Chinese dataset. Additionally, we used the International Classification of Diseases 11th Revision (ICD-11) application programming interface (API) to classify the sentiment analysis results into different disease categories. Results: We evaluated the performance of the models by comparing predicted sentiments (either positive or negative) with the labels judged by human evaluators in terms of the aforementioned 3 aspects. The results showed that ChatGPT 3.5 is superior to ChatGPT-4o and Qwen-7b in a combination of stability, expense, and runtime considerations. The weighted total precision of our method based on the ABSA of patient reviews was 0.907, while the average accuracy of all 3 sampling methods was 0.893. Both values suggested that the model was able to achieve our objective. Using our approach, we identified that dissatisfaction is highest for sex-related diseases and lowest for circulatory diseases and that the need for better infrastructure and administration is much higher for blood-related diseases than for other diseases in China. Conclusions: The results show that our method with LLMs can use patient reviews and the ICD-11 classification to assess the health care needs of patients with different diseases, which can assist with rational resource allocation. 
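The CoT-embedded ABSA prompting described above can be illustrated with a prompt-construction sketch. The wording, instructions, and aspect ordering below are a hypothetical reconstruction, not the study's actual prompt, and the model call itself is omitted.

```python
# Aspect names are taken from the abstract; everything else is illustrative.
ASPECTS = [
    "patient experience",
    "physician skills and efficiency",
    "infrastructure and administration",
]

def build_absa_prompt(review: str) -> str:
    """Build a chain-of-thought ABSA prompt for one patient review."""
    aspect_list = "\n".join(f"- {a}" for a in ASPECTS)
    return (
        "You are analyzing a patient review of a hospital visit.\n"
        f'Review: "{review}"\n\n'
        "For each aspect below, first quote the relevant part of the review,\n"
        "then reason step by step about what it implies, and only then output\n"
        "'positive' or 'negative' for that aspect:\n"
        f"{aspect_list}\n"
    )

prompt = build_absa_prompt(
    "The doctor was skilled, but the waiting room was overcrowded."
)
print(prompt)
```

The resulting string would be sent to the chosen LLM (ChatGPT 3.5 in the study), and the per-aspect labels parsed from the response before being grouped by ICD-11 disease category.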
%M 40096682 %R 10.2196/66344 %U https://www.jmir.org/2025/1/e66344 %U https://doi.org/10.2196/66344 %U http://www.ncbi.nlm.nih.gov/pubmed/40096682 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e67239 %T Improving the Robustness and Clinical Applicability of Automatic Respiratory Sound Classification Using Deep Learning–Based Audio Enhancement: Algorithm Development and Validation %A Tzeng,Jing-Tong %A Li,Jeng-Lin %A Chen,Huan-Yu %A Huang,Chun-Hsiang %A Chen,Chi-Hsin %A Fan,Cheng-Yi %A Huang,Edward Pei-Chuan %A Lee,Chi-Chun %+ Department of Electrical Engineering, National Tsing Hua University, 101, Section 2, Kuang-Fu Road, Hsinchu, 300, Taiwan, 886 35162439, cclee@ee.nthu.edu.tw %K respiratory sound %K lung sound %K audio enhancement %K noise robustness %K clinical applicability %K artificial intelligence %K AI %D 2025 %7 13.3.2025 %9 Original Paper %J JMIR AI %G English %X Background: Deep learning techniques have shown promising results in the automatic classification of respiratory sounds. However, accurately distinguishing these sounds in real-world noisy conditions poses challenges for clinical deployment. In addition, predicting signals with only background noise could undermine user trust in the system. Objective: This study aimed to investigate the feasibility and effectiveness of incorporating a deep learning–based audio enhancement preprocessing step into automatic respiratory sound classification systems to improve robustness and clinical applicability. Methods: We conducted extensive experiments using various audio enhancement model architectures, including time-domain and time-frequency–domain approaches, in combination with multiple classification models to evaluate the effectiveness of the audio enhancement module in an automatic respiratory sound classification system. The classification performance was compared against the baseline noise injection data augmentation method. 
These experiments were carried out on 2 datasets: the International Conference on Biomedical and Health Informatics (ICBHI) respiratory sound dataset, which contains 5.5 hours of recordings, and the Formosa Archive of Breath Sound dataset, which comprises 14.6 hours of recordings. Furthermore, a physician validation study involving 7 senior physicians was conducted to assess the clinical utility of the system. Results: The integration of the audio enhancement module resulted in a 21.88% increase (P<.001) in the ICBHI classification score on the ICBHI dataset and a 4.1% improvement (P<.001) on the Formosa Archive of Breath Sound dataset in multiclass noisy scenarios. Quantitative analysis from the physician validation study revealed improvements in efficiency, diagnostic confidence, and trust during model-assisted diagnosis, with workflows that integrated enhanced audio leading to an 11.61% increase in diagnostic sensitivity and facilitating high-confidence diagnoses. Conclusions: Incorporating an audio enhancement algorithm significantly enhances the robustness and clinical utility of automatic respiratory sound classification systems, improving performance in noisy environments and fostering greater trust among medical professionals. 
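For context, the ICBHI challenge score referenced in these results is conventionally defined as the arithmetic mean of sensitivity and specificity; a minimal sketch (the numbers below are illustrative, not the paper's):

```python
def icbhi_score(sensitivity: float, specificity: float) -> float:
    """ICBHI challenge score: the arithmetic mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0

def relative_gain(baseline: float, enhanced: float) -> float:
    """Relative improvement (%) of the enhanced pipeline over the baseline."""
    return 100.0 * (enhanced - baseline) / baseline

# Illustrative numbers only, not the paper's actual per-class results.
baseline = icbhi_score(0.55, 0.70)
enhanced = icbhi_score(0.68, 0.80)
print(f"score {baseline:.3f} -> {enhanced:.3f}, +{relative_gain(baseline, enhanced):.2f}%")
```

Reporting the gain as a relative percentage of the baseline score, as sketched here, is one plausible reading of the "21.88% increase" phrasing.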
%R 10.2196/67239 %U https://ai.jmir.org/2025/1/e67239 %U https://doi.org/10.2196/67239 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 6 %N %P e65001 %T A Hybrid Deep Learning–Based Feature Selection Approach for Supporting Early Detection of Long-Term Behavioral Outcomes in Survivors of Cancer: Cross-Sectional Study %A Huang,Tracy %A Ngan,Chun-Kit %A Cheung,Yin Ting %A Marcotte,Madelyn %A Cabrera,Benjamin %+ Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA, 01609, United States, 1 (508) 831 5000, cngan@wpi.edu %K machine learning %K data driven %K clinical domain–guided framework %K survivors of cancer %K cancer %K oncology %K behavioral outcome predictions %K behavioral study %K behavioral outcomes %K feature selection %K deep learning %K neural network %K hybrid %K prediction %K predictive modeling %K patients with cancer %K deep learning models %K leukemia %K computational study %K computational biology %D 2025 %7 13.3.2025 %9 Original Paper %J JMIR Bioinform Biotech %G English %X Background: The number of survivors of cancer is growing, and they often experience negative long-term behavioral outcomes due to cancer treatments. There is a need for better computational methods to handle and predict these outcomes so that physicians and health care providers can implement preventive treatments. Objective: This study aimed to create a new feature selection algorithm to improve the performance of machine learning classifiers to predict negative long-term behavioral outcomes in survivors of cancer. Methods: We devised a hybrid deep learning–based feature selection approach to support early detection of negative long-term behavioral outcomes in survivors of cancer. 
Within a data-driven, clinical domain–guided framework to select the best set of features among cancer treatments, chronic health conditions, and socioenvironmental factors, we developed a 2-stage feature selection algorithm, that is, a multimetric, majority-voting filter and a deep dropout neural network, to dynamically and automatically select the best set of features for each behavioral outcome. We also conducted an experimental case study on existing study data with 102 survivors of acute lymphoblastic leukemia (aged 15-39 years at evaluation and >5 years postcancer diagnosis) who were treated in a public hospital in Hong Kong. Finally, we designed and implemented radial charts to illustrate the significance of the selected features on each behavioral outcome to support clinical professionals’ future treatment and diagnoses. Results: In this pilot study, we demonstrated that our approach outperforms traditional statistical and computational methods, including linear and nonlinear feature selectors, for the addressed top-priority behavioral outcomes. Overall, our approach achieves higher F1-scores, precision, and recall than existing feature selection methods. The models in this study select several significant clinical and socioenvironmental variables as risk factors associated with the development of behavioral problems in young survivors of acute lymphoblastic leukemia. Conclusions: Our novel feature selection algorithm has the potential to improve machine learning classifiers’ capability to predict adverse long-term behavioral outcomes in survivors of cancer. 
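The stage-1 "multimetric, majority-voting filter" can be sketched in miniature: each metric votes for its top-ranked features, and features backed by a majority of metrics survive to stage 2. The metrics, scores, and vote threshold below are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List

def top_k(scores: Dict[str, float], k: int) -> set:
    """Features ranked in the top k by one metric."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def majority_vote_filter(metric_scores: List[Dict[str, float]], k: int) -> List[str]:
    """Keep features that a majority of metrics rank in their top k.

    Mirrors the described filter in spirit only; the actual metrics and
    thresholds used by the authors are not reproduced here.
    """
    votes: Dict[str, int] = {}
    for scores in metric_scores:
        for feature in top_k(scores, k):
            votes[feature] = votes.get(feature, 0) + 1
    needed = len(metric_scores) // 2 + 1  # simple majority
    return sorted(f for f, v in votes.items() if v >= needed)

# Three hypothetical metrics scoring four hypothetical features.
chi2   = {"treatment_dose": 0.9, "age": 0.7, "income": 0.2, "bmi": 0.4}
mutual = {"treatment_dose": 0.8, "age": 0.3, "income": 0.6, "bmi": 0.5}
anova  = {"treatment_dose": 0.7, "age": 0.6, "income": 0.1, "bmi": 0.3}
print(majority_vote_filter([chi2, mutual, anova], k=2))
```

The surviving features would then feed the stage-2 deep dropout neural network described in the abstract.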
%M 40080820 %R 10.2196/65001 %U https://bioinform.jmir.org/2025/1/e65001 %U https://doi.org/10.2196/65001 %U http://www.ncbi.nlm.nih.gov/pubmed/40080820 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e57257 %T Assessing Racial and Ethnic Bias in Text Generation by Large Language Models for Health Care–Related Tasks: Cross-Sectional Study %A Hanna,John J %A Wakene,Abdi D %A Johnson,Andrew O %A Lehmann,Christoph U %A Medford,Richard J %+ Information Services, ECU Health, 2100 Stantonsburg Rd, Greenville, NC, 27834, United States, 1 2528474100, john.hanna@ecuhealth.org %K sentiment analysis %K racism %K bias %K artificial intelligence %K reading ease %K word frequency %K large language models %K text generation %K healthcare %K task %K ChatGPT %K cross sectional %K consumer-directed %K human immunodeficiency virus %D 2025 %7 13.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Racial and ethnic bias in large language models (LLMs) used for health care tasks is a growing concern, as it may contribute to health disparities. In response, LLM operators implemented safeguards against prompts that overtly seek certain biases. Objective: This study aims to investigate potential racial and ethnic bias among 4 popular LLMs: GPT-3.5-turbo (OpenAI), GPT-4 (OpenAI), Gemini-1.0-pro (Google), and Llama3-70b (Meta) in generating health care consumer–directed text in the absence of overtly biased queries. Methods: In this cross-sectional study, the 4 LLMs were prompted to generate discharge instructions for patients with HIV. Each patient encounter’s deidentified metadata, including race/ethnicity as a variable, was passed in table format through a prompt 4 times, altering only the race/ethnicity information (African American, Asian, Hispanic White, and non-Hispanic White) each time, while keeping all other information constant. 
The prompt requested the model to write discharge instructions for each encounter without explicitly mentioning race or ethnicity. The LLM-generated instructions were analyzed for sentiment, subjectivity, reading ease, and word frequency by race/ethnicity. Results: The only observed statistically significant difference between race/ethnicity groups was found in entity count (GPT-4, df=42, P=.047). However, post hoc chi-square analysis for GPT-4’s entity counts showed no significant pairwise differences among race/ethnicity categories after Bonferroni correction. Conclusions: A total of 4 LLMs were relatively invariant to race/ethnicity in terms of linguistic and readability measures. While our study used proxy linguistic and readability measures to investigate racial and ethnic bias among 4 LLM responses in a health care–related task, there is an urgent need to establish universally accepted standards for measuring bias in LLM-generated responses. Further studies are needed to validate these results and assess their implications. 
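The Bonferroni step in the post hoc analysis adjusts each pairwise p value by the number of comparisons; with 4 race/ethnicity groups there are 6 pairs. A minimal sketch with hypothetical p values:

```python
from itertools import combinations

def bonferroni(p_values):
    """Bonferroni-adjust a family of p values: p_adj = min(1, m * p)."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]

# With 4 race/ethnicity groups there are 6 pairwise comparisons, so a raw
# pairwise p value must fall below .05 / 6 (about .0083) to remain significant.
groups = ["African American", "Asian", "Hispanic White", "non-Hispanic White"]
pairs = list(combinations(groups, 2))
raw_p = [0.04, 0.12, 0.30, 0.55, 0.70, 0.90]  # hypothetical pairwise p values
adjusted = bonferroni(raw_p)
for (a, b), p in zip(pairs, adjusted):
    print(f"{a} vs {b}: adjusted P = {p:.2f}")
```

Note how a raw P=.04, nominally significant, survives only as an adjusted P=.24; this is the mechanism behind the abstract's finding that no pairwise difference remained after correction.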
%M 40080818 %R 10.2196/57257 %U https://www.jmir.org/2025/1/e57257 %U https://doi.org/10.2196/57257 %U http://www.ncbi.nlm.nih.gov/pubmed/40080818 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e55277 %T Creation of Scientific Response Documents for Addressing Product Medical Information Inquiries: Mixed Method Approach Using Artificial Intelligence %A Lau,Jerry %A Bisht,Shivani %A Horton,Robert %A Crisan,Annamaria %A Jones,John %A Gantotti,Sandeep %A Hermes-DeSantis,Evelyn %+ phactMI, 5931 NW 1st Place, Gainesville, FL, 32607, United States, 1 2155881585, evelyn@phactmi.org %K AI %K LLM %K GPT %K biopharmaceutical %K medical information %K content generation %K artificial intelligence %K pharmaceutical %K scientific response %K documentation %K information %K clinical data %K strategy %K reference %K feasibility %K development %K machine learning %K large language model %K accuracy %K context %K traceability %K accountability %K survey %K scientific response documentation %K SRD %K benefit %K content generator %K content analysis %K Generative Pre-trained Transformer %D 2025 %7 13.3.2025 %9 Original Paper %J JMIR AI %G English %X Background: Pharmaceutical manufacturers address health care professionals’ information needs through scientific response documents (SRDs), offering evidence-based answers to medication and disease state questions. Medical information departments, staffed by medical experts, develop SRDs that provide concise summaries consisting of relevant background information, search strategies, clinical data, and balanced references. With an escalating demand for SRDs and the increasing complexity of therapies, medical information departments are exploring advanced technologies and artificial intelligence (AI) tools like large language models (LLMs) to streamline content development. 
While AI and LLMs show promise in generating draft responses, a synergistic approach combining an LLM with traditional machine learning classifiers in a series of human-supervised and -curated steps could help address limitations, including hallucinations. This will ensure accuracy, context, traceability, and accountability in the development of the concise clinical data summaries of an SRD. Objective: This study aims to quantify the challenges of SRD development and develop a framework exploring the feasibility and value addition of integrating AI capabilities in the process of creating concise summaries for an SRD. Methods: To measure the challenges in SRD development, a survey was conducted by phactMI, a nonprofit consortium of medical information leaders in the pharmaceutical industry, assessing aspects of SRD creation among its member companies. The survey collected data on the time and tediousness of various activities related to SRD development. Another working group, consisting of medical information professionals and data scientists, used AI to aid SRD authoring, focusing on data extraction and abstraction. They used logistic regression on semantic embedding features to train classification models and transformer-based summarization pipelines to generate concise summaries. Results: Of the 33 companies surveyed, 64% (21/33) opened the survey, and 76% (16/21) of those responded. On average, medical information departments generate 614 new documents and update 1352 documents each year. Respondents considered paraphrasing scientific articles to be the most tedious and time-intensive task. In the project’s second phase, sentence classification models showed the ability to accurately distinguish target categories with receiver operating characteristic scores ranging from 0.67 to 0.85 (all P<.001), allowing for accurate data extraction. 
For data abstraction, the comparison of the bilingual evaluation understudy (BLEU) score and semantic similarity in the paraphrased texts yielded different results among reviewers, with each preferring different trade-offs between these metrics. Conclusions: This study establishes a framework for integrating LLM and machine learning into SRD development, supported by a pharmaceutical company survey emphasizing the challenges of paraphrasing content. While machine learning models show potential for section identification and content usability assessment in data extraction and abstraction, further optimization and research are essential before full-scale industry implementation. The working group’s insights guide an AI-driven content analysis; address limitations; and advance efficient, precise, and responsive frameworks to assist with pharmaceutical SRD development. %M 40080808 %R 10.2196/55277 %U https://ai.jmir.org/2025/1/e55277 %U https://doi.org/10.2196/55277 %U http://www.ncbi.nlm.nih.gov/pubmed/40080808 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e65776 %T Caregiving Artificial Intelligence Chatbot for Older Adults and Their Preferences, Well-Being, and Social Connectivity: Mixed-Method Study %A Wolfe,Brooke H %A Oh,Yoo Jung %A Choung,Hyesun %A Cui,Xiaoran %A Weinzapfel,Joshua %A Cooper,R Amanda %A Lee,Hae-Na %A Lehto,Rebecca %+ Department of Communication, Michigan State University, 404 Wilson Road, Room 473, East Lansing, MI, 48824, United States, 1 517 355 3470, wolfebro@msu.edu %K older adults %K technology use %K AI chatbots %K artificial intelligence %K well-being %K social connectedness %K mobile phone %D 2025 %7 13.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The increasing number of older adults who are living alone poses challenges for maintaining their well-being, as they often need support with daily tasks, health care services, and social connections. 
However, advancements in artificial intelligence (AI) technologies have revolutionized health care and caregiving through their capacity to monitor health, provide medication and appointment reminders, and provide companionship to older adults. Nevertheless, the adaptability of these technologies for older adults is stymied by usability issues. This study explores how older adults use and adapt to AI technologies, highlighting both the persistent barriers and opportunities for potential enhancements. Objective: This study aimed to provide deeper insights into older adults’ engagement with technology and AI. The technologies currently used, potential technologies desired for daily life integration, personal technology concerns faced, and overall attitudes toward technology and AI are explored. Methods: Using mixed methods, participants (N=28) completed both a semistructured interview and surveys consisting of health and well-being measures. Participants then participated in a research team–facilitated interaction with an AI chatbot, Amazon Alexa. Interview transcripts were analyzed using thematic analysis, and surveys were evaluated using descriptive statistics. Results: Participants’ average age was 71 years (range 65-84 years). Most participants were familiar with technology use, especially smartphones (26/28, 93%) and desktops and laptops (21/28, 75%). Participants rated appointment reminders (25/28, 89%), emergency assistance (22/28, 79%), and health monitoring (21/28, 75%) as the most desirable features of AI chatbots for adoption. Digital devices were commonly used for entertainment, health management, professional productivity, and social connectivity. Participants were most interested in integrating technology into their personal lives for scheduling reminders, chore assistance, and providing care to others. 
Challenges in using new technology included a commitment to learning new technologies, concerns about lack of privacy, and worries about future technology dependence. Overall, older adults’ attitudes coalesced into 3 orientations, which we label technology adapters, technologically wary, and technology resisters. These results illustrate that not all older adults were resistant to technology and AI. Instead, older adults fell along a spectrum from willing, to hesitant but willing, to unwilling to use technology and AI. Researchers can use these findings by asking older adults about their orientation toward technology to facilitate the integration of new technologies in line with each person’s comfort level and preferences. Conclusions: To ensure that AI technologies effectively support older adults, it is essential to foster an ongoing dialogue among developers, older adults, families, and their caregivers, focusing on inclusive designs to meet older adults’ needs. %M 40080043 %R 10.2196/65776 %U https://www.jmir.org/2025/1/e65776 %U https://doi.org/10.2196/65776 %U http://www.ncbi.nlm.nih.gov/pubmed/40080043 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e67696 %T Evaluation of ChatGPT Performance on Emergency Medicine Board Examination Questions: Observational Study %A Pastrak,Mila %A Kajitani,Sten %A Goodings,Anthony James %A Drewek,Austin %A LaFree,Andrew %A Murphy,Adrian %K artificial intelligence %K ChatGPT-4 %K medical education %K emergency medicine %K examination %K examination preparation %D 2025 %7 12.3.2025 %9 %J JMIR AI %G English %X Background: The ever-evolving field of medicine has highlighted the potential for ChatGPT as an assistive platform. However, its use in medical board examination preparation and completion remains unclear. 
Objective: This study aimed to evaluate the performance of a custom-modified version of ChatGPT-4, tailored with emergency medicine board examination preparatory materials (Anki flashcard deck), compared to its default version and its previous iteration (3.5). The goal was to assess the accuracy of ChatGPT-4 in answering board-style questions and its suitability as a tool to aid students and trainees in standardized examination preparation. Methods: A comparative analysis was conducted using a random selection of 598 questions from the Rosh In-Training Examination Question Bank. The study compared 3 versions of ChatGPT: the Default version, a Custom version, and ChatGPT-3.5. The accuracy, response length, medical discipline subgroups, and underlying causes of error were analyzed. Results: The Custom version did not demonstrate a significant improvement in accuracy over the Default version (P=.61), although both significantly outperformed ChatGPT-3.5 (P<.001). The Default version produced significantly longer responses than the Custom version, with mean (SD) values of 1371 (444) and 929 (408), respectively (P<.001). Subgroup analysis revealed no significant difference in performance across different medical subdisciplines between the versions (P>.05 in all cases). Both versions of ChatGPT-4 had similar underlying error types (P>.05 in all cases) and had a 99% predicted probability of passing, while ChatGPT-3.5 had an 85% probability. Conclusions: The findings suggest that while newer versions of ChatGPT exhibit improved performance in emergency medicine board examination preparation, specific enhancement with a comprehensive Anki flashcard deck on the topic does not significantly impact accuracy. The study highlights the potential of ChatGPT-4 as a tool for medical education, capable of providing accurate support across a wide range of topics in emergency medicine in its default form. 
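The reported "predicted probability of passing" can be illustrated with a simple binomial model: treat each board question as an independent trial at the model's per-question accuracy. Whether the authors used exactly this calculation is not stated, so everything below (the model, the pass mark, and the accuracies) is an assumption.

```python
from math import comb

def passing_probability(n_questions: int, accuracy: float, pass_fraction: float) -> float:
    """P(correct answers >= pass threshold) under a binomial model.

    Assumes independent questions with constant per-question accuracy,
    a simplification and not necessarily the paper's exact method.
    """
    threshold = int(n_questions * pass_fraction)
    return sum(
        comb(n_questions, k) * accuracy**k * (1 - accuracy) ** (n_questions - k)
        for k in range(threshold, n_questions + 1)
    )

# Hypothetical: 598 questions, a 70% pass mark, two per-question accuracies.
for acc in (0.75, 0.85):
    print(f"accuracy {acc:.2f}: P(pass) = {passing_probability(598, acc, 0.70):.3f}")
```

Under this toy model, even a modest gap in per-question accuracy translates into a large gap in passing probability, which is consistent with the 99% versus 85% contrast the abstract reports.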
%R 10.2196/67696 %U https://ai.jmir.org/2025/1/e67696 %U https://doi.org/10.2196/67696 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e64325 %T Patient Perspectives on Conversational Artificial Intelligence for Atrial Fibrillation Self-Management: Qualitative Analysis %A Trivedi,Ritu %A Shaw,Tim %A Sheahen,Brodie %A Chow,Clara K %A Laranjo,Liliana %+ Westmead Applied Research Centre, Faculty of Medicine and Health, The University of Sydney, Level 5, Block K, Westmead Hospital, Hawkesbury Road, Westmead, 2145, Australia, 61 2 8890 3125, liliana.laranjo@sydney.edu.au %K atrial fibrillation %K conversational agents %K qualitative research %K self-management %K digital health %K patient perspective %K conversational artificial intelligence %K speech recognition %D 2025 %7 12.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Conversational artificial intelligence (AI) allows for engaging interactions, however, its acceptability, barriers, and enablers to support patients with atrial fibrillation (AF) are unknown. Objective: This work stems from the Coordinating Health care with AI–supported Technology for patients with AF (CHAT-AF) trial and aims to explore patient perspectives on receiving support from a conversational AI support program. Methods: Patients with AF recruited for a randomized controlled trial who received the intervention were approached for semistructured interviews using purposive sampling. The 6-month intervention consisted of fully automated conversational AI phone calls (with speech recognition and natural language processing) that assessed patient health and provided self-management support and education. Interviews were recorded, transcribed, and thematically analyzed. Results: We conducted 30 interviews (mean age 65.4, SD 11.9 years; 21/30, 70% male). 
Four themes were identified: (1) interaction with a voice-based conversational AI program (human-like interactions, restriction to prespecified responses, trustworthiness of hospital-delivered conversational AI); (2) engagement is influenced by the personalization of content, delivery mode, and frequency (tailoring to own health context, interest in novel information regarding health, overwhelmed with large volumes of information, flexibility provided by multichannel delivery); (3) improving access to AF care and information (continuity in support, enhancing access to health-related information); (4) empowering patients to better self-manage their AF (encouraging healthy habits through frequent reminders, reassurance from rhythm-monitoring devices). Conclusions: Although conversational AI was described as an engaging way to receive education and self-management support, improvements such as enhanced dialogue flexibility to allow for more naturally flowing conversations and tailoring to patient health context were also mentioned. 
Trial Registration: Australian New Zealand Clinical Trials Registry ACTRN12621000174886; https://tinyurl.com/3nn7tk72 International Registered Report Identifier (IRRID): RR2-10.2196/34470 %M 40073398 %R 10.2196/64325 %U https://www.jmir.org/2025/1/e64325 %U https://doi.org/10.2196/64325 %U http://www.ncbi.nlm.nih.gov/pubmed/40073398 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e63895 %T The Perceptions of Potential Prerequisites for Artificial Intelligence in Danish General Practice: Vignette-Based Interview Study Among General Practitioners %A Jørgensen,Natasha Lee %A Merrild,Camilla Hoffmann %A Jensen,Martin Bach %A Moeslund,Thomas B %A Kidholm,Kristian %A Thomsen,Janus Laust %K general practice %K general practitioners %K GPs %K artificial intelligence %K AI %K prerequisites %K interviews %K vignettes %K qualitative study %K thematic analysis %D 2025 %7 12.3.2025 %9 %J JMIR Med Inform %G English %X Background: Artificial intelligence (AI) has been deemed revolutionary in medicine; however, no AI tools have been implemented or validated in Danish general practice. General practice in Denmark has an excellent digitization system for developing and using AI. Nevertheless, there is a lack of involvement of general practitioners (GPs) in developing AI. The perspectives of GPs as end users are essential for facilitating the next stage of AI development in general practice. Objective: This study aimed to identify the essential prerequisites that GPs perceive as necessary to realize the potential of AI in Danish general practice. Methods: This study used semistructured interviews and vignettes among GPs to gain perspectives on the potential of AI in general practice. A total of 12 GPs interested in the potential of AI in general practice were interviewed in 2019 and 2021. The interviews were transcribed verbatim and thematic analysis was conducted to identify the dominant themes throughout the data. 
Results: In the data analysis, four main themes were identified as essential prerequisites for GPs when considering the potential of AI in general practice: (1) AI must begin with the low-hanging fruit, (2) AI must be meaningful in the GP’s work, (3) the GP-patient relationship must be maintained despite AI, and (4) AI must be a free, active, and integrated option in the electronic health record (EHR). These 4 themes suggest that the development of AI should initially focus on low-complexity tasks that do not influence patient interactions but facilitate GPs’ work in a meaningful manner as an integrated part of the EHR. Examples of this include routine and administrative tasks. Conclusions: The research findings outline the participating GPs’ perceptions of the essential prerequisites to consider when exploring the potential applications of AI in primary care settings. We believe that these perceptions of potential prerequisites can support the initial stages of future development and assess the suitability of existing AI tools for general practice. %R 10.2196/63895 %U https://medinform.jmir.org/2025/1/e63895 %U https://doi.org/10.2196/63895 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e68442 %T Two-Year Hypertension Incidence Risk Prediction in Populations in the Desert Regions of Northwest China: Prospective Cohort Study %A Cheng,Yinlin %A Gu,Kuiying %A Ji,Weidong %A Hu,Zhensheng %A Yang,Yining %A Zhou,Yi %+ Zhongshan School of Medicine, Sun Yat-sen University, 74 Zhongshan 2nd Road, Yuexiu District, Guangzhou, 510080, China, 86 020 87332139, zhouyi@mail.sysu.edu.cn %K hypertension %K desert %K machine learning %K deep learning %K prevention %K clinical applicability %D 2025 %7 12.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Hypertension is a major global health issue and a significant modifiable risk factor for cardiovascular diseases, contributing to a substantial socioeconomic burden due to its high prevalence. 
In China, particularly among populations living near desert regions, hypertension is even more prevalent due to unique environmental and lifestyle conditions, exacerbating the disease burden in these areas and underscoring the urgent need for effective early detection and intervention strategies. Objective: This study aims to develop, calibrate, and prospectively validate a 2-year hypertension risk prediction model by using large-scale health examination data collected from populations residing in 4 regions surrounding the Taklamakan Desert of northwest China. Methods: We retrospectively analyzed the health examination data of 1,038,170 adults (2019-2021) and prospectively validated our findings in a separate cohort of 961,519 adults (2021-2023). Data included demographics, lifestyle factors, physical examinations, and laboratory measurements. Feature selection was performed using light gradient-boosting machine–based recursive feature elimination with cross-validation and Least Absolute Shrinkage and Selection Operator, yielding 24 key predictors. Multiple machine learning (logistic regression, random forest, extreme gradient boosting, light gradient-boosting machine) and deep learning (Feature Tokenizer + Transformer, SAINT) models were trained with Bayesian hyperparameter optimization. Results: Over a 2-year follow-up, 15.20% (157,766/1,038,170) of the participants in the retrospective cohort and 10.50% (101,077/961,519) in the prospective cohort developed hypertension. Among the models developed, the CatBoost model demonstrated the best performance, achieving area under the curve (AUC) values of 0.888 (95% CI 0.886-0.889) in the retrospective cohort and 0.803 (95% CI 0.801-0.804) in the prospective cohort. Calibration via isotonic regression improved the model’s probability estimates, with Brier scores of 0.090 (95% CI 0.089-0.091) and 0.102 (95% CI 0.101-0.103) in the internal validation and prospective cohorts, respectively. 
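The Brier score used here to summarize calibration is simply the mean squared error of probability forecasts against binary outcomes; a minimal sketch with hypothetical values:

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Lower is better; 0.25 is the score of an uninformative constant 0.5 forecast.
    """
    n = len(probabilities)
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / n

# Hypothetical calibrated 2-year hypertension risks and observed outcomes
# (1 = developed hypertension); not the study's data.
probs    = [0.05, 0.10, 0.30, 0.70, 0.90]
observed = [0, 0, 1, 1, 1]
print(f"Brier score: {brier_score(probs, observed):.3f}")
```

Isotonic regression recalibration, as the abstract describes, lowers this score by remapping raw model outputs onto better-calibrated probabilities without changing their ranking.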
Participants were ranked by the positive predictive value calculated using the calibrated model and stratified into 4 risk categories (low, medium, high, and very high), with the very high group exhibiting a 41.08% (5741/13,975) hypertension incidence over 2 years. Age, BMI, and socioeconomic factors were identified as significant predictors of hypertension. Conclusions: Our machine learning model effectively predicted the 2-year risk of hypertension, making it particularly suitable for preventive health care management in high-risk populations residing in the desert regions of China. Our model exhibited excellent predictive performance and has potential for clinical application. A web-based application was developed based on our predictive model, which further enhanced the accessibility for clinical and public health use, aiding in reducing the burden of hypertension through timely prevention strategies. %M 40072485 %R 10.2196/68442 %U https://www.jmir.org/2025/1/e68442 %U https://doi.org/10.2196/68442 %U http://www.ncbi.nlm.nih.gov/pubmed/40072485 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67488 %T Accuracy of Large Language Models for Literature Screening in Thoracic Surgery: Diagnostic Study %A Dai,Zhang-Yi %A Wang,Fu-Qiang %A Shen,Cheng %A Ji,Yan-Li %A Li,Zhi-Yang %A Wang,Yun %A Pu,Qiang %+ Department of Thoracic Surgery, West China Hospital of Sichuan University, No.37, Guoxue Alley, Chengdu, 610041, China, 86 18980606738, puqiang100@163.com %K accuracy %K large language models %K meta-analysis %K literature screening %K thoracic surgery %D 2025 %7 11.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Systematic reviews and meta-analyses rely on labor-intensive literature screening. While machine learning offers potential automation, its accuracy remains suboptimal. This raises the question of whether emerging large language models (LLMs) can provide a more accurate and efficient approach. 
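Screening accuracy of the kind evaluated in the following study is measured against a manual reference standard via a 2×2 confusion table; a minimal sketch (the counts are hypothetical):

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity and specificity of automated screening vs a manual reference.

    tp: records correctly kept,  fn: relevant records wrongly excluded,
    fp: records wrongly kept,    tn: records correctly excluded.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts for one meta-analysis's title and abstract screen.
m = screening_metrics(tp=44, fp=18, fn=6, tn=932)
print(f"sensitivity={m['sensitivity']:.2f}, specificity={m['specificity']:.2f}")
```

For screening, sensitivity is the critical metric, since a wrongly excluded study (a false negative) is unrecoverable downstream, whereas false positives only cost extra full-text review time.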
Objective: This paper evaluates the sensitivity, specificity, and summary receiver operating characteristic (SROC) curve of LLM-assisted literature screening. Methods: We conducted a diagnostic study comparing the accuracy of LLM-assisted screening versus manual literature screening across 6 thoracic surgery meta-analyses. Manual screening by 2 investigators served as the reference standard. LLM-assisted screening was performed using ChatGPT-4o (OpenAI) and Claude-3.5 Sonnet (Anthropic), with discrepancies resolved by Gemini-1.5 Pro (Google). In addition, 2 open-source, machine learning–based screening tools, ASReview (Utrecht University) and Abstrackr (Center for Evidence Synthesis in Health, Brown University School of Public Health), were also evaluated. We calculated sensitivity, specificity, and 95% CIs for title and abstract screening, as well as full-text screening, generating pooled estimates and SROC curves. LLM prompts were revised based on a post hoc error analysis. Results: LLM-assisted full-text screening demonstrated high pooled sensitivity (0.87, 95% CI 0.77-0.99) and specificity (0.96, 95% CI 0.91-0.98), with an area under the curve (AUC) of 0.96 (95% CI 0.94-0.97). Title and abstract screening achieved a pooled sensitivity of 0.73 (95% CI 0.57-0.85) and specificity of 0.99 (95% CI 0.97-0.99), with an AUC of 0.97 (95% CI 0.96-0.99). Post hoc revisions improved sensitivity to 0.98 (95% CI 0.74-1.00) while maintaining high specificity (0.98, 95% CI 0.94-0.99). In comparison, the pooled sensitivity and specificity of ASReview tool-assisted screening were 0.58 (95% CI 0.53-0.64) and 0.97 (95% CI 0.91-0.99), respectively, with an AUC of 0.66 (95% CI 0.62-0.70). The pooled sensitivity and specificity of Abstrackr tool-assisted screening were 0.48 (95% CI 0.35-0.62) and 0.96 (95% CI 0.88-0.99), respectively, with an AUC of 0.78 (95% CI 0.74-0.82). A post hoc meta-analysis revealed comparable effect sizes between LLM-assisted and conventional screening. 
Conclusions: LLMs hold significant potential for streamlining literature screening in systematic reviews, reducing workload without sacrificing quality. Importantly, LLMs outperformed traditional machine learning-based tools (ASReview and Abstrackr) in both sensitivity and AUC values, suggesting that LLMs offer a more accurate and efficient approach to literature screening. %M 40068152 %R 10.2196/67488 %U https://www.jmir.org/2025/1/e67488 %U https://doi.org/10.2196/67488 %U http://www.ncbi.nlm.nih.gov/pubmed/40068152 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e58855 %T The Reliability and Quality of Videos as Guidance for Gastrointestinal Endoscopy: Cross-Sectional Study %A Liu,Jinpei %A Qiu,Yifan %A Liu,Yilong %A Xu,Wenping %A Ning,Weichen %A Shi,Peimei %A Yuan,Zongli %A Wang,Fang %A Shi,Yihai %+ Department of Gastroenterology, Gongli Hospital of Shanghai Pudong New Area, Pudong New Area 219 Miaopu Road, Shanghai, 200135, China, 86 5885873, syh01206@163.com %K gastrointestinal endoscopy %K YouTube %K patient education %K social media gastrointestinal %K large language model %K LLM %K reliability %K quality %K video %K cross-sectional study %K endoscopy-related videos %K health information %K endoscopy %K gastroscopy %K colonoscopy %D 2025 %7 11.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Gastrointestinal endoscopy represents a useful tool for the diagnosis and treatment of gastrointestinal diseases. Video platforms for spreading endoscopy-related knowledge may help patients understand the pros and cons of endoscopy on the premise of ensuring accuracy. However, videos with misinformation may lead to adverse consequences. Objective: This study aims to evaluate the quality of gastrointestinal endoscopy-related videos on YouTube and to assess whether large language models (LLMs) can help patients obtain information from videos more efficiently. 
Methods: We collected information from YouTube videos about 3 commonly used gastrointestinal endoscopy procedures (gastroscopy, colonoscopy, and capsule endoscopy) and assessed their quality (rated by the modified DISCERN Tool, mDISCERN), reliability (rated by the Journal of the American Medical Association benchmark criteria), and recommendation (rated by the Global Quality Score). We tasked an LLM with summarizing the video content and assessed the summaries from 3 perspectives: accuracy, completeness, and readability. Results: A total of 167 videos were included. According to the indicated scoring, the quality, reliability, and recommendation of the gastrointestinal endoscopy-related videos on YouTube were overall unsatisfactory, and the quality of the videos released by patients was particularly poor. Capsule endoscopy videos yielded a significantly lower Global Quality Score than did gastroscopy and colonoscopy videos. LLM-based summaries yielded accuracy scores of 4 (IQR 4-5), completeness scores of 4 (IQR 4-5), and readability scores of 2 (IQR 1-2). Conclusions: The quality of gastrointestinal endoscopy-related videos currently on YouTube is poor, and additional regulatory and improvement strategies are needed. An LLM may be helpful in summarizing video-related information, but there is still room for improvement in its capabilities. 
%M 40068165 %R 10.2196/58855 %U https://www.jmir.org/2025/1/e58855 %U https://doi.org/10.2196/58855 %U http://www.ncbi.nlm.nih.gov/pubmed/40068165 %0 Journal Article %@ 2561-6722 %I JMIR Publications %V 8 %N %P e59377 %T Fetal Birth Weight Prediction in the Third Trimester: Retrospective Cohort Study and Development of an Ensemble Model %A Gao,Jing %A Jie,Xu %A Yao,Yujun %A Xue,Jingdong %A Chen,Lei %A Chen,Ruiyao %A Chen,Jiayuan %A Cheng,Weiwei %K fetal birthweight %K ensemble learning model %K machine learning %K prediction model %K ultrasonography %K macrosomia %K low birth weight %K birth weight %K fetal %K AI %K artificial intelligence %K prenatal %K prenatal care %K Shanghai %K neonatal %K maternal %K parental %D 2025 %7 10.3.2025 %9 %J JMIR Pediatr Parent %G English %X Background: Accurate third-trimester birth weight prediction is vital for reducing adverse outcomes, and machine learning (ML) offers superior precision over traditional ultrasound methods. Objective: This study aims to develop an ML model on the basis of clinical big data for accurate prediction of birth weight in the third trimester of pregnancy, which can help reduce adverse maternal and fetal outcomes. Methods: From January 1, 2018, to December 31, 2019, a retrospective cohort study involving 16,655 singleton live births without congenital anomalies (>28 weeks of gestation) was conducted in a tertiary first-class hospital in Shanghai. The initial data set was divided, in a ratio of 4:1, into a training set for algorithm development and a test set for algorithm evaluation. We extracted maternal and neonatal delivery outcomes, as well as parental demographics, obstetric clinical data, and sonographic fetal biometry, from electronic medical records. A total of 5 base ML algorithms (Ridge, SVM, Random Forest, extreme gradient boosting [XGBoost], and Multi-Layer Perceptron) were used to develop the prediction model, whose predictions were then averaged into an ensemble learning model. 
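The averaging-ensemble idea described above can be sketched in a few lines. This is a hedged illustration on synthetic data, with ridge regression and k-nearest neighbors standing in for the study's 5 base learners; each base model predicts the target and the ensemble averages their predictions:

```python
# Hedged illustration on synthetic data: ridge regression and k-nearest
# neighbors stand in for the study's base learners; the ensemble simply
# averages the base models' predictions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 17                     # 17 features, as in the "few feature model"
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + rng.normal(0, 0.5, size=n)   # synthetic target

split = int(0.8 * n)               # 4:1 train/test split, as in the abstract
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

def ridge_predict(X_tr, y_tr, X_te, lam=1.0):
    """Closed-form ridge regression: w = (X'X + lam*I)^-1 X'y."""
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(X_tr.shape[1]),
                        X_tr.T @ y_tr)
    return X_te @ w

def knn_predict(X_tr, y_tr, X_te, k=10):
    """Predict each test case as the mean target of its k nearest neighbors."""
    dists = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_tr[nearest].mean(axis=1)

base_preds = [ridge_predict(X_tr, y_tr, X_te), knn_predict(X_tr, y_tr, X_te)]
ensemble_pred = np.mean(base_preds, axis=0)   # the averaging step
rmse = float(np.sqrt(np.mean((ensemble_pred - y_te) ** 2)))
print(f"ensemble RMSE on held-out data: {rmse:.2f}")
```

Averaging works best when the base models make partly uncorrelated errors, which is presumably why the study mixed linear, kernel, tree-based, and neural learners.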
The models were compared using accuracy, mean squared error, root mean squared error, and mean absolute error. International Peace Maternity and Child Health Hospital's Research Ethics Committee granted ethical approval for the use of patient information (GKLW2021-20). Results: The training and test sets contained 13,324 and 3331 cases, respectively. From a total of 59 variables, we selected 17 variables that were readily available for the “few feature model,” which achieved high predictive power with an accuracy of 81% and significantly exceeded ultrasound formula methods. In addition, our model maintained superior performance for low birth weight and macrosomic fetal populations. Conclusions: Our research investigated an innovative artificial intelligence model for predicting fetal birth weight and optimizing health care resource use. In the era of big data, our model improves maternal and fetal outcomes and promotes precision medicine. %R 10.2196/59377 %U https://pediatrics.jmir.org/2025/1/e59377 %U https://doi.org/10.2196/59377 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e65651 %T Assessment of the Efficiency of a ChatGPT-Based Tool, MyGenAssist, in an Industry Pharmacovigilance Department for Case Documentation: Cross-Over Study %A Benaïche,Alexandre %A Billaut-Laden,Ingrid %A Randriamihaja,Herivelo %A Bertocchio,Jean-Philippe %+ Bayer Healthcare SAS France, 1 Rue Claude Bernard, Lille, 59000, France, 33 320445962, benaichealexandre@gmail.com %K MyGenAssist %K large language model %K artificial intelligence %K ChatGPT %K pharmacovigilance %K efficiency %D 2025 %7 10.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: At the end of 2023, Bayer AG launched its own internal large language model (LLM), MyGenAssist, based on ChatGPT technology to overcome data privacy concerns. 
Such a tool may reduce the burden of repetitive and recurrent tasks and save time that could then be dedicated to activities with higher added value. Although there is a current worldwide reflection on whether artificial intelligence should be integrated into pharmacovigilance, the medical literature does not provide enough data concerning LLMs and their daily applications in such a setting. Here, we studied how this tool could improve the case documentation process, which is a duty for authorization holders as per European and French good vigilance practices. Objective: The aim of the study is to test whether the use of an LLM could improve the pharmacovigilance documentation process. Methods: MyGenAssist was trained to draft templates for case documentation letters meant to be sent to the reporters. Information provided within the template changes depending on the case: such data come from a table sent to the LLM. We then measured the time spent on each case for a period of 4 months (2 months before using the tool and 2 months after its implementation). A multiple linear regression model was created with the time spent on each case as the explained variable, and all parameters that could influence this time were included as explanatory variables (use of MyGenAssist, type of recipient, number of questions, and user). To test whether the use of this tool impacts the process, we compared the recipients’ response rates with and without the use of MyGenAssist. Results: MyGenAssist reduced the time spent on each case by an average of 23.3% (95% CI 13.8%-32.8%; P<.001; adjusted R2=0.286), which could represent an average of 10.7 (SD 3.6) working days saved each year. The response rate was not modified by the use of MyGenAssist (20/48, 42% vs 27/74, 36%; P=.57), whether the recipient was a physician or a patient. 
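The regression described in the Methods can be sketched as follows. The data here are simulated (an assumed 30-minute baseline and a built-in tool effect, not the study's records) and serve only to show how the tool-use coefficient is estimated while adjusting for recipient type, number of questions, and user:

```python
# Synthetic illustration (assumed effect sizes, not the study's records) of
# the multiple linear regression: time per case regressed on tool use plus
# the covariates named in the abstract.
import numpy as np

rng = np.random.default_rng(42)
n = 200
uses_tool = rng.integers(0, 2, n)      # 1 = MyGenAssist used
to_patient = rng.integers(0, 2, n)     # recipient type (patient vs physician)
n_questions = rng.integers(1, 6, n)    # number of questions in the letter
user_id = rng.integers(0, 2, n)        # simplified two-user effect

# Assumed data-generating process: a 30-minute baseline with a 7-minute
# saving when the tool is used, plus covariate effects and noise.
minutes = (30.0 - 7.0 * uses_tool + 2.0 * to_patient
           + 1.5 * n_questions + 3.0 * user_id + rng.normal(0, 2, n))

# Design matrix with an intercept column; ordinary least squares fit
X = np.column_stack([np.ones(n), uses_tool, to_patient, n_questions, user_id])
beta, *_ = np.linalg.lstsq(X, minutes, rcond=None)
print(f"estimated tool effect: {beta[1]:.1f} minutes per case")
```

The coefficient on `uses_tool` plays the role of the adjusted time saving reported in the abstract; in the real analysis, its CI and P value would come from the regression's standard errors.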
No significant difference was found in the time recipients took to answer (mean 2.20, SD 3.27 days vs mean 2.65, SD 3.30 days after the last contact attempt; P=.64). The implementation of MyGenAssist for this activity required only a 2-hour training session for the pharmacovigilance team. Conclusions: Our study is the first to show that a ChatGPT-based tool can improve the efficiency of a good practice activity without requiring a long training session for the staff involved. These first encouraging results could be an incentive for the implementation of LLMs in other processes. %M 40063946 %R 10.2196/65651 %U https://www.jmir.org/2025/1/e65651 %U https://doi.org/10.2196/65651 %U http://www.ncbi.nlm.nih.gov/pubmed/40063946 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e59892 %T Intelligent Robot Interventions for People With Dementia: Systematic Review and Meta-Analysis of Randomized Controlled Trials %A Fan,Wenqi %A Zhao,Rui %A Liu,Xiaoxia %A Ge,Lina %+ Department of Obstetrics and Gynecology, Shengjing Hospital of China Medical University, Heping District/Sanhao Street, 36th, Shenyang, 110004, China, 86 18940251669, geln@sj-hospital.org %K intelligent robot %K artificial intelligence %K dementia %K agitation %K anxiety %K meta-analysis %D 2025 %7 10.3.2025 %9 Review %J J Med Internet Res %G English %X Background: The application of intelligent robots in therapy is becoming increasingly important for people with dementia. More extensive research is still needed to evaluate their impact on behavioral and psychological symptoms of dementia, as well as on quality of life in different care settings. Objective: The purpose of this research is to systematically assess the effectiveness of intelligent robot interventions for patients with dementia. 
Methods: In accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 guidelines, a comprehensive search was conducted on PubMed, CINAHL, the Cochrane Library, Embase, and Web of Science from inception to February 2024 to identify relevant randomized controlled trials on the use of intelligent robots in people with dementia. Two authors (WF and RZ) independently applied the Cochrane Collaboration bias assessment tool to assess the quality of the included studies. The intervention effect of intelligent robots on patients with dementia was summarized using a fixed-effect model or a random-effects model with Stata software (version 16.0; StataCorp). Subgroup analysis was performed according to the intelligent robot type and the intervention duration. Publication bias was tested using funnel plots, Egger tests, and the trim-and-fill method. Results: In total, 15 studies were included in the systematic review, encompassing 705 participants, of which 12 studies were subjected to meta-analysis. The meta-analysis found that, compared with the control group, intelligent robot intervention significantly reduced the levels of agitation (standardized mean difference –0.36, 95% CI –0.56 to –0.17; P<.001) and anxiety (weighted mean difference –1.93, 95% CI –3.13 to –0.72; P=.002) in patients with dementia. However, the intervention of intelligent robots had no significant effect on the following (all P>.05): cognitive function, neuropsychiatric symptoms, depression, quality of life, step count during the day, and the hours of lying down during the night of patients with dementia. 
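The pooling step behind the standardized mean differences reported above can be illustrated with a DerSimonian-Laird random-effects model, a common default (the abstract states only that fixed- or random-effects models were fitted in Stata); the per-study effect sizes below are hypothetical:

```python
# Sketch of the pooling step. Assumption: a DerSimonian-Laird random-effects
# model (a common default); the abstract states only that fixed- or
# random-effects models were fitted. Effect sizes below are hypothetical.
import math

def dersimonian_laird(effects, variances):
    """Pool per-study effects; returns (pooled estimate, 95% CI)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Hypothetical standardized mean differences for agitation, one per trial
smds = [-0.50, -0.20, -0.35, -0.45]
variances = [0.04, 0.03, 0.05, 0.06]
pooled, (ci_lo, ci_hi) = dersimonian_laird(smds, variances)
print(f"pooled SMD {pooled:.2f} (95% CI {ci_lo:.2f} to {ci_hi:.2f})")
```

When the heterogeneity estimate tau² is 0, the random-effects weights collapse to the fixed-effect (inverse-variance) weights, which is why reviews often report both models.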
Subgroup analysis revealed that the improvement of depression was related to the duration of the intervention (≤12 vs >12 weeks: 0.08, 95% CI –0.20 to 0.37 vs –0.68, 95% CI –1.00 to –0.37; P=.26) and was independent of the type of intelligent robot (animal robots vs humanoid robots: –0.30, 95% CI –0.75 to 0.15 vs 0.07, 95% CI –0.21 to 0.34; P=.26). Conclusions: This study shows that intelligent robot intervention can help improve the agitation and anxiety levels of people with dementia. The intervention may be more effective the longer it is implemented. The appearance of the intelligent robot has no effect on the intervention effect. Further research is needed to help collect physiological data, such as physical activity in people with dementia; explore the impact of other intelligent robot design features on the intervention effect; and provide a reference for improving intelligent robots and intervention programs. Trial Registration: PROSPERO CRD42024523007; https://tinyurl.com/mwscn985 %M 40063933 %R 10.2196/59892 %U https://www.jmir.org/2025/1/e59892 %U https://doi.org/10.2196/59892 %U http://www.ncbi.nlm.nih.gov/pubmed/40063933 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e59792 %T Generative AI Models in Time-Varying Biomedical Data: Scoping Review %A He,Rosemary %A Sarwal,Varuni %A Qiu,Xinru %A Zhuang,Yongwen %A Zhang,Le %A Liu,Yue %A Chiang,Jeffrey %+ Department of Neurosurgery, David Geffen School of Medicine, University of California, Los Angeles, 300 Stein Plaza, Suite 560, Los Angeles, CA, 90095, United States, 1 310 825 5111, njchiang@g.ucla.edu %K generative artificial intelligence %K artificial intelligence %K time series %K electronic health records %K electronic medical records %K systematic reviews %K disease trajectory %K machine learning %K algorithms %K forecasting %D 2025 %7 10.3.2025 %9 Review %J J Med Internet Res %G English %X Background: Trajectory modeling is a long-standing challenge in the application of
computational methods to health care. In the age of big data, traditional statistical and machine learning methods do not achieve satisfactory results as they often fail to capture the complex underlying distributions of multimodal health data and long-term dependencies throughout medical histories. Recent advances in generative artificial intelligence (AI) have provided powerful tools to represent complex distributions and patterns with minimal underlying assumptions, with major impact in fields such as finance and environmental sciences, prompting researchers to apply these methods for disease modeling in health care. Objective: While AI methods have proven powerful, their application in clinical practice remains limited due to their highly complex nature. The proliferation of AI algorithms also poses a significant challenge for nondevelopers to track and incorporate these advances into clinical research and application. In this paper, we introduce basic concepts in generative AI and discuss current algorithms and how they can be applied to health care for practitioners with little background in computer science. Methods: We surveyed peer-reviewed papers on generative AI models with specific applications to time-series health data. Our search included single- and multimodal generative AI models that operated over structured and unstructured data, physiological waveforms, medical imaging, and multi-omics data. We introduce current generative AI methods, review their applications, and discuss their limitations and future directions in each data modality. Results: We followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines and reviewed 155 articles on generative AI applications to time-series health care data across modalities. Furthermore, we offer a systematic framework for clinicians to easily identify suitable AI methods for their data and task at hand. 
Conclusions: We reviewed and critiqued existing applications of generative AI to time-series health data with the aim of bridging the gap between computational methods and clinical application. We also identified the shortcomings of existing approaches and highlighted recent advances in generative AI that represent promising directions for health care modeling. %M 40063929 %R 10.2196/59792 %U https://www.jmir.org/2025/1/e59792 %U https://doi.org/10.2196/59792 %U http://www.ncbi.nlm.nih.gov/pubmed/40063929 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67871 %T Application of Machine Learning for Patients With Cardiac Arrest: Systematic Review and Meta-Analysis %A Wei,Shengfeng %A Guo,Xiangjian %A He,Shilin %A Zhang,Chunhua %A Chen,Zhizhuan %A Chen,Jianmei %A Huang,Yanmei %A Zhang,Fan %A Liu,Qiangqiang %+ Department of Emergency Medicine, The First Affiliated Hospital, Sun Yat-sen University, 58 Zhongshan 2nd Road, Yuexiu District, Guangzhou, 510000, China, 86 18928825921, liuqq9@mail.sysu.edu.cn %K cardiac arrest %K machine learning %K prognosis %K systematic review %K artificial intelligence %K AI %D 2025 %7 10.3.2025 %9 Review %J J Med Internet Res %G English %X Background: Currently, there is a lack of effective early assessment tools for predicting the onset and development of cardiac arrest (CA). With the increasing attention of clinical researchers on machine learning (ML), some researchers have developed ML models for predicting the occurrence and prognosis of CA, with certain models appearing to outperform traditional scoring tools. However, these models still lack systematic evidence to substantiate their efficacy. Objective: This systematic review and meta-analysis was conducted to evaluate the prediction value of ML in CA for occurrence, good neurological prognosis, mortality, and the return of spontaneous circulation (ROSC), thereby providing evidence-based support for the development and refinement of applicable clinical tools. 
Methods: PubMed, Embase, the Cochrane Library, and Web of Science were systematically searched from their establishment until May 17, 2024. The risk of bias in all prediction models was assessed using the Prediction Model Risk of Bias Assessment Tool. Results: In total, 93 studies were selected, encompassing 5,729,721 in-hospital and out-of-hospital patients. The meta-analysis revealed that, for predicting CA, the pooled C-index, sensitivity, and specificity derived from the imbalanced validation dataset were 0.90 (95% CI 0.87-0.93), 0.83 (95% CI 0.79-0.87), and 0.93 (95% CI 0.88-0.96), respectively. On the basis of the balanced validation dataset, the pooled C-index, sensitivity, and specificity were 0.88 (95% CI 0.86-0.90), 0.72 (95% CI 0.49-0.95), and 0.79 (95% CI 0.68-0.91), respectively. For predicting the good cerebral performance category score 1 to 2, the pooled C-index, sensitivity, and specificity based on the validation dataset were 0.86 (95% CI 0.85-0.87), 0.72 (95% CI 0.61-0.81), and 0.79 (95% CI 0.66-0.88), respectively. For predicting CA mortality, the pooled C-index, sensitivity, and specificity based on the validation dataset were 0.85 (95% CI 0.82-0.87), 0.83 (95% CI 0.79-0.87), and 0.79 (95% CI 0.74-0.83), respectively. For predicting ROSC, the pooled C-index, sensitivity, and specificity based on the validation dataset were 0.77 (95% CI 0.74-0.80), 0.53 (95% CI 0.31-0.74), and 0.88 (95% CI 0.71-0.96), respectively. In predicting CA, the most significant modeling variables were respiratory rate, blood pressure, age, and temperature. In predicting a good cerebral performance category score 1 to 2, the most significant modeling variables in the in-hospital CA group were rhythm (shockable or nonshockable), age, medication use, and gender; the most significant modeling variables in the out-of-hospital CA group were age, rhythm (shockable or nonshockable), medication use, and ROSC. 
Conclusions: ML represents a currently promising approach for predicting the occurrence and outcomes of CA. Therefore, in future research on CA, we may attempt to systematically update traditional scoring tools based on the superior performance of ML in specific outcomes, achieving artificial intelligence–driven enhancements. Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42024518949; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=518949 %M 40063076 %R 10.2196/67871 %U https://www.jmir.org/2025/1/e67871 %U https://doi.org/10.2196/67871 %U http://www.ncbi.nlm.nih.gov/pubmed/40063076 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e60435 %T Generative AI–Enabled Therapy Support Tool for Improved Clinical Outcomes and Patient Engagement in Group Therapy: Real-World Observational Study %A Habicht,Johanna %A Dina,Larisa-Maria %A McFadyen,Jessica %A Stylianou,Mona %A Harper,Ross %A Hauser,Tobias U %A Rollwage,Max %+ Limbic Ltd, Kemp House, 128 City Road, London, EC1V 2NX, United Kingdom, 44 020 3818 3240, max@limbic.ai %K artificial intelligence %K National Health Service %K NHS Talking Therapies %K mental health %K therapy support tool %K cognitive behavioral therapy %K CBT %K chatbot %K conversational agent %K clinical %K patient engagement %K therapist %K treatment %K medication %K depression %K anxiety disorder %K exercise %K observational study %K control group %K patient adherence %D 2025 %7 10.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Cognitive behavioral therapy (CBT) is a highly effective treatment for depression and anxiety disorders. Nonetheless, a substantial proportion of patients do not respond to treatment. The lack of engagement with therapeutic materials and exercises between sessions, a necessary component of CBT, is a key determinant of unsuccessful treatment. 
Objective: The objective of this study was to test whether the deployment of a generative artificial intelligence (AI)–enabled therapy support tool, which helps patients to engage with therapeutic materials and exercises in between sessions, leads to improved treatment success and patient treatment adherence compared with the standard delivery of CBT exercises through static workbooks. Methods: We conducted a real-world observational study of 244 patients receiving group-based CBT in 5 of the United Kingdom’s National Health Service Talking Therapies services, comparing 150 (61.5%) patients who used the AI-enabled therapy support tool to 94 (38.5%) patients who used the standard delivery of CBT exercises. The groups were equivalent with respect to the content of the CBT materials and the human-led therapy sessions; however, the intervention group received support from the AI-enabled therapy support tool in conducting CBT exercises. Results: Patients using the AI-enabled therapy support tool exhibited greater attendance at therapy sessions and fewer dropouts from treatment. Furthermore, these patients demonstrated higher reliable improvement, recovery, and reliable recovery rates when compared to the control group, which was related to the degree of use of the AI-enabled therapy support tool. Moreover, we found that engagement with AI-supported CBT interventions, relative to psychoeducational materials, predicted better treatment adherence and treatment success, highlighting the role of personalization in the intervention’s effectiveness. To investigate the mechanisms of these effects further, we conducted a separate qualitative experiment in a nonclinical sample of users (n=113). Results indicated that users perceived the AI-enabled therapy support tool as most useful for discussing their problems to gain awareness and clarity of their situation as well as learning how to apply coping skills and CBT techniques in their daily lives. 
Conclusions: Our results show that an AI-enabled, personalized therapy support tool in combination with human-led group therapy is a promising avenue to improve the efficacy of and adherence to mental health care. %M 40063074 %R 10.2196/60435 %U https://www.jmir.org/2025/1/e60435 %U https://doi.org/10.2196/60435 %U http://www.ncbi.nlm.nih.gov/pubmed/40063074 %0 Journal Article %@ 2561-3278 %I JMIR Publications %V 10 %N %P e65366 %T Cardiac Repair and Regeneration via Advanced Technology: Narrative Literature Review %A Lee,Yugyung %A Shelke,Sushil %A Lee,Chi %+ Division of Pharmacology and Pharmaceutics Sciences, School of Pharmacy, University of Missouri Kansas City, 5000 Holmes St, Kansas City, MO, 64110, United States, 1 8162352408, leech@umkc.edu %K advanced technologies %K genetics %K biomaterials %K bioengineering %K medical devices %K implantable devices %K wearables %K cardiovascular repair and regeneration %K cardiac care %K cardiovascular disease %D 2025 %7 8.3.2025 %9 Review %J JMIR Biomed Eng %G English %X Background: Cardiovascular diseases (CVDs) are the leading cause of death globally, and almost one-half of all adults in the United States have at least one form of heart disease. This review focused on advanced technologies, genetic variables in CVD, and biomaterials used for organ-independent cardiovascular repair systems. Objective: A variety of implantable and wearable devices, including biosensor-equipped cardiovascular stents and biocompatible cardiac patches, have been developed and evaluated. The incorporation of those strategies will hold a bright future in the management of CVD in advanced clinical practice. Methods: This study employed widely used academic search systems, such as Google Scholar, PubMed, and Web of Science. Recent progress in diagnostic and treatment methods against CVD, as described in the content, are extensively examined. 
The innovative bioengineering, gene delivery, cell biology, and artificial intelligence–based technologies that will continue to revolutionize biomedical devices for cardiovascular repair and regeneration are also discussed. The query-based review method adopted in this manuscript was designed to assess how efficiently an updated literature review can synthesize evidence on the clinical applicability of cardiovascular devices against CVD. Results: Advanced technologies, along with artificial intelligence–based telehealth, will be essential to create efficient implantable biomedical devices, including cardiovascular stents. Proper statistical approaches, along with results from clinical studies, including model-based risk probability prediction from genetic and physiological variables, are integral to the monitoring and treatment of CVD risk. Conclusions: To overcome the current obstacles in cardiac repair and regeneration and achieve successful therapeutic applications, future interdisciplinary collaborative work is essential. Novel cardiovascular devices and their targeted treatments will enable enhanced health care delivery and improved therapeutic efficacy against CVD. Because review articles give clinicians comprehensive access to state-of-the-art evidence, high-quality reviews can serve as a first outline of the updated progress on cardiovascular devices before clinical studies are undertaken. 
%R 10.2196/65366 %U https://biomedeng.jmir.org/2025/1/e65366 %U https://doi.org/10.2196/65366 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e69068 %T Diagnostic Performance of Artificial Intelligence–Based Methods for Tuberculosis Detection: Systematic Review %A Hansun,Seng %A Argha,Ahmadreza %A Bakhshayeshi,Ivan %A Wicaksana,Arya %A Alinejad-Rokny,Hamid %A Fox,Greg J %A Liaw,Siaw-Teng %A Celler,Branko G %A Marks,Guy B %+ School of Clinical Medicine, South West Sydney, UNSW Medicine & Health, UNSW Sydney, High Street, Kensington, NSW, Sydney, 2052, Australia, 61 456541224, s.hansun@unsw.edu.au %K AI %K artificial intelligence %K deep learning %K diagnostic performance %K machine learning %K PRISMA %K Preferred Reporting Items for Systematic Reviews and Meta-Analysis %K QUADAS-2 %K Quality Assessment of Diagnostic Accuracy Studies version 2 %K systematic literature review %K tuberculosis detection %D 2025 %7 7.3.2025 %9 Review %J J Med Internet Res %G English %X Background: Tuberculosis (TB) remains a significant health concern, contributing to the highest mortality among infectious diseases worldwide. However, none of the various TB diagnostic tools introduced is deemed sufficient on its own for the diagnostic pathway, so various artificial intelligence (AI)–based methods have been developed to address this issue. Objective: We aimed to provide a comprehensive evaluation of AI-based algorithms for TB detection across various data modalities. Methods: Following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) 2020 guidelines, we conducted a systematic review to synthesize current knowledge on this topic. Our search across 3 major databases (Scopus, PubMed, Association for Computing Machinery [ACM] Digital Library) yielded 1146 records, of which we included 152 (13.3%) studies in our analysis. 
QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies version 2) was performed for the risk-of-bias assessment of all included studies. Results: Radiographic biomarkers (n=129, 84.9%) and deep learning (DL; n=122, 80.3%) approaches were predominantly used, with convolutional neural networks (CNNs) using Visual Geometry Group (VGG)-16 (n=37, 24.3%), ResNet-50 (n=33, 21.7%), and DenseNet-121 (n=19, 12.5%) architectures being the most common DL approach. The majority of studies focused on model development (n=143, 94.1%) and used a single modality approach (n=141, 92.8%). AI methods demonstrated good performance in all studies: mean accuracy=91.93% (SD 8.10%, 95% CI 90.52%-93.33%; median 93.59%, IQR 88.33%-98.32%), mean area under the curve (AUC)=93.48% (SD 7.51%, 95% CI 91.90%-95.06%; median 95.28%, IQR 91%-99%), mean sensitivity=92.77% (SD 7.48%, 95% CI 91.38%-94.15%; median 94.05% IQR 89%-98.87%), and mean specificity=92.39% (SD 9.4%, 95% CI 90.30%-94.49%; median 95.38%, IQR 89.42%-99.19%). AI performance across different biomarker types showed mean accuracies of 92.45% (SD 7.83%), 89.03% (SD 8.49%), and 84.21% (SD 0%); mean AUCs of 94.47% (SD 7.32%), 88.45% (SD 8.33%), and 88.61% (SD 5.9%); mean sensitivities of 93.8% (SD 6.27%), 88.41% (SD 10.24%), and 93% (SD 0%); and mean specificities of 94.2% (SD 6.63%), 85.89% (SD 14.66%), and 95% (SD 0%) for radiographic, molecular/biochemical, and physiological types, respectively. AI performance across various reference standards showed mean accuracies of 91.44% (SD 7.3%), 93.16% (SD 6.44%), and 88.98% (SD 9.77%); mean AUCs of 90.95% (SD 7.58%), 94.89% (SD 5.18%), and 92.61% (SD 6.01%); mean sensitivities of 91.76% (SD 7.02%), 93.73% (SD 6.67%), and 91.34% (SD 7.71%); and mean specificities of 86.56% (SD 12.8%), 93.69% (SD 8.45%), and 92.7% (SD 6.54%) for bacteriological, human reader, and combined reference standards, respectively. The transfer learning (TL) approach showed increasing popularity (n=89, 58.6%). 
Notably, only 1 (0.7%) study conducted domain-shift analysis for TB detection. Conclusions: Findings from this review underscore the considerable promise of AI-based methods in the realm of TB detection. Future research endeavors should prioritize conducting domain-shift analyses to better simulate real-world scenarios in TB detection. Trial Registration: PROSPERO CRD42023453611; https://www.crd.york.ac.uk/PROSPERO/view/CRD42023453611 %M 40053773 %R 10.2196/69068 %U https://www.jmir.org/2025/1/e69068 %U https://doi.org/10.2196/69068 %U http://www.ncbi.nlm.nih.gov/pubmed/40053773 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e60391 %T GPT-4 as a Clinical Decision Support Tool in Ischemic Stroke Management: Evaluation Study %A Shmilovitch,Amit Haim %A Katson,Mark %A Cohen-Shelly,Michal %A Peretz,Shlomi %A Aran,Dvir %A Shelly,Shahar %+ Department of Neurology, Rambam Medical Center, HaAliya HaShniya Street 8, PO Box 9602, Haifa, 3109601, Israel, 972 543541995, s_shelly@rmc.gov.il %K GPT-4 %K ischemic stroke %K clinical decision support %K artificial intelligence %K neurology %D 2025 %7 7.3.2025 %9 Original Paper %J JMIR AI %G English %X Background: Cerebrovascular diseases are the second most common cause of death worldwide and one of the major causes of disability burden. Advancements in artificial intelligence have the potential to revolutionize health care delivery, particularly in critical decision-making scenarios such as ischemic stroke management. Objective: This study aims to evaluate the effectiveness of GPT-4 in providing clinical support for emergency department neurologists by comparing its recommendations with expert opinions and real-world outcomes in acute ischemic stroke management. Methods: A cohort of 100 patients with acute stroke symptoms was retrospectively reviewed. Data used for decision-making included patients’ history, clinical evaluation, imaging study results, and other relevant details. 
Each case was independently presented to GPT-4, which provided scaled recommendations (1-7) regarding the appropriateness of treatment, the use of tissue plasminogen activator, and the need for endovascular thrombectomy. Additionally, GPT-4 estimated the 90-day mortality probability for each patient and elucidated its reasoning for each recommendation. The recommendations were then compared with a stroke specialist’s opinion and actual treatment decisions. Results: In our cohort of 100 patients, treatment recommendations by GPT-4 showed strong agreement with expert opinion (area under the curve [AUC] 0.85, 95% CI 0.77-0.93) and real-world treatment decisions (AUC 0.80, 95% CI 0.69-0.91). GPT-4 showed near-perfect agreement with real-world decisions in recommending endovascular thrombectomy (AUC 0.94, 95% CI 0.89-0.98) and strong agreement for tissue plasminogen activator treatment (AUC 0.77, 95% CI 0.68-0.86). Notably, in some cases, GPT-4 recommended more aggressive treatment than human experts, with 11 instances where GPT-4 suggested tissue plasminogen activator use against expert opinion. For mortality prediction, GPT-4 accurately identified 10 (77%) out of 13 deaths within its top 25 high-risk predictions (AUC 0.89, 95% CI 0.8077-0.9739; hazard ratio 6.98, 95% CI 2.88-16.9; P<.001), outperforming supervised machine learning models such as PRACTICE (AUC 0.70; log-rank P=.02) and PREMISE (AUC 0.77; P=.07). Conclusions: This study demonstrates the potential of GPT-4 as a viable clinical decision-support tool in the management of acute stroke. Its ability to provide explainable recommendations without requiring structured data input aligns well with the routine workflows of treating physicians. However, the tendency toward more aggressive treatment recommendations highlights the importance of human oversight in clinical decision-making. 
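The AUC used throughout these comparisons can be computed directly from GPT-4's ordinal 1-7 recommendations against binary expert decisions via the Mann-Whitney formulation (the probability that a randomly chosen positive case is scored above a randomly chosen negative one). A minimal sketch with made-up scores and labels:

```python
# Minimal sketch with made-up data: AUC as the Mann-Whitney probability that
# a randomly chosen positive case is scored above a randomly chosen negative
# one (ties count half). Scores mimic GPT-4's 1-7 scale; labels mimic binary
# expert decisions.
def auc(scores, labels):
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [7, 6, 6, 5, 3, 2, 4, 1, 2, 1]   # hypothetical 1-7 recommendations
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # hypothetical expert decisions

print(f"AUC = {auc(scores, labels):.2f}")
```

This rank-based view explains why an ordinal scale needs no threshold to be compared against binary expert opinion, which is how agreement was quantified here.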
Future studies should focus on prospective validations and exploring the safe integration of such artificial intelligence tools into clinical practice. %M 40053715 %R 10.2196/60391 %U https://ai.jmir.org/2025/1/e60391 %U https://doi.org/10.2196/60391 %U http://www.ncbi.nlm.nih.gov/pubmed/40053715 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e52244 %T Perspectives on Using Artificial Intelligence to Derive Social Determinants of Health Data From Medical Records in Canada: Large Multijurisdictional Qualitative Study %A Davis,Victoria H %A Qiang,Jinfan Rose %A Adekoya MacCarthy,Itunuoluwa %A Howse,Dana %A Seshie,Abigail Zita %A Kosowan,Leanne %A Delahunty-Pike,Alannah %A Abaga,Eunice %A Cooney,Jane %A Robinson,Marjeiry %A Senior,Dorothy %A Zsager,Alexander %A Aubrey-Bassler,Kris %A Irwin,Mandi %A Jackson,Lois A %A Katz,Alan %A Marshall,Emily Gard %A Muhajarine,Nazeem %A Neudorf,Cory %A Garies,Stephanie %A Pinto,Andrew D %+ Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada, 1 416 864 6060 ext 76148, andrew.pinto@utoronto.ca %K artificial intelligence %K social determinants of health %K sociodemographic data %K social needs %K social care %K primary care %K machine learning %K qualitative study %D 2025 %7 6.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Data on the social determinants of health could be used to improve care, support quality improvement initiatives, and track progress toward health equity. However, this data collection is not widespread. Artificial intelligence (AI), specifically natural language processing and machine learning, could be used to derive social determinants of health data from electronic medical records. This could reduce the time and resources required to obtain social determinants of health data. 
Objective: This study aimed to understand perspectives of a diverse sample of Canadians on the use of AI to derive social determinants of health information from electronic medical record data, including benefits and concerns. Methods: Using a qualitative description approach, in-depth interviews were conducted with 195 participants purposefully recruited from Ontario, Newfoundland and Labrador, Manitoba, and Saskatchewan. Transcripts were analyzed using an inductive and deductive content analysis. Results: A total of 4 themes were identified. First, AI was described as the inevitable future, facilitating more efficient, accessible social determinants of health information and use in primary care. Second, participants expressed concerns about potential health care harms and a distrust in AI and public systems. Third, some participants indicated that AI could lead to a loss of the human touch in health care, emphasizing a preference for strong relationships with providers and individualized care. Fourth, participants described the critical importance of consent and the need for strong safeguards to protect patient data and trust. Conclusions: These findings provide important considerations for the use of AI in health care, and particularly when health care administrators and decision makers seek to derive social determinants of health data. 
%M 40053728 %R 10.2196/52244 %U https://www.jmir.org/2025/1/e52244 %U https://doi.org/10.2196/52244 %U http://www.ncbi.nlm.nih.gov/pubmed/40053728 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 13 %N %P e59660 %T Applying AI in the Context of the Association Between Device-Based Assessment of Physical Activity and Mental Health: Systematic Review %A Woll,Simon %A Birkenmaier,Dennis %A Biri,Gergely %A Nissen,Rebecca %A Lutz,Luisa %A Schroth,Marc %A Ebner-Priemer,Ulrich W %A Giurgiu,Marco %+ Mental mHealth Lab, Institute of Sports and Sports Science, Karlsruhe Institute of Technology, Hertzstrasse 16, Karlsruhe, 76187, Germany, 49 721 608 ext 41974, simon.woll@kit.edu %K machine learning %K mental health %K wearables %K physical behavior %K artificial intelligence %K mobile phone %K smartphone %D 2025 %7 6.3.2025 %9 Review %J JMIR Mhealth Uhealth %G English %X Background: Wearable technology is used by consumers worldwide for continuous activity monitoring in daily life but more recently also for classifying or predicting mental health parameters like stress or depression levels. Previous studies identified, based on traditional approaches, that physical activity is a relevant factor in the prevention or management of mental health. However, upcoming artificial intelligence methods have not yet been fully established in the research field of physical activity and mental health. Objective: This systematic review aims to provide a comprehensive overview of studies that integrated passive monitoring of physical activity data measured via wearable technology in machine learning algorithms for the detection, prediction, or classification of mental health states and traits. Methods: We conducted a review of studies processing wearable data to gain insights into mental health parameters. 
Eligibility criteria were (1) the study used wearables or smartphones to acquire physical behavior and optionally other sensor measurement data, (2) the study used machine learning to process the acquired data, and (3) the study was published in a peer-reviewed English language journal. Studies were identified via a systematic search in 5 electronic databases. Results: Of 11,057 unique search results, 49 papers published between 2016 and 2023 were included. Most studies examined the connection between wearable sensor data and stress (n=15, 31%) or depression (n=14, 29%). In total, 71% (n=35) of the studies had less than 100 participants, and 47% (n=23) had less than 14 days of data recording. More than half of the studies (n=27, 55%) used step count as movement measurement, and 44% (n=21) used raw accelerometer values. The quality of the studies was assessed, scoring between 0 and 18 points in 9 categories (maximum 2 points per category). On average, studies were rated 6.47 (SD 3.1) points. Conclusions: The use of wearable technology for the detection, prediction, or classification of mental health states and traits is promising and offers a variety of applications across different settings and target groups. However, based on the current state of the literature, the application of artificial intelligence cannot yet realize its full potential, mostly due to methodological shortcomings and limited data availability. Future research endeavors may focus on the following suggestions to improve the quality of new applications in this context: first, by using raw data instead of already preprocessed data. Second, by using only relevant data based on empirical evidence; in particular, crafting optimal feature sets rather than using many detached individual features, and consulting with in-field professionals. Third, by validating and replicating the existing approaches (ie, applying the model to unseen data). 
Fourth, depending on the research aim (ie, generalization vs personalization) maximizing the sample size or the duration over which data are collected. %M 40053765 %R 10.2196/59660 %U https://mhealth.jmir.org/2025/1/e59660 %U https://doi.org/10.2196/59660 %U http://www.ncbi.nlm.nih.gov/pubmed/40053765 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e68509 %T Machine Learning Models With Prognostic Implications for Predicting Gastrointestinal Bleeding After Coronary Artery Bypass Grafting and Guiding Personalized Medicine: Multicenter Cohort Study %A Dong,Jiale %A Jin,Zhechuan %A Li,Chengxiang %A Yang,Jian %A Jiang,Yi %A Li,Zeqian %A Chen,Cheng %A Zhang,Bo %A Ye,Zhaofei %A Hu,Yang %A Ma,Jianguo %A Li,Ping %A Li,Yulin %A Wang,Dongjin %A Ji,Zhili %+ Department of General Surgery, Beijing Chaoyang Hospital, Capital Medical University, 8 Gongren Tiyuchang Nanlua, Chaoyang District, Beijing, 100020, China, 86 010 85231610, anzhenjzl@mail.ccmu.edu.cn %K machine learning %K personalized medicine %K coronary artery bypass grafting %K adverse outcome %K gastrointestinal bleeding %D 2025 %7 6.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Gastrointestinal bleeding is a serious adverse event of coronary artery bypass grafting and lacks tailored risk assessment tools for personalized prevention. Objective: This study aims to develop and validate predictive models to assess the risk of gastrointestinal bleeding after coronary artery bypass grafting (GIBCG) and to guide personalized prevention. Methods: Participants were recruited from 4 medical centers, including a prospective cohort and the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. From an initial cohort of 18,938 patients, 16,440 were included in the final analysis after applying the exclusion criteria. 
Thirty combinations of machine learning algorithms were compared, and the optimal model was selected based on integrated performance metrics, including the area under the receiver operating characteristic curve (AUROC) and the Brier score. This model was then developed into a web-based risk prediction calculator. The Shapley Additive Explanations method was used to provide both global and local explanations for the predictions. Results: The model was developed using data from 3 centers and a prospective cohort (n=13,399) and validated on the Drum Tower cohort (n=2745) and the MIMIC cohort (n=296). The optimal model, based on 15 easily accessible admission features, demonstrated an AUROC of 0.8482 (95% CI 0.8328-0.8618) in the derivation cohort. In external validation, the AUROC was 0.8513 (95% CI 0.8221-0.8782) for the Drum Tower cohort and 0.7811 (95% CI 0.7275-0.8343) for the MIMIC cohort. The analysis indicated that high-risk patients identified by the model had a significantly increased mortality risk (odds ratio 2.98, 95% CI 1.784-4.978; P<.001). For these high-risk populations, preoperative use of proton pump inhibitors was an independent protective factor against the occurrence of GIBCG. By contrast, dual antiplatelet therapy and oral anticoagulants were identified as independent risk factors. However, in low-risk populations, the use of proton pump inhibitors (χ²1=0.13, P=.72), dual antiplatelet therapy (χ²1=0.38, P=.54), and oral anticoagulants (χ²1=0.15, P=.69) were not significantly associated with the occurrence of GIBCG. Conclusions: Our machine learning model accurately identified patients at high risk of GIBCG, who had a poor prognosis. This approach can aid in early risk stratification and personalized prevention. 
Trial Registration: Chinese Clinical Registry Center ChiCTR2400086050; http://www.chictr.org.cn/showproj.html?proj=226129 %M 40053791 %R 10.2196/68509 %U https://www.jmir.org/2025/1/e68509 %U https://doi.org/10.2196/68509 %U http://www.ncbi.nlm.nih.gov/pubmed/40053791 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66032 %T Health Communication on the Internet: Promoting Public Health and Exploring Disparities in the Generative AI Era %A Uddin,Jamal %A Feng,Cheng %A Xu,Junfang %+ Department of Pharmacy, Second Affiliated Hospital, School of Public health, Zhejiang University School of Medicine, 866 Yuhangtang road, Xihu district, Hangzhou, 310058, China, 86 18801230482 ext 000, junfangxuhappy1987@163.com %K internet %K generative AI %K artificial intelligence %K ChatGPT %K health communication %K health promotion %K health disparity %K health %K communication %K internet %K AI %K generative %K tool %K genAI %K gratification theory %K gratification %K public health %K inequity %K disparity %D 2025 %7 6.3.2025 %9 Viewpoint %J J Med Internet Res %G English %X Health communication and promotion on the internet have evolved over time, driven by the development of new technologies, including generative artificial intelligence (GenAI). These technological tools offer new opportunities for both the public and professionals. However, these advancements also pose risks of exacerbating health disparities. Limited research has focused on combining these health communication mediums, particularly those enabled by new technologies like GenAI, and their applications for health promotion and health disparities. Therefore, this viewpoint, adopting a conceptual approach, provides an updated overview of health communication mediums and their role in understanding health promotion and disparities in the GenAI era. 
Additionally, health promotion and health disparities associated with GenAI are briefly discussed through the lens of the Technology Acceptance Model 2, the uses and gratifications theory, and the knowledge gap hypothesis. This viewpoint discusses the limitations and barriers of previous internet-based communication mediums regarding real-time responses, personalized advice, and follow-up inquiries, highlighting the potential of new technology for public health promotion. It also discusses the health disparities caused by the limitations of GenAI, such as individuals’ inability to evaluate information, restricted access to services, and the lack of skill development. Overall, this study lays the groundwork for future research on how GenAI could be leveraged for public health promotion and how its challenges and barriers may exacerbate health inequities. It underscores the need for more empirical studies, as well as the importance of enhancing digital literacy and increasing access to technology for socially disadvantaged populations. %M 40053755 %R 10.2196/66032 %U https://www.jmir.org/2025/1/e66032 %U https://doi.org/10.2196/66032 %U http://www.ncbi.nlm.nih.gov/pubmed/40053755 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e64349 %T The Role of AI in Cardiovascular Event Monitoring and Early Detection: Scoping Literature Review %A Elvas,Luis B %A Almeida,Ana %A Ferreira,Joao C %K artificial intelligence %K machine learning %K cardiovascular diseases %K cardiovascular events %K health care %K monitoring %K early detection %K AI %K cardiovascular %K literature review %K medical data %K detect %K patient outcomes %K neural network %K ML model %K mobile phone %D 2025 %7 6.3.2025 %9 %J JMIR Med Inform %G English %X Background: Artificial intelligence (AI) has shown exponential growth and advancements, revolutionizing various fields, including health care. 
However, domain adaptation remains a significant challenge, as machine learning (ML) models often need to be applied across different health care settings with varying patient demographics and practices. This issue is critical for ensuring effective and equitable AI deployment. Cardiovascular diseases (CVDs), the leading cause of global mortality with 17.9 million annual deaths, encompass conditions like coronary heart disease and hypertension. The increasing availability of medical data, coupled with AI advancements, offers new opportunities for early detection and intervention in cardiovascular events, leveraging AI’s capacity to analyze complex datasets and uncover critical patterns. Objective: This review aims to examine AI methodologies combined with medical data to advance the intelligent monitoring and detection of CVDs, identifying areas for further research to enhance patient outcomes and support early interventions. Methods: This review follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology to ensure a rigorous and transparent literature review process. This structured approach facilitated a comprehensive overview of the current state of research in this field. Results: Through the methodology used, 64 documents were retrieved, of which 40 documents met the inclusion criteria. The reviewed papers demonstrate advancements in AI and ML for CVD detection, classification, prediction, diagnosis, and patient monitoring. Techniques such as ensemble learning, deep neural networks, and feature selection improve prediction accuracy over traditional methods. ML models predict cardiovascular events and risks, with applications in monitoring via wearable technology. The integration of AI in health care supports early detection, personalized treatment, and risk assessment, possibly improving the management of CVDs. 
Conclusions: The study concludes that AI and ML techniques can improve the accuracy of CVD classification, prediction, diagnosis, and monitoring. The integration of multiple data sources and noninvasive methods supports continuous monitoring and early detection. These advancements help enhance CVD management and patient outcomes, indicating the potential for AI to offer more precise and cost-effective solutions in health care. %R 10.2196/64349 %U https://medinform.jmir.org/2025/1/e64349 %U https://doi.org/10.2196/64349 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e65190 %T Machine Learning–Based Prediction of Delirium and Risk Factor Identification in Intensive Care Unit Patients With Burns: Retrospective Observational Study %A Esumi,Ryo %A Funao,Hiroki %A Kawamoto,Eiji %A Sakamoto,Ryota %A Ito-Masui,Asami %A Okuno,Fumito %A Shinkai,Toru %A Hane,Atsuya %A Ikejiri,Kaoru %A Akama,Yuichi %A Gaowa,Arong %A Park,Eun Jeong %A Momosaki,Ryo %A Kaku,Ryuji %A Shimaoka,Motomu %+ Department of Molecular Pathobiology and Cell Adhesion Biology, Mie University Graduate School of Medicine, Mie University, Edobashi 2-174, Tsu, 5140001, Japan, 81 0592321111, a_2.uk@mac.com %K burns %K delirium %K intensive care unit %K machine learning %K prediction model %K artificial intelligence %K AI %D 2025 %7 5.3.2025 %9 Original Paper %J JMIR Form Res %G English %X Background: The incidence of delirium in patients with burns receiving treatment in the intensive care unit (ICU) is high, reaching up to 77%, and has been associated with increased mortality rates. Therefore, early identification of patients at high risk of delirium onset is essential for improving treatment strategies. Objective: This study aimed to create a machine learning model for predicting delirium in patients with burns during their ICU stay using patient data from the first day of ICU admission and identify predictive factors for ICU delirium in patients with burns. 
Methods: This study focused on 82 patients with burns aged ≥18 years who were admitted to the ICU at Mie University Hospital for ≥24 hours between January 2015 and June 2023. In total, 70 variables were measured in patients upon ICU admission and used as explanatory variables in the ICU delirium prediction model. Delirium was assessed using the Intensive Care Delirium Screening Checklist every 8 hours after ICU admission. A total of 10 different machine learning methods were used to predict ICU delirium. Multiple receiver operating characteristic curves were plotted for various machine learning models, and the area under the curve (AUC) for each was compared. In addition, the top 15 risk factors contributing to delirium onset were identified using Shapley additive explanations analysis. Results: Among the 10 machine learning models tested, logistic regression (mean AUC 0.906, SD 0.073), support vector machine (mean AUC 0.897, SD 0.056), k-nearest neighbor (mean AUC 0.894, SD 0.060), neural network (mean AUC 0.857, SD 0.058), random forest (mean AUC 0.850, SD 0.074), adaptive boosting (mean AUC 0.832, SD 0.094), gradient boosting machine (mean AUC 0.821, SD 0.074), and naïve Bayes (mean AUC 0.827, SD 0.095) demonstrated the highest accuracy in predicting ICU delirium. Specifically, 24-hour urine output (from ICU admission to 24 hours), oxygen saturation, burn area, total bilirubin level, and intubation upon ICU admission were identified as the major risk factors for delirium onset. In addition, variables, such as the proportion of white blood cell fractions, including monocytes; methemoglobin concentration; and respiratory rate, were identified as important risk factors for ICU delirium. Conclusions: This study demonstrated the ability of machine learning models trained using vital signs and blood data upon ICU admission to predict delirium in patients with burns during their ICU stay. 
%M 39895101 %R 10.2196/65190 %U https://formative.jmir.org/2025/1/e65190 %U https://doi.org/10.2196/65190 %U http://www.ncbi.nlm.nih.gov/pubmed/39895101 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 11 %N %P e65108 %T ChatGPT’s Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini %A Prazeres,Filipe %K ChatGPT-3.5 Turbo %K ChatGPT-4o mini %K medical examination %K European Portuguese %K AI performance evaluation %K Portuguese %K evaluation %K medical examination questions %K examination question %K chatbot %K ChatGPT %K model %K artificial intelligence %K AI %K GPT %K LLM %K NLP %K natural language processing %K machine learning %K large language model %D 2025 %7 5.3.2025 %9 %J JMIR Med Educ %G English %X Background: Advancements in ChatGPT are transforming medical education by providing new tools for assessment and learning, potentially enhancing evaluations for doctors and improving instructional effectiveness. Objective: This study evaluates the performance and consistency of ChatGPT-3.5 Turbo and ChatGPT-4o mini in solving European Portuguese medical examination questions (2023 National Examination for Access to Specialized Training; Prova Nacional de Acesso à Formação Especializada [PNA]) and compares their performance to human candidates. Methods: ChatGPT-3.5 Turbo was tested on the first part of the examination (74 questions) on July 18, 2024, and ChatGPT-4o mini on the second part (74 questions) on July 19, 2024. Each model generated an answer using its natural language processing capabilities. To test consistency, each model was asked, “Are you sure?” after providing an answer. Differences between the first and second responses of each model were analyzed using the McNemar test with continuity correction. A single-parameter t test compared the models’ performance to human candidates. 
Frequencies and percentages were used for categorical variables, and means and CIs for numerical variables. Statistical significance was set at P<.05. Results: ChatGPT-4o mini achieved an accuracy rate of 65% (48/74) on the 2023 PNA examination, surpassing ChatGPT-3.5 Turbo. ChatGPT-4o mini outperformed medical candidates, while ChatGPT-3.5 Turbo had a more moderate performance. Conclusions: This study highlights the advancements and potential of ChatGPT models in medical education, emphasizing the need for careful implementation with teacher oversight and further research. %R 10.2196/65108 %U https://mededu.jmir.org/2025/1/e65108 %U https://doi.org/10.2196/65108 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66821 %T Augmenting Insufficiently Accruing Oncology Clinical Trials Using Generative Models: Validation Study %A El Kababji,Samer %A Mitsakakis,Nicholas %A Jonker,Elizabeth %A Beltran-Bless,Ana-Alicia %A Pond,Gregory %A Vandermeer,Lisa %A Radhakrishnan,Dhenuka %A Mosquera,Lucy %A Paterson,Alexander %A Shepherd,Lois %A Chen,Bingshu %A Barlow,William %A Gralow,Julie %A Savard,Marie-France %A Fesl,Christian %A Hlauschek,Dominik %A Balic,Marija %A Rinnerthaler,Gabriel %A Greil,Richard %A Gnant,Michael %A Clemons,Mark %A El Emam,Khaled %+ School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, 75 Laurier Ave E, Ottawa, ON, K1N 6N5, Canada, 1 6137975412, kelemam@ehealthinformation.ca %K generative models %K study accrual %K recruitment %K clinical trial replication %K oncology %K validation %K simulated patient %K simulation %K retrospective %K dataset %K patient %K artificial intelligence %K machine learning %D 2025 %7 5.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Insufficient patient accrual is a major challenge in clinical trials and can result in underpowered studies, as well as exposing study participants to toxicity and additional costs, with limited scientific benefit. 
Real-world data can provide external controls, but insufficient accrual affects all arms of a study, not just controls. Studies that used generative models to simulate more patients were limited in the accrual scenarios considered, replicability criteria, number of generative models, and number of clinical trials evaluated. Objective: This study aimed to perform a comprehensive evaluation on the extent generative models can be used to simulate additional patients to compensate for insufficient accrual in clinical trials. Methods: We performed a retrospective analysis using 10 datasets from 9 fully accrued, completed, and published cancer trials. For each trial, we removed the latest recruited patients (from 10% to 50%), trained a generative model on the remaining patients, and simulated additional patients to replace the removed ones using the generative model to augment the available data. We then replicated the published analysis on this augmented dataset to determine if the findings remained the same. Four different generative models were evaluated: sequential synthesis with decision trees, Bayesian network, generative adversarial network, and a variational autoencoder. These generative models were compared to sampling with replacement (ie, bootstrap) as a simple alternative. Replication of the published analyses used 4 metrics: decision agreement, estimate agreement, standardized difference, and CI overlap. Results: Sequential synthesis performed well on the 4 replication metrics for the removal of up to 40% of the last recruited patients (decision agreement: 88% to 100% across datasets, estimate agreement: 100%, cannot reject standardized difference null hypothesis: 100%, and CI overlap: 0.8-0.92). Sampling with replacement was the next most effective approach, with decision agreement varying from 78% to 89% across all datasets. There was no evidence of a monotonic relationship in the estimated effect size with recruitment order across these studies. 
This suggests that patients recruited earlier in a trial were not systematically different than those recruited later, at least partially explaining why generative models trained on early data can effectively simulate patients recruited later in a trial. The fidelity of the generated data relative to the training data on the Hellinger distance was high in all cases. Conclusions: For an oncology study with insufficient accrual with as few as 60% of target recruitment, sequential synthesis can enable the simulation of the full dataset had the study continued accruing patients and can be an alternative to drawing conclusions from an underpowered study. These results provide evidence demonstrating the potential for generative models to rescue poorly accruing clinical trials, but additional studies are needed to confirm these findings and to generalize them for other diseases. %M 40053790 %R 10.2196/66821 %U https://www.jmir.org/2025/1/e66821 %U https://doi.org/10.2196/66821 %U http://www.ncbi.nlm.nih.gov/pubmed/40053790 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67891 %T Competency of Large Language Models in Evaluating Appropriate Responses to Suicidal Ideation: Comparative Study %A McBain,Ryan K %A Cantor,Jonathan H %A Zhang,Li Ang %A Baker,Olesya %A Zhang,Fang %A Halbisen,Alyssa %A Kofner,Aaron %A Breslau,Joshua %A Stein,Bradley %A Mehrotra,Ateev %A Yu,Hao %+ RAND, 1200 S Hayes St, Arlington, VA, United States, 1 5088433901, rmcbain@rand.org %K depression %K suicide %K mental health %K large language model %K chatbot %K digital health %K Suicidal Ideation Response Inventory %K ChatGPT %K suicidologist %K artificial intelligence %D 2025 %7 5.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: With suicide rates in the United States at an all-time high, individuals experiencing suicidal ideation are increasingly turning to large language models (LLMs) for guidance and support. 
Objective: The objective of this study was to assess the competency of 3 widely used LLMs to distinguish appropriate versus inappropriate responses when engaging individuals who exhibit suicidal ideation. Methods: This observational, cross-sectional study evaluated responses to the revised Suicidal Ideation Response Inventory (SIRI-2) generated by ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro. Data collection and analyses were conducted in July 2024. A common training module for mental health professionals, SIRI-2 provides 24 hypothetical scenarios in which a patient exhibits depressive symptoms and suicidal ideation, followed by two clinician responses. Clinician responses were scored from –3 (highly inappropriate) to +3 (highly appropriate). All 3 LLMs were provided with a standardized set of instructions to rate clinician responses. We compared LLM responses to those of expert suicidologists, conducting linear regression analyses and converting LLM responses to z scores to identify outliers (z score>1.96 or <–1.96; P<0.05). Furthermore, we compared final SIRI-2 scores to those produced by health professionals in prior studies. Results: All 3 LLMs rated responses as more appropriate than ratings provided by expert suicidologists. The item-level mean difference was 0.86 for ChatGPT (95% CI 0.61-1.12; P<.001), 0.61 for Claude (95% CI 0.41-0.81; P<.001), and 0.73 for Gemini (95% CI 0.35-1.11; P<.001). In terms of z scores, 19% (9 of 48) of ChatGPT responses were outliers when compared to expert suicidologists. Similarly, 11% (5 of 48) of Claude responses were outliers compared to expert suicidologists. Additionally, 36% (17 of 48) of Gemini responses were outliers compared to expert suicidologists. ChatGPT produced a final SIRI-2 score of 45.7, roughly equivalent to master’s level counselors in prior studies. Claude produced an SIRI-2 score of 36.7, exceeding prior performance of mental health professionals after suicide intervention skills training. 
Gemini produced a final SIRI-2 score of 54.5, equivalent to untrained K-12 school staff. Conclusions: Current versions of 3 major LLMs demonstrated an upward bias in their evaluations of appropriate responses to suicidal ideation; however, 2 of the 3 models performed equivalent to or exceeded the performance of mental health professionals. %M 40053817 %R 10.2196/67891 %U https://www.jmir.org/2025/1/e67891 %U https://doi.org/10.2196/67891 %U http://www.ncbi.nlm.nih.gov/pubmed/40053817 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66760 %T Investigating Whether AI Will Replace Human Physicians and Understanding the Interplay of the Source of Consultation, Health-Related Stigma, and Explanations of Diagnoses on Patients’ Evaluations of Medical Consultations: Randomized Factorial Experiment %A Guo,Weiqi %A Chen,Yang %+ , School of Journalism and Communication, Renmin University of China, Room 720, Mingde Xinwen Building, No.59 Zhongguancun St., Haidian, Beijing, 100872, China, 86 1062514835, 20050022@ruc.edu.cn %K artificial intelligence %K AI %K medical artificial intelligence %K medical AI %K human–artificial intelligence interaction %K human-AI interaction %K medical consultation %K health-related stigma %K diagnosis explanation %K health communication %D 2025 %7 5.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The increasing use of artificial intelligence (AI) in medical diagnosis and consultation promises benefits such as greater accuracy and efficiency. However, there is little evidence to systematically test whether the ideal technological promises translate into an improved evaluation of the medical consultation from the patient’s perspective. 
This perspective is significant because AI as a technological solution does not necessarily improve patient confidence in diagnosis and adherence to treatment at the functional level, create meaningful interactions between the medical agent and the patient at the relational level, evoke positive emotions, or reduce the patient’s pessimism at the emotional level. Objective: This study aims to investigate, from a patient-centered perspective, whether AI or human-involved AI can replace the role of human physicians in diagnosis at the functional, relational, and emotional levels as well as how some health-related differences between human-AI and human-human interactions affect patients’ evaluations of the medical consultation. Methods: A 3 (consultation source: AI vs human-involved AI vs human) × 2 (health-related stigma: low vs high) × 2 (diagnosis explanation: without vs with explanation) factorial experiment was conducted with 249 participants. The main effects and interaction effects of the variables were examined on individuals’ functional, relational, and emotional evaluations of the medical consultation. Results: Functionally, people trusted the diagnosis of the human physician (mean 4.78-4.85, SD 0.06-0.07) more than medical AI (mean 4.34-4.55, SD 0.06-0.07) or human-involved AI (mean 4.39-4.56, SD 0.06-0.07; P<.001), but at the relational and emotional levels, there was no significant difference between human-AI and human-human interactions (P>.05). Health-related stigma had no significant effect on how people evaluated the medical consultation or contributed to preferring AI-powered systems over humans (P>.05); however, providing explanations of the diagnosis significantly improved the functional (P<.001), relational (P<.05), and emotional (P<.05) evaluations of the consultation for all 3 medical agents. 
Conclusions: The findings imply that at the current stage of AI development, people trust human expertise more than accurate AI, especially for decisions traditionally made by humans, such as medical diagnosis, supporting the algorithm aversion theory. Surprisingly, even for highly stigmatized diseases such as AIDS, where we assume anonymity and privacy are preferred in medical consultations, the dehumanization of AI does not contribute significantly to the preference for AI-powered medical agents over humans, suggesting that instrumental needs of diagnosis override patient privacy concerns. Furthermore, explaining the diagnosis effectively improves treatment adherence, strengthens the physician-patient relationship, and fosters positive emotions during the consultation. This provides insights for the design of AI medical agents, which have long been criticized for lacking transparency while making highly consequential decisions. This study concludes by outlining theoretical contributions to research on health communication and human-AI interaction and discusses the implications for the design and application of medical AI. 
%M 40053785 %R 10.2196/66760 %U https://www.jmir.org/2025/1/e66760 %U https://doi.org/10.2196/66760 %U http://www.ncbi.nlm.nih.gov/pubmed/40053785 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e64364 %T Retrieval Augmented Therapy Suggestion for Molecular Tumor Boards: Algorithmic Development and Validation Study %A Berman,Eliza %A Sundberg Malek,Holly %A Bitzer,Michael %A Malek,Nisar %A Eickhoff,Carsten %+ Center for Digital Health, University Hospital Tuebingen, Schaffhausenstrasse 77, Tuebingen, 72072, Germany, 49 70712984350, eliza_berman@alumni.brown.edu %K large language models %K retrieval augmented generation %K LLaMA %K precision oncology %K molecular tumor board %K molecular tumor %K LLMs %K augmented therapy %K MTB %K oncology %K tumor %K clinical trials %K patient care %K treatment %K evidence-based %K accessibility to care %D 2025 %7 5.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Molecular tumor boards (MTBs) require intensive manual investigation to generate optimal treatment recommendations for patients. Large language models (LLMs) can catalyze MTB recommendations, decrease human error, improve accessibility to care, and enhance the efficiency of precision oncology. Objective: In this study, we aimed to investigate the efficacy of LLM-generated treatments for MTB patients. We specifically investigate the LLMs’ ability to generate evidence-based treatment recommendations using PubMed references. Methods: We built a retrieval augmented generation pipeline using PubMed data. We prompted the resulting LLM to generate treatment recommendations with PubMed references using a test set of patients from an MTB conference at a large comprehensive cancer center at a tertiary care institution. Members of the MTB manually assessed the relevancy and correctness of the generated responses. 
Results: A total of 75% of the referenced articles were properly cited from PubMed, while 17% of the referenced articles were hallucinations, and the remainder were not properly cited from PubMed. Clinician-generated LLM queries achieved higher accuracy through clinician evaluation than automated queries, with clinicians labeling 25% of LLM responses as equal to their recommendations and 37.5% as alternative plausible treatments. Conclusions: This study demonstrates how retrieval augmented generation–enhanced LLMs can be a powerful tool in accelerating MTB conferences, as LLMs are sometimes capable of achieving clinician-equal treatment recommendations. However, further investigation is required to achieve stable results with zero hallucinations. LLMs signify a scalable solution to the time-intensive process of MTB investigations. Nevertheless, current LLM performance demonstrates that these models must be used with heavy clinician supervision and cannot yet fully automate the MTB pipeline. %M 40053768 %R 10.2196/64364 %U https://www.jmir.org/2025/1/e64364 %U https://doi.org/10.2196/64364 %U http://www.ncbi.nlm.nih.gov/pubmed/40053768 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e63631 %T Large Language Models’ Accuracy in Emulating Human Experts’ Evaluation of Public Sentiments about Heated Tobacco Products on Social Media: Evaluation Study %A Kim,Kwanho %A Kim,Soojong %+ Department of Communication, University of California Davis, 1 Shields Ave, Kerr Hall #361, Davis, CA, 95616, United States, 1 530 752 0966, sjokim@ucdavis.edu %K heated tobacco products %K artificial intelligence %K large language models %K social media %K sentiment analysis %K ChatGPT %K generative pre-trained transformer %K GPT %K LLM %K NLP %K natural language processing %K machine learning %K language model %K sentiment %K evaluation %K social media %K tobacco %K alternative %K prevention %K nicotine %K OpenAI %D 2025 %7 4.3.2025 %9 Original Paper %J J Med Internet Res %G English %X 
Background: Sentiment analysis of alternative tobacco products discussed on social media is crucial in tobacco control research. Large language models (LLMs) are artificial intelligence models that were trained on extensive text data to emulate the linguistic patterns of humans. LLMs may hold the potential to streamline the time-consuming and labor-intensive process of human sentiment analysis. Objective: This study aimed to examine the accuracy of LLMs in replicating human sentiment evaluation of social media messages relevant to heated tobacco products (HTPs). Methods: GPT-3.5 and GPT-4 Turbo (OpenAI) were used to classify 500 Facebook (Meta Platforms) and 500 Twitter (subsequently rebranded X) messages. Each set consisted of 200 human-labeled anti-HTP, 200 pro-HTP, and 100 neutral messages. The models evaluated each message up to 20 times to generate multiple response instances reporting their classification decisions. The majority label from these responses was assigned as the model’s decision for the message. The models’ classification decisions were then compared with those of human evaluators. Results: GPT-3.5 accurately replicated human sentiment evaluation in 61.2% of Facebook messages and 57% of Twitter messages. GPT-4 Turbo demonstrated higher accuracies overall, with 81.7% for Facebook messages and 77% for Twitter messages. GPT-4 Turbo’s accuracy with 3 response instances reached 99% of the accuracy achieved with 20 response instances. GPT-4 Turbo’s accuracy was higher for human-labeled anti- and pro-HTP messages compared with neutral messages. Most of the GPT-3.5 misclassifications occurred when anti- or pro-HTP messages were incorrectly classified as neutral or irrelevant by the model, whereas GPT-4 Turbo showed improvements across all sentiment categories and reduced misclassifications, especially of messages incorrectly categorized as irrelevant. Conclusions: LLMs can be used to analyze sentiment in social media messages about HTPs. 
Results from GPT-4 Turbo suggest that accuracy can reach approximately 80% compared with the results of human experts, even with a small number of labeling decisions generated by the model. A potential risk of using LLMs is the misrepresentation of the overall sentiment due to the differences in accuracy across sentiment categories. Although this issue could be reduced with the newer language model, future efforts should explore the mechanisms underlying the discrepancies and how to address them systematically. %M 40053746 %R 10.2196/63631 %U https://www.jmir.org/2025/1/e63631 %U https://doi.org/10.2196/63631 %U http://www.ncbi.nlm.nih.gov/pubmed/40053746 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e56692 %T Use of Artificial Intelligence, Internet of Things, and Edge Intelligence in Long-Term Care for Older People: Comprehensive Analysis Through Bibliometric, Google Trends, and Content Analysis %A Chien,Shuo-Chen %A Yen,Chia-Ming %A Chang,Yu-Hung %A Chen,Ying-Erh %A Liu,Chia-Chun %A Hsiao,Yu-Ping %A Yang,Ping-Yen %A Lin,Hong-Ming %A Yang,Tsung-En %A Lu,Xing-Hua %A Wu,I-Chien %A Hsu,Chih-Cheng %A Chiou,Hung-Yi %A Chung,Ren-Hua %+ Institute of Population Health Sciences, National Health Research Institutes, 35, Keyan Road, Zhunan Town, Miaoli County, 350, Taiwan, 886 37 246 166 ext 36105, rchung@nhri.edu.tw %K bibliometric analysis %K Google Trends %K content analysis %K long-term care %K older adults %K artificial intelligence %K Internet of Things %K edge intelligence %D 2025 %7 4.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The global aging population poses critical challenges for long-term care (LTC), including workforce shortages, escalating health care costs, and increasing demand for high-quality care. Integrating artificial intelligence (AI), the Internet of Things (IoT), and edge intelligence (EI) offers transformative potential to enhance care quality, improve safety, and streamline operations. 
However, existing research lacks a comprehensive analysis that synthesizes academic trends, public interest, and deeper insights regarding these technologies. Objective: This study aims to provide a holistic overview of AI, IoT, and EI applications in LTC for older adults through a comprehensive bibliometric analysis, public interest insights from Google Trends, and content analysis of the top-cited research papers. Methods: Bibliometric analysis was conducted using data from Web of Science, PubMed, and Scopus to identify key themes and trends in the field, while Google Trends was used to assess public interest. A content analysis of the top 1% of most-cited papers provided deeper insights into practical applications. Results: A total of 6378 papers published between 2014 and 2023 were analyzed. The bibliometric analysis revealed that the United States, China, and Canada are leading contributors, with strong thematic overlaps in areas such as dementia care, machine learning, and wearable health monitoring technologies. High correlations were found between academic and public interest in key topics such as “long-term care” (τ=0.89, P<.001) and “caregiver” (τ=0.72, P=.004). The content analysis demonstrated that social robots, particularly PARO, significantly improved mood and reduced agitation in patients with dementia. However, limitations, including small sample sizes, short study durations, and a narrow focus on dementia care, were noted. Conclusions: AI, IoT, and EI collectively form a powerful ecosystem in LTC settings, addressing different aspects of care for older adults. Our study suggests that increased international collaboration and the integration of emerging themes such as “rehabilitation,” “stroke,” and “mHealth” are necessary to meet the evolving care needs of this population. 
Additionally, incorporating high-interest keywords such as “machine learning,” “smart home,” and “caregiver” can enhance discoverability and relevance for both academic and public audiences. Future research should focus on expanding sample sizes, conducting long-term multicenter trials, and exploring broader health conditions beyond dementia, such as frailty and depression. %M 40053718 %R 10.2196/56692 %U https://www.jmir.org/2025/1/e56692 %U https://doi.org/10.2196/56692 %U http://www.ncbi.nlm.nih.gov/pubmed/40053718 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e55341 %T Enhancing Doctor-Patient Shared Decision-Making: Design of a Novel Collaborative Decision Description Language %A Guo,XiaoRui %A Xiao,Liang %A Liu,Xinyu %A Chen,Jianxia %A Tong,Zefang %A Liu,Ziji %+ School of Computer Science, Hubei University of Technology, 28 Nanli Road, Hongshan District, Hubei Province, Wuhan, 430068, China, 86 18062500600, lx@mail.hbut.edu.cn %K shared decision-making %K speech acts %K agent %K argumentation %K interaction protocol %D 2025 %7 4.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Effective shared decision-making between patients and physicians is crucial for enhancing health care quality and reducing medical errors. The literature shows that the absence of effective methods to facilitate shared decision-making can result in poor patient engagement and unfavorable decision outcomes. Objective: In this paper, we propose a Collaborative Decision Description Language (CoDeL) to model shared decision-making between patients and physicians, offering a theoretical foundation for studying various shared decision scenarios. Methods: CoDeL is based on an extension of the interaction protocol language of Lightweight Social Calculus. The language utilizes speech acts to represent the attitudes of shared decision-makers toward decision propositions, as well as their semantic relationships within dialogues. 
It supports interactive argumentation among decision makers by embedding clinical evidence into each segment of decision protocols. Furthermore, CoDeL enables personalized decision-making, allowing for the demonstration of characteristics such as persistence, critical thinking, and openness. Results: The feasibility of the approach is demonstrated through a case study of shared decision-making in the disease domain of atrial fibrillation. Our experimental results show that integrating the proposed language with GPT can further enhance its capabilities in interactive decision-making, improving interpretability. Conclusions: The proposed novel CoDeL can enhance doctor-patient shared decision-making in a rational, personalized, and interpretable manner. %M 40053763 %R 10.2196/55341 %U https://www.jmir.org/2025/1/e55341 %U https://doi.org/10.2196/55341 %U http://www.ncbi.nlm.nih.gov/pubmed/40053763 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 12 %N %P e66699 %T An AI-Based Clinical Decision Support System for Antibiotic Therapy in Sepsis (KINBIOTICS): Use Case Analysis %A Düvel,Juliane Andrea %A Lampe,David %A Kirchner,Maren %A Elkenkamp,Svenja %A Cimiano,Philipp %A Düsing,Christoph %A Marchi,Hannah %A Schmiegel,Sophie %A Fuchs,Christiane %A Claßen,Simon %A Meier,Kirsten-Laura %A Borgstedt,Rainer %A Rehberg,Sebastian %A Greiner,Wolfgang %K CDSS %K use case analysis %K technology acceptance %K sepsis %K infection %K infectious disease %K antimicrobial resistance %K clinical decision support system %K decision-making %K clinical support %K machine learning %K ML %K artificial intelligence %K AI %K algorithm %K model %K analytics %K predictive models %K deep learning %K early warning %K early detection %D 2025 %7 4.3.2025 %9 %J JMIR Hum Factors %G English %X Background: Antimicrobial resistances pose significant challenges in health care systems. 
Clinical decision support systems (CDSSs) represent a potential strategy for promoting a more targeted and guideline-based use of antibiotics. The integration of artificial intelligence (AI) into these systems has the potential to support physicians in selecting the most effective drug therapy for a given patient. Objective: This study aimed to analyze the feasibility of an AI-based CDSS pilot version for antibiotic therapy in sepsis patients and identify facilitating and inhibiting conditions for its implementation in intensive care medicine. Methods: The evaluation was conducted in 2 steps, using a qualitative methodology. Initially, expert interviews were conducted, in which intensive care physicians were asked to assess the AI-based recommendations for antibiotic therapy in terms of plausibility, layout, and design. Subsequently, focus group interviews were conducted to examine the technology acceptance of the AI-based CDSS. The interviews were anonymized and evaluated using content analysis. Results: In terms of the feasibility, barriers included variability in previous antibiotic administration practices, which affected the predictive ability of AI recommendations, and the increased effort required to justify deviations from these recommendations. Physicians’ confidence in accepting or rejecting recommendations depended on their level of professional experience. The ability to re-evaluate CDSS recommendations and an intuitive, user-friendly system design were identified as factors that enhanced acceptance and usability. Overall, barriers included low levels of digitization in clinical practice, limited availability of cross-sectoral data, and negative previous experiences with CDSSs. Conversely, facilitators to CDSS implementation were potential time savings, physicians’ openness to adopting new technologies, and positive previous experiences. 
Conclusions: Early integration of users is beneficial for both the identification of relevant context factors and the further development of an effective CDSS. Overall, the potential of AI-based CDSSs is offset by inhibiting contextual conditions that impede its acceptance and implementation. The advancement of AI-based CDSSs and the mitigation of these inhibiting conditions are crucial for the realization of its full potential. %R 10.2196/66699 %U https://humanfactors.jmir.org/2025/1/e66699 %U https://doi.org/10.2196/66699 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e68354 %T Machine Learning–Based Prediction of Early Complications Following Surgery for Intestinal Obstruction: Multicenter Retrospective Study %A Huang,Pinjie %A Yang,Jirong %A Zhao,Dizhou %A Ran,Taojia %A Luo,Yuheng %A Yang,Dong %A Zheng,Xueqin %A Zhou,Shaoli %A Chen,Chaojin %+ Department of Anesthesiology, Third Affiliated Hospital of Sun Yat-sen University, No.600, Tianhe Road, Guangzhou, Guangzhou, 510630, China, 86 13430322182, chenchj28@mail.sysu.edu.cn %K postoperative complications %K intestinal obstruction %K machine learning %K early intervention %K risk calculator %K prediction model %K Shapley additive explanations %D 2025 %7 3.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Early complications increase in-hospital stay and mortality after intestinal obstruction surgery. It is important to identify the risk of postoperative early complications for patients with intestinal obstruction at a sufficiently early stage, which would allow preemptive individualized enhanced therapy to be conducted to improve the prognosis of patients with intestinal obstruction. A risk predictive model based on machine learning is helpful for early diagnosis and timely intervention. 
Objective: This study aimed to construct an online risk calculator for early postoperative complications in patients after intestinal obstruction surgery based on machine learning algorithms. Methods: A total of 396 patients undergoing intestinal obstruction surgery from April 2013 to April 2021 at an independent medical center were enrolled as the training cohort. Overall, 7 machine learning methods were used to establish prediction models, with their performance appraised via the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, and F1-score. The best model was validated through 2 independent medical centers, a publicly available perioperative dataset, the Informative Surgical Patient dataset for Innovative Research Environment (INSPIRE), and a mixed cohort consisting of the above 3 datasets, involving 50, 66, 48, and 164 cases, respectively. Shapley Additive Explanations were measured to identify risk factors. Results: The incidence of postoperative complications in the training cohort was 47.44% (176/371), while the incidences in 4 external validation cohorts were 34% (17/50), 56.06% (37/66), 52.08% (25/48), and 48.17% (79/164), respectively. Postoperative complications were associated with 8 features: Physiological Severity Score for the Enumeration of Mortality and Morbidity (POSSUM physiological score), the amount of colloid infusion, shock index before anesthesia induction, ASA (American Society of Anesthesiologists) classification, the percentage of neutrophils, shock index at the end of surgery, age, and total protein. The random forest model showed the best overall performance, with an AUROC of 0.788 (95% CI 0.709-0.869), accuracy of 0.756, sensitivity of 0.695, specificity of 0.810, and F1-score of 0.727 in the training cohort. 
The random forest model also achieved a comparable AUROC of 0.755 (95% CI 0.652-0.839) in validation cohort 1, a greater AUROC of 0.817 (95% CI 0.695-0.913) in validation cohort 2, a similar AUROC of 0.786 (95% CI 0.628-0.902) in validation cohort 3, and a comparable AUROC of 0.720 (95% CI 0.671-0.768) in validation cohort 4. We visualized the random forest model and created a web-based risk calculator. Conclusions: We have developed and validated a generalizable random forest model to predict postoperative early complications in patients undergoing intestinal obstruction surgery, enabling clinicians to screen high-risk patients and implement early individualized interventions. An online risk calculator for early postoperative complications was developed to make the random forest model accessible to clinicians around the world. %M 40053794 %R 10.2196/68354 %U https://www.jmir.org/2025/1/e68354 %U https://doi.org/10.2196/68354 %U http://www.ncbi.nlm.nih.gov/pubmed/40053794 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67576 %T Deep Learning–Based Electrocardiogram Model (EIANet) to Predict Emergency Department Cardiac Arrest: Development and External Validation Study %A Lu,Shao-Chi %A Chen,Guang-Yuan %A Liu,An-Sheng %A Sun,Jen-Tang %A Gao,Jun-Wan %A Huang,Chien-Hua %A Tsai,Chu-Lin %A Fu,Li-Chen %+ , Department of Emergency Medicine, National Taiwan University Hospital and National Taiwan University College of Medicine, No.1 Jen-Ai Road, Taipei, 100, Taiwan, 886 23123456 ext 267684, chulintsai@ntu.edu.tw %K cardiac arrest %K emergency department %K deep learning %K computer vision %K electrocardiogram %D 2025 %7 28.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: In-hospital cardiac arrest (IHCA) is a severe and sudden medical emergency that is characterized by the abrupt cessation of circulatory function, leading to death or irreversible organ damage if not addressed immediately. 
Emergency department (ED)–based IHCA (EDCA) accounts for 10% to 20% of all IHCA cases. Early detection of EDCA is crucial, yet identifying subtle signs of cardiac deterioration is challenging. Traditional EDCA prediction methods primarily rely on structured vital signs or electrocardiogram (ECG) signals, which require additional preprocessing or specialized devices. This study introduces a novel approach using image-based 12-lead ECG data obtained at ED triage, leveraging the inherent richness of visual ECG patterns to enhance prediction and integration into clinical workflows. Objective: This study aims to address the challenge of early detection of EDCA by developing an innovative deep learning model, the ECG-Image-Aware Network (EIANet), which uses 12-lead ECG images for early prediction of EDCA. By focusing on readily available triage ECG images, this research seeks to create a practical and accessible solution that seamlessly integrates into real-world ED workflows. Methods: For adult patients with EDCA (cases), 12-lead ECG images at ED triage were obtained from 2 independent data sets: National Taiwan University Hospital (NTUH) and Far Eastern Memorial Hospital (FEMH). Control ECGs were randomly selected from adult ED patients without cardiac arrest during the same study period. In EIANet, ECG images were first converted to binary form, followed by noise reduction, connected component analysis, and morphological opening. A spatial attention module was incorporated into the ResNet50 architecture to enhance feature extraction, and a custom binary recall loss (BRLoss) was used to balance precision and recall, addressing slight data set imbalance. The model was developed and internally validated on the NTUH-ECG data set and was externally validated on an independent FEMH-ECG data set. The model performance was evaluated using the F1-score, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). 
Results: There were 571 case ECGs and 826 control ECGs in the NTUH data set and 378 case ECGs and 713 control ECGs in the FEMH data set. The novel EIANet model achieved an F1-score of 0.805, AUROC of 0.896, and AUPRC of 0.842 on the NTUH-ECG data set with a 40% positive sample ratio. It achieved an F1-score of 0.650, AUROC of 0.803, and AUPRC of 0.678 on the FEMH-ECG data set with a 34.6% positive sample ratio. The feature map showed that the region of interest in the ECG was the ST segment. Conclusions: EIANet demonstrates promising potential for accurately predicting EDCA using triage ECG images, offering an effective solution for early detection of high-risk cases in emergency settings. This approach may enhance the ability of health care professionals to make timely decisions, with the potential to improve patient outcomes by enabling earlier interventions for EDCA. %M 40053733 %R 10.2196/67576 %U https://www.jmir.org/2025/1/e67576 %U https://doi.org/10.2196/67576 %U http://www.ncbi.nlm.nih.gov/pubmed/40053733 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66622 %T Effectiveness of AI for Enhancing Computed Tomography Image Quality and Radiation Protection in Radiology: Systematic Review and Meta-Analysis %A Zhang,Subo %A Zhu,Zhitao %A Yu,Zhenfei %A Sun,Haifeng %A Sun,Yi %A Huang,Hai %A Xu,Lei %A Wan,Jinxin %+ Department of Medical Imaging, The Second People's Hospital of Lianyungang, No 41 Hailian East Road, Lianyungang, 222000, China, 86 051885775003, jxwlyg@126.com %K artificial intelligence %K computed tomography %K image quality %K radiation protection %K meta-analysis %D 2025 %7 27.2.2025 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) presents a promising approach to balancing high image quality with reduced radiation exposure in computed tomography (CT) imaging. Objective: This meta-analysis evaluates the effectiveness of AI in enhancing CT image quality and lowering radiation doses. 
Methods: A thorough literature search was performed across several databases, including PubMed, Embase, Web of Science, ScienceDirect, and Cochrane Library, with the final update in 2024. We included studies that compared AI-based interventions to conventional CT techniques. The quality of these studies was assessed using the Newcastle-Ottawa Scale. Random-effects models were used to pool results, and heterogeneity was measured using the I² statistic. Primary outcomes included image quality, CT dose index, and diagnostic accuracy. Results: This meta-analysis incorporated 5 clinical validation studies published between 2022 and 2024, totaling 929 participants. Results indicated that AI-based interventions significantly improved image quality (mean difference 0.70, 95% CI 0.43-0.96; P<.001) and showed a positive trend in reducing the CT dose index, though not statistically significant (mean difference 0.47, 95% CI –0.21 to 1.15; P=.18). AI also enhanced image analysis efficiency (odds ratio 1.57, 95% CI 1.08-2.27; P=.02) and demonstrated high accuracy and sensitivity in detecting intracranial aneurysms, with low-dose CT using AI reconstruction showing noninferiority for liver lesion detection. Conclusions: The findings suggest that AI-based interventions can significantly enhance CT imaging practices by improving image quality and potentially reducing radiation doses, which may lead to better diagnostic accuracy and patient safety. However, these results should be interpreted with caution due to the limited number of studies and the variability in AI algorithms. Further research is needed to clarify AI’s impact on radiation reduction and to establish clinical standards. 
%M 40053787 %R 10.2196/66622 %U https://www.jmir.org/2025/1/e66622 %U https://doi.org/10.2196/66622 %U http://www.ncbi.nlm.nih.gov/pubmed/40053787 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e53892 %T Future Use of AI in Diagnostic Medicine: 2-Wave Cross-Sectional Survey Study %A Cabral,Bernardo Pereira %A Braga,Luiza Amara Maciel %A Conte Filho,Carlos Gilbert %A Penteado,Bruno %A Freire de Castro Silva,Sandro Luis %A Castro,Leonardo %A Fornazin,Marcelo %A Mota,Fabio %+ Cellular Communication Laboratory, Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Avenida Brasil, 4365, Manguinhos, Rio de Janeiro, 21040-900, Brazil, 55 2125984220, fabio.mota@fiocruz.br %K artificial intelligence %K AI %K diagnostic medicine %K survey research %K researcher opinion %K future %D 2025 %7 27.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The rapid evolution of artificial intelligence (AI) presents transformative potential for diagnostic medicine, offering opportunities to enhance diagnostic accuracy, reduce costs, and improve patient outcomes. Objective: This study aimed to assess the expected future impact of AI on diagnostic medicine by comparing global researchers’ expectations using 2 cross-sectional surveys. Methods: The surveys were conducted in September 2020 and February 2023. Each survey captured a 10-year projection horizon, gathering insights from >3700 researchers with expertise in AI and diagnostic medicine from all over the world. The survey sought to understand the perceived benefits, integration challenges, and evolving attitudes toward AI use in diagnostic settings. Results: Results indicated a strong expectation among researchers that AI will substantially influence diagnostic medicine within the next decade. 
Key anticipated benefits include enhanced diagnostic reliability, reduced screening costs, improved patient care, and decreased physician workload, addressing the growing demand for diagnostic services outpacing the supply of medical professionals. Specifically, x-ray diagnosis, heart rhythm interpretation, and skin malignancy detection were identified as the diagnostic tools most likely to be integrated with AI technologies due to their maturity and existing AI applications. The surveys highlighted the growing optimism regarding AI’s ability to transform traditional diagnostic pathways and enhance clinical decision-making processes. Furthermore, the study identified barriers to the integration of AI in diagnostic medicine. The primary challenges cited were the difficulties of embedding AI within existing clinical workflows, ethical and regulatory concerns, and data privacy issues. Respondents emphasized uncertainties around legal responsibility and accountability for AI-supported clinical decisions, data protection challenges, and the need for robust regulatory frameworks to ensure safe AI deployment. Ethical concerns, particularly those related to algorithmic transparency and bias, were noted as increasingly critical, reflecting a heightened awareness of the potential risks associated with AI adoption in clinical settings. Differences between the 2 survey waves indicated a growing focus on ethical and regulatory issues, suggesting an evolving recognition of these challenges over time. Conclusions: Despite these barriers, there was notable consistency in researchers’ expectations across the 2 survey periods, indicating a stable and sustained outlook on AI’s transformative potential in diagnostic medicine. The findings show the need for interdisciplinary collaboration among clinicians, AI developers, and regulators to address ethical and practical challenges while maximizing AI’s benefits. 
This study offers insights into the projected trajectory of AI in diagnostic medicine, guiding stakeholders, including health care providers, policy makers, and technology developers, on navigating the opportunities and challenges of AI integration. %M 40053779 %R 10.2196/53892 %U https://www.jmir.org/2025/1/e53892 %U https://doi.org/10.2196/53892 %U http://www.ncbi.nlm.nih.gov/pubmed/40053779 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 8 %N %P e63058 %T Advancing Clinical Chatbot Validation Using AI-Powered Evaluation With a New 3-Bot Evaluation System: Instrument Validation Study %A Choo,Seungheon %A Yoo,Suyoung %A Endo,Kumiko %A Truong,Bao %A Son,Meong Hi %K artificial intelligence %K patient education %K therapy %K computer-assisted %K computer %K understandable %K accurate %K understandability %K automation %K chatbots %K bots %K conversational agents %K emotions %K emotional %K depression %K depressive %K anxiety %K anxious %K nervous %K nervousness %K empathy %K empathetic %K communication %K interactions %K frustrated %K frustration %K relationships %D 2025 %7 27.2.2025 %9 %J JMIR Nursing %G English %X Background: The health care sector faces a projected shortfall of 10 million workers by 2030. Artificial intelligence (AI) automation in areas such as patient education and initial therapy screening presents a strategic response to mitigate this shortage and reallocate medical staff to higher-priority tasks. However, current methods of evaluating early-stage health care AI chatbots are highly limited due to safety concerns and the amount of time and effort that goes into evaluating them. Objective: This study introduces a novel 3-bot method for efficiently testing and validating early-stage AI health care provider chatbots. To extensively test AI provider chatbots without involving real patients or researchers, various AI patient bots and an evaluator bot were developed. 
Methods: Provider bots interacted with AI patient bots embodying frustrated, anxious, or depressed personas. An evaluator bot reviewed interaction transcripts based on specific criteria. Human experts then reviewed each interaction transcript, and the evaluator bot’s results were compared to human evaluation results to ensure accuracy. Results: The patient-education bot’s evaluations by the AI evaluator and the human evaluator were nearly identical, with minimal variance, limiting the opportunity for further analysis. The screening bot’s evaluations also yielded similar results between the AI evaluator and human evaluator. Statistical analysis confirmed the reliability and accuracy of the AI evaluations. Conclusions: The innovative evaluation method ensures a safe, adaptable, and effective means to test and refine early versions of health care provider chatbots without risking patient safety or investing excessive researcher time and effort. Our patient-education evaluation could have benefited from a larger set of evaluation criteria: the AI and human evaluators produced extremely similar results, which may be attributable to the small number of criteria used. We were limited in the amount of prompting we could input into each bot due to the practical consideration that response time increases with longer prompts. In the future, using techniques such as retrieval-augmented generation will allow the system to receive more information and become more specific and accurate in evaluating the chatbots. This evaluation method will allow for rapid testing and validation of health care chatbots to automate basic medical tasks, freeing providers to address more complex tasks. 
%R 10.2196/63058 %U https://nursing.jmir.org/2025/1/e63058 %U https://doi.org/10.2196/63058 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e63601 %T Predicting Agitation-Sedation Levels in Intensive Care Unit Patients: Development of an Ensemble Model %A Dai,Pei-Yu %A Lin,Pei-Yi %A Sheu,Ruey-Kai %A Liu,Shu-Fang %A Wu,Yu-Cheng %A Wu,Chieh-Liang %A Chen,Wei-Lin %A Huang,Chien-Chung %A Lin,Guan-Yin %A Chen,Lun-Chi %K intensive care units %K ICU %K agitation %K sedation %K ensemble learning %K machine learning %K ML %K artificial intelligence %K AI %K patient safety %K efficiency %K automation %K ICU care %K ensemble model %K learning model %K explanatory analysis %D 2025 %7 26.2.2025 %9 %J JMIR Med Inform %G English %X Background: Agitation and sedation management is critical in intensive care as it affects patient safety. Traditional nursing assessments suffer from low frequency and subjectivity. Automating these assessments can boost intensive care unit (ICU) efficiency, treatment capacity, and patient safety. Objectives: The aim of this study was to develop a machine learning–based assessment of agitation and sedation. Methods: Using data from the Taichung Veterans General Hospital ICU database (2020), an ensemble learning model was developed for classifying the levels of agitation and sedation. Different ensemble learning model sequences were compared. In addition, an interpretable artificial intelligence approach, SHAP (Shapley additive explanations), was employed for explanatory analysis. Results: With 20 features and 121,303 data points, the random forest model achieved high area under the curve (AUC) values across all models (sedation classification: 0.97; agitation classification: 0.88). The ensemble learning model enhanced agitation sensitivity (0.82) while maintaining high AUC values across all categories (all >0.82). The model explanations aligned with clinical experience. 
Conclusions: This study proposes automating ICU agitation-sedation assessment using machine learning, enhancing efficiency and safety. Ensemble learning improves agitation sensitivity while maintaining accuracy. Real-time monitoring and future digital integration have the potential to further advance intensive care. %R 10.2196/63601 %U https://medinform.jmir.org/2025/1/e63601 %U https://doi.org/10.2196/63601 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67010 %T Stroke Diagnosis and Prediction Tool Using ChatGLM: Development and Validation Study %A Song,Xiaowei %A Wang,Jiayi %A He,Feifei %A Yin,Wei %A Ma,Weizhi %A Wu,Jian %+ Department of Neurology, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, No.168 of Litang Road, Beijing, 102218, China, 86 01056118918, wujianxuanwu@126.com %K stroke %K diagnosis %K large language model %K ChatGLM %K generative language model %K primary care %K acute stroke %K prediction tool %K stroke detection %K treatment %K electronic health records %K noncontrast computed tomography %D 2025 %7 26.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Stroke is a globally prevalent disease that imposes a significant burden on health care systems and national economies. Accurate and rapid stroke diagnosis can substantially increase reperfusion rates, mitigate disability, and reduce mortality. However, there are considerable discrepancies in the diagnosis and treatment of acute stroke. Objective: The aim of this study is to develop and validate a stroke diagnosis and prediction tool using ChatGLM-6B, which uses free-text information from electronic health records in conjunction with noncontrast computed tomography (NCCT) reports to enhance stroke detection and treatment. 
Methods: A large language model (LLM) using ChatGLM-6B was proposed to facilitate stroke diagnosis by identifying optimal input combinations, using external tools, and applying instruction tuning and low-rank adaptation (LoRA) techniques. A dataset containing details of 1885 patients with and without stroke from 2016 to 2024 was used for training and internal validation; another 335 patients from 2 hospitals were used as an external test set, including 230 patients from the training hospital but admitted at different periods, and 105 patients from another hospital. Results: The LLM, which is based on clinical notes and NCCT, demonstrates exceptionally high accuracy in stroke diagnosis, achieving 99% in the internal validation dataset and 95.5% and 79.1% in 2 external test cohorts. It effectively distinguishes between ischemia and hemorrhage, with an accuracy of 100% in the validation dataset and 99.1% and 97.1% in the other test cohorts. In addition, it identifies large vessel occlusions (LVO) with an accuracy of 80% in the validation dataset and 88.6% and 83.3% in the other test cohorts. Furthermore, it screens patients eligible for intravenous thrombolysis (IVT) with an accuracy of 89.4% in the validation dataset and 60% and 80% in the other test cohorts. Conclusions: We developed an LLM that leverages clinical text and NCCT to identify strokes and guide recanalization therapy. While our results necessitate validation through widespread deployment, they hold the potential to enhance stroke identification and reduce reperfusion time. 
%M 40009850 %R 10.2196/67010 %U https://www.jmir.org/2025/1/e67010 %U https://doi.org/10.2196/67010 %U http://www.ncbi.nlm.nih.gov/pubmed/40009850 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e55492 %T Complete Blood Count and Monocyte Distribution Width–Based Machine Learning Algorithms for Sepsis Detection: Multicentric Development and External Validation Study %A Campagner,Andrea %A Agnello,Luisa %A Carobene,Anna %A Padoan,Andrea %A Del Ben,Fabio %A Locatelli,Massimo %A Plebani,Mario %A Ognibene,Agostino %A Lorubbio,Maria %A De Vecchi,Elena %A Cortegiani,Andrea %A Piva,Elisa %A Poz,Donatella %A Curcio,Francesco %A Cabitza,Federico %A Ciaccio,Marcello %+ Department of Computer Science, Systems and Communication, University of Milano-Bicocca, Piazza dell'Ateneo Nuovo, 1, Milano, 20126, Italy, 39 0264487888, federico.cabitza@unimib.it %K sepsis %K medical machine learning %K external validation %K complete blood count %K controllable AI %K machine learning %K artificial intelligence %K development study %K validation study %K organ %K organ dysfunction %K detection %K clinical signs %K clinical symptoms %K biomarker %K diagnostic %K machine learning model %K sepsis detection %K early detection %K data distribution %D 2025 %7 26.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Sepsis is an organ dysfunction caused by a dysregulated host response to infection. Early detection is fundamental to improving the patient outcome. Laboratory medicine can play a crucial role by providing biomarkers whose alteration can be detected before the onset of clinical signs and symptoms. In particular, the relevance of monocyte distribution width (MDW) as a sepsis biomarker has emerged in the previous decade. However, despite encouraging results, MDW has poor sensitivity and positive predictive value when compared to other biomarkers. 
Objective: This study aims to investigate the use of machine learning (ML) to overcome the limitations mentioned earlier by combining different parameters and therefore improving sepsis detection. However, making ML models function in clinical practice may be problematic, as their performance may suffer when deployed in contexts other than the research environment. In fact, even widely used commercially available models have been demonstrated to generalize poorly in out-of-distribution scenarios. Methods: In this multicentric study, we developed ML models whose intended use is the early detection of sepsis on the basis of MDW and complete blood count parameters. In total, data from 6 patient cohorts (encompassing 5344 patients) collected at 5 different Italian hospitals were used to train and externally validate ML models. The models were trained on a patient cohort encompassing patients enrolled at the emergency department, and it was externally validated on 5 different cohorts encompassing patients enrolled at both the emergency department and the intensive care unit. The cohorts were selected to exhibit a variety of data distribution shifts compared to the training set, including label, covariate, and missing data shifts, enabling a conservative validation of the developed models. To improve generalizability and robustness to different types of distribution shifts, the developed ML models combine traditional methodologies with advanced techniques inspired by controllable artificial intelligence (AI), namely cautious classification, which gives the ML models the ability to abstain from making predictions, and explainable AI, which provides health operators with useful information about the models’ functioning. 
Results: The developed models achieved good performance on the internal validation (area under the receiver operating characteristic curve between 0.91 and 0.98), as well as consistent generalization performance across the external validation datasets (area under the receiver operating characteristic curve between 0.75 and 0.95), outperforming baseline biomarkers and state-of-the-art ML models for sepsis detection. Controllable AI techniques were further able to improve performance and were used to derive an interpretable set of diagnostic rules. Conclusions: Our findings demonstrate how controllable AI approaches based on complete blood count and MDW may be used for the early detection of sepsis while also demonstrating how the proposed methodology can be used to develop ML models that are more resistant to different types of data distribution shifts. %M 40009841 %R 10.2196/55492 %U https://www.jmir.org/2025/1/e55492 %U https://doi.org/10.2196/55492 %U http://www.ncbi.nlm.nih.gov/pubmed/40009841 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 12 %N %P e52358 %T Comparison of a Novel Machine Learning–Based Clinical Query Platform With Traditional Guideline Searches for Hospital Emergencies: Prospective Pilot Study of User Experience and Time Efficiency %A Ejaz,Hamza %A Tsui,Hon Lung Keith %A Patel,Mehul %A Ulloa Paredes,Luis Rafael %A Knights,Ellen %A Aftab,Shah Bakht %A Subbe,Christian Peter %K artificial intelligence %K machine learning %K information search %K emergency care %K developing %K testing %K information retrieval %K hospital care %K training %K clinical practice %K clinical experience %K user satisfaction %K clinical impact %K user group %K users %K study design %K mobile phone %D 2025 %7 25.2.2025 %9 %J JMIR Hum Factors %G English %X Background: Emergency and acute medicine doctors require easily accessible evidence-based information to safely manage a wide range of clinical presentations. 
The inability to find evidence-based local guidelines on the trust’s intranet leads to information retrieval from the World Wide Web. Artificial intelligence (AI) has the potential to make evidence-based information retrieval faster and easier. Objective: The aim of the study is to conduct a time-motion analysis, comparing cohorts of junior doctors using (1) an AI-supported search engine versus (2) the traditional hospital intranet. The study also aims to examine the impact of the AI-supported search engine on the duration of searches and workflow when seeking answers to clinical queries at the point of care. Methods: This pre- and postobservational study was conducted in 2 phases. In the first phase, clinical information searches by 10 doctors caring for acutely unwell patients in acute medicine were observed during 10 working days. Based on these findings and input from a focus group of 14 clinicians, an AI-supported, context-sensitive search engine was implemented. In the second phase, clinical practice was observed for 10 doctors for an additional 10 working days using the new search engine. Results: The hospital intranet group (n=10) had a median of 23 months of clinical experience, while the AI-supported search engine group (n=10) had a median of 54 months. Participants using the AI-supported engine conducted fewer searches. User satisfaction and query resolution rates were similar between the 2 phases. Searches with the AI-supported engine took 43 seconds longer on average. Clinicians rated the new app with a favorable Net Promoter Score of 20. Conclusions: We report a successful feasibility pilot of an AI-driven search engine for clinical guidelines. Further development of the engine, including the incorporation of large language models, might improve accuracy and speed. More research is required to establish clinical impact in different user groups. Focusing on new staff at the beginning of their posts might be the most suitable study design. 
%R 10.2196/52358 %U https://humanfactors.jmir.org/2025/1/e52358 %U https://doi.org/10.2196/52358 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e56774 %T Reporting Quality of AI Intervention in Randomized Controlled Trials in Primary Care: Systematic Review and Meta-Epidemiological Study %A Zhong,Jinjia %A Zhu,Ting %A Huang,Yafang %+ School of General Practice and Continuing Education, Capital Medical University, 4th Fl, Jieping Building, Capital Medical University, No.10 You An Men Wai Xi Tou Tiao, Fengtai district, Beijing, 100069, China, 86 18810673886, yafang@ccmu.edu.cn %K artificial intelligence %K randomized controlled trial %K reporting quality %K primary care %K meta-epidemiological study %D 2025 %7 25.2.2025 %9 Review %J J Med Internet Res %G English %X Background: Despite the surge in artificial intelligence (AI) interventions in primary care trials, their reporting quality has not yet been studied. Objective: This study aimed to systematically evaluate the reporting quality of both published randomized controlled trials (RCTs) and protocols for RCTs that investigated AI interventions in primary care. Methods: PubMed, Embase, Cochrane Library, MEDLINE, Web of Science, and CINAHL databases were searched for RCTs and protocols on AI interventions in primary care until November 2024. Eligible studies were published RCTs or full protocols for RCTs exploring AI interventions in primary care. The reporting quality was assessed using CONSORT-AI (Consolidated Standards of Reporting Trials–Artificial Intelligence) and SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials–Artificial Intelligence) checklists, focusing on AI intervention–related items. Results: A total of 11,711 records were identified. In total, 19 published RCTs and 21 RCT protocols for 35 trials were included. The overall proportion of adequately reported items was 65% (172/266; 95% CI 59%-70%) and 68% (214/315; 95% CI 62%-73%) for RCTs and protocols, respectively. 
The percentage of RCTs and protocols that reported a specific item ranged from 11% (2/19) to 100% (19/19) and from 10% (2/21) to 100% (21/21), respectively. The reporting of both RCTs and protocols exhibited similar characteristics and trends. Both lacked transparency and completeness in 3 respects: not providing adequate information regarding the input data, not mentioning the methods for identifying and analyzing performance errors, and not stating whether and how the AI intervention and its code can be accessed. Conclusions: The reporting quality could be improved in both RCTs and protocols. This study helps promote the transparent and complete reporting of trials with AI interventions in primary care. %M 39998876 %R 10.2196/56774 %U https://www.jmir.org/2025/1/e56774 %U https://doi.org/10.2196/56774 %U http://www.ncbi.nlm.nih.gov/pubmed/39998876 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e53026 %T Survey on Pain Detection Using Machine Learning Models: Narrative Review %A Fang,Ruijie %A Hosseini,Elahe %A Zhang,Ruoyu %A Fang,Chongzhou %A Rafatirad,Setareh %A Homayoun,Houman %+ Department of Electrical and Computer Engineering, University of California, One Shields Avenue, Davis, CA, 95616, United States, 1 5308676009, rjfang@ucdavis.edu %K pain %K pain assessment %K machine learning %K survey %K mobile phone %D 2025 %7 24.2.2025 %9 Review %J JMIR AI %G English %X Background: Pain, a leading reason people seek medical care, has become a social issue. Automated pain assessment has seen notable advancements over recent decades, addressing a critical need in both clinical and everyday settings. Objective: The objective of this survey was to provide a comprehensive overview of pain and its mechanisms, to explore existing research on automated pain recognition modalities, and to identify key challenges and future directions in this field. 
Methods: A literature review was conducted, analyzing studies focused on various modalities for automated pain recognition. The modalities reviewed include facial expressions, physiological signals, audio cues, and pupil dilation, with a focus on their efficacy and application in pain assessment. Results: The survey found that each modality offers unique contributions to automated pain recognition, with facial expressions and physiological signals showing particular promise. However, the reliability and accuracy of these modalities vary, often depending on factors such as individual variability and environmental conditions. Conclusions: While automated pain recognition has progressed considerably, challenges remain in achieving consistent accuracy across diverse populations and contexts. Future research directions are suggested to address these challenges, enhancing the reliability and applicability of automated pain assessment in clinical practice. %M 39993299 %R 10.2196/53026 %U https://ai.jmir.org/2025/1/e53026 %U https://doi.org/10.2196/53026 %U http://www.ncbi.nlm.nih.gov/pubmed/39993299 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e50708 %T Perspectives of Black, Latinx, Indigenous, and Asian Communities on Health Data Use and AI: Cross-Sectional Survey Study %A Rinderknecht,Fatuma-Ayaan %A Yang,Vivian B %A Tilahun,Mekaleya %A Lester,Jenna C %+ Department of Dermatology, University of California, San Francisco, 1701 Divisadero St, San Francisco, CA, 94115, United States, 1 (415) 353 7800, jenna.lester@ucsf.edu %K augmented intelligence %K artificial intelligence %K health equity %K dermatology %K Black %K Latinx %K Indigenous %K Asian %K racial and ethnic minority communities %K AI %K health care %K health data %K survey %K racism %K large language model %K LLM %K diversity %D 2025 %7 21.2.2025 %9 Research Letter %J J Med Internet Res %G English %X Despite excitement around artificial intelligence (AI)–based tools in health care, there is 
work to be done before they can be equitably deployed. The absence of diverse patient voices in discussions on AI is a pressing matter, and current studies have been limited in diversity. Our study inquired about the perspectives of racial and ethnic minority patients on the use of their health data in AI, by conducting a cross-sectional survey among 230 participants who were at least 18 years of age and identified as Black, Latinx, Indigenous, or Asian. While familiarity with AI was high, a smaller proportion of participants understood how AI can be used in health care (152/199, 76.4%), and an even smaller proportion understood how AI can be applied to dermatology (133/199, 66.8%). Overall, 69.8% (139/199) of participants agreed that they trusted the health care system to treat their medical information with respect; however, this varied significantly by income (P=.045). Only 64.3% (128/199) of participants felt comfortable with their medical data being used to build AI tools, and 83.4% (166/199) believed they should be compensated if their data are used to develop AI. To our knowledge, this is the first study focused on understanding opinions about health data use for AI among racial and ethnic minority individuals, as similar studies have had limited diversity. It is important to capture the opinions of diverse groups because the inclusion of their data is essential for building equitable AI tools; however, historical harms have made inclusion challenging. 
%M 39983116 %R 10.2196/50708 %U https://www.jmir.org/2025/1/e50708 %U https://doi.org/10.2196/50708 %U http://www.ncbi.nlm.nih.gov/pubmed/39983116 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 12 %N %P e59010 %T Prioritizing Trust in Podiatrists’ Preference for AI in Supportive Roles Over Diagnostic Roles in Health Care: Qualitative Interview and Focus Group Study %A Tahtali,Mohammed A %A Snijders,Chris C P %A Dirne,Corné W G M %A Le Blanc,Pascale M %+ Department of Industrial Engineering & Management, Fontys University of Applied Sciences, De Rondom 1, 5612 AP, Eindhoven, The Netherlands, 31 8850 79388, m.tahtali@fontys.nl %K AI’s role in health care %K decision-making %K diabetes and podiatrists %K trust %K AI %K artificial intelligence %K qualitative %K foot %K podiatry %K professional %K experience %K attitude %K opinion %K perception %K acceptance %K adoption %K thematic %K focus group %D 2025 %7 21.2.2025 %9 Original Paper %J JMIR Hum Factors %G English %X Background: As artificial intelligence (AI) evolves, its roles have expanded from helping out with routine tasks to making complex decisions, once the exclusive domain of human experts. This shift is pronounced in health care, where AI aids in tasks ranging from image recognition in radiology to personalized treatment plans, demonstrating the potential to, at times, surpass human accuracy and efficiency. Despite AI’s accuracy in some critical tasks, the adoption of AI in health care is a challenge, in part because of skepticism about being able to rely on AI decisions. Objective: This study aimed to identify and delve into more effective and acceptable ways of integrating AI into a broader spectrum of health care tasks. Methods: We included 2 qualitative phases to explore podiatrists’ views on AI in health care. Initially, we interviewed 9 podiatrists (7 women and 2 men) with a mean age of 41 (SD 12) years and aimed to capture their sentiments regarding the use and role of AI in their work. 
Subsequently, a focus group with 5 podiatrists (4 women and 1 man) with a mean age of 54 (SD 10) years delved into AI’s supportive and diagnostic roles on the basis of the interviews. All interviews were recorded, transcribed verbatim, and analyzed using Atlas.ti and QDA-Miner, using both thematic analysis for broad patterns and framework analysis for structured insights per established guidelines. Results: Our research unveiled 9 themes and 3 subthemes, clarifying podiatrists’ nuanced views on AI in health care. Key overlapping insights in the 2 phases included a preference for using AI in supportive roles, such as triage, because of its efficiency and process optimization capabilities. There is a discernible hesitancy toward leveraging AI for diagnostic purposes, driven by concerns regarding its accuracy and the essential nature of human expertise. The need for transparency and explainability in AI systems emerged as a critical factor for fostering trust in both phases. Conclusions: The findings highlight a complex view from podiatrists on AI, showing openness to its application in supportive roles while exercising caution with diagnostic use. This result is consistent with a careful introduction of AI into health care in roles, such as triage, in which there is initial trust, as opposed to roles that ask the AI for a complete diagnosis. Such strategic adoption can mitigate initial resistance, gradually building the confidence to explore AI’s capabilities in more nuanced tasks, including diagnostics, where skepticism is currently more pronounced. Adopting AI stepwise could thus enhance trust and acceptance across a broader range of health care tasks, aligning technology integration with professional comfort and patient care standards. 
%M 39983118 %R 10.2196/59010 %U https://humanfactors.jmir.org/2025/1/e59010 %U https://doi.org/10.2196/59010 %U http://www.ncbi.nlm.nih.gov/pubmed/39983118 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 12 %N %P e60432 %T Exploring the Ethical Challenges of Conversational AI in Mental Health Care: Scoping Review %A Rahsepar Meadi,Mehrdad %A Sillekens,Tomas %A Metselaar,Suzanne %A van Balkom,Anton %A Bernstein,Justin %A Batelaan,Neeltje %+ Department of Psychiatry, Amsterdam Public Health, Vrije Universiteit Amsterdam, Boelelaan 1117, Amsterdam, 1081 HV, The Netherlands, 31 204444444, m.rahseparmeadi@ggzingeest.nl %K chatbot %K mHealth %K mobile health %K ethics %K mental health %K conversational agent %K artificial intelligence %K psychotherapy %K scoping review %K conversational agents %K digital technology %K natural language processing %K qualitative %K psychotherapist %D 2025 %7 21.2.2025 %9 Review %J JMIR Ment Health %G English %X Background: Conversational artificial intelligence (CAI) is emerging as a promising digital technology for mental health care. CAI apps, such as psychotherapeutic chatbots, are available in app stores, but their use raises ethical concerns. Objective: We aimed to provide a comprehensive overview of ethical considerations surrounding CAI as a therapist for individuals with mental health issues. Methods: We conducted a systematic search across PubMed, Embase, APA PsycINFO, Web of Science, Scopus, the Philosopher’s Index, and ACM Digital Library databases. Our search comprised 3 elements: embodied artificial intelligence, ethics, and mental health. We defined CAI as a conversational agent that interacts with a person and uses artificial intelligence to formulate output. We included articles discussing the ethical challenges of CAI functioning in the role of a therapist for individuals with mental health issues. We added additional articles through snowball searching. We included articles in English or Dutch. 
All types of articles were considered except abstracts of symposia. Screening for eligibility was done by 2 independent researchers (MRM and TS or AvB). An initial charting form was created based on the expected considerations and revised and complemented during the charting process. The ethical challenges were divided into themes. When a concern occurred in more than 2 articles, we identified it as a distinct theme. Results: We included 101 articles, of which 95% (n=96) were published in 2018 or later. Most were reviews (n=22, 21.8%) followed by commentaries (n=17, 16.8%). The following 10 themes were distinguished: (1) safety and harm (discussed in 52/101, 51.5% of articles); the most common topics within this theme were suicidality and crisis management, harmful or wrong suggestions, and the risk of dependency on CAI; (2) explicability, transparency, and trust (n=26, 25.7%), including topics such as the effects of “black box” algorithms on trust; (3) responsibility and accountability (n=31, 30.7%); (4) empathy and humanness (n=29, 28.7%); (5) justice (n=41, 40.6%), including themes such as health inequalities due to differences in digital literacy; (6) anthropomorphization and deception (n=24, 23.8%); (7) autonomy (n=12, 11.9%); (8) effectiveness (n=38, 37.6%); (9) privacy and confidentiality (n=62, 61.4%); and (10) concerns for health care workers’ jobs (n=16, 15.8%). Other themes were discussed in 9.9% (n=10) of the identified articles. Conclusions: Our scoping review has comprehensively covered ethical aspects of CAI in mental health care. While certain themes remain underexplored and stakeholders’ perspectives are insufficiently represented, this study highlights critical areas for further research. These include evaluating the risks and benefits of CAI in comparison to human therapists, determining its appropriate roles in therapeutic contexts and its impact on care access, and addressing accountability. 
Addressing these gaps can inform normative analysis and guide the development of ethical guidelines for responsible CAI use in mental health care. %M 39983102 %R 10.2196/60432 %U https://mental.jmir.org/2025/1/e60432 %U https://doi.org/10.2196/60432 %U http://www.ncbi.nlm.nih.gov/pubmed/39983102 %0 Journal Article %@ 2563-6316 %I JMIR Publications %V 6 %N %P e65565 %T Checklist Approach to Developing and Implementing AI in Clinical Settings: Instrument Development Study %A Owoyemi,Ayomide %A Osuchukwu,Joanne %A Salwei,Megan E %A Boyd,Andrew %K artificial intelligence %K machine learning %K algorithm %K model %K analytics %K AI deployment %K human-AI interaction %K AI integration %K checklist %K clinical workflow %K clinical setting %K literature review %D 2025 %7 20.2.2025 %9 %J JMIRx Med %G English %X Background: The integration of artificial intelligence (AI) in health care settings demands a nuanced approach that considers both technical performance and sociotechnical factors. Objective: This study aimed to develop a checklist that addresses the sociotechnical aspects of AI deployment in health care and provides a structured, holistic guide for teams involved in the life cycle of AI systems. Methods: A literature synthesis identified 20 relevant studies, forming the foundation for the Clinical AI Sociotechnical Framework checklist. A modified Delphi study was then conducted with 35 global health care professionals. Participants assessed the checklist’s relevance across 4 stages: “Planning,” “Design,” “Development,” and “Proposed Implementation.” A consensus threshold of 80% was established for each item. IQRs and Cronbach α were calculated to assess agreement and reliability. Results: The initial checklist had 45 questions. Following participant feedback, the checklist was refined to 34 items, and a final round saw 100% consensus on all items (mean score >0.8, IQR 0). 
Based on the outcome of the Delphi study, a final checklist was outlined, with 1 more question added to make 35 questions in total. Conclusions: The Clinical AI Sociotechnical Framework checklist provides a comprehensive, structured approach to developing and implementing AI in clinical settings, addressing technical and social factors critical for adoption and success. This checklist is a practical tool that aligns AI development with real-world clinical needs, aiming to enhance patient outcomes and integrate smoothly into health care workflows. %R 10.2196/65565 %U https://xmed.jmir.org/2025/1/e65565 %U https://doi.org/10.2196/65565 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 11 %N %P e63400 %T Perceptions and Earliest Experiences of Medical Students and Faculty With ChatGPT in Medical Education: Qualitative Study %A Abouammoh,Noura %A Alhasan,Khalid %A Aljamaan,Fadi %A Raina,Rupesh %A Malki,Khalid H %A Altamimi,Ibraheem %A Muaygil,Ruaim %A Wahabi,Hayfaa %A Jamal,Amr %A Alhaboob,Ali %A Assiri,Rasha Assad %A Al-Tawfiq,Jaffar A %A Al-Eyadhy,Ayman %A Soliman,Mona %A Temsah,Mohamad-Hani %+ Pediatric Department, King Saud University Medical City, King Saud University, King Abdullah Road, Riyadh, 11424, Saudi Arabia, 966 114692002, mtemsah@ksu.edu.sa %K ChatGPT %K medical education %K Saudi Arabia %K perceptions %K knowledge %K medical students %K faculty %K chatbot %K qualitative study %K artificial intelligence %K AI %K AI-based tools %K universities %K thematic analysis %K learning %K satisfaction %D 2025 %7 20.2.2025 %9 Original Paper %J JMIR Med Educ %G English %X Background: With the rapid development of artificial intelligence technologies, there is a growing interest in the potential use of artificial intelligence–based tools like ChatGPT in medical education. However, there is limited research on the initial perceptions and experiences of faculty and students with ChatGPT, particularly in Saudi Arabia. 
Objective: This study aimed to explore the earliest knowledge, perceived benefits, concerns, and limitations of using ChatGPT in medical education among faculty and students at a leading Saudi Arabian university. Methods: A qualitative exploratory study was conducted in April 2023, involving focused meetings with medical faculty and students with varying levels of ChatGPT experience. A thematic analysis was used to identify key themes and subthemes emerging from the discussions. Results: Participants demonstrated good knowledge of ChatGPT and its functions. The main themes were perceptions of ChatGPT use, potential benefits, and concerns about ChatGPT in research and medical education. The perceived benefits included collecting and summarizing information and saving time and effort. However, concerns and limitations centered around the potential lack of critical thinking in the information provided, the ambiguity of references, limitations of access, trust in the output of ChatGPT, and ethical concerns. Conclusions: This study provides valuable insights into the perceptions and experiences of medical faculty and students regarding the use of newly introduced large language models like ChatGPT in medical education. While the benefits of ChatGPT were recognized, participants also expressed concerns and limitations requiring further studies for effective integration into medical education, exploring the impact of ChatGPT on learning outcomes, student and faculty satisfaction, and the development of critical thinking skills. 
%M 39977012 %R 10.2196/63400 %U https://mededu.jmir.org/2025/1/e63400 %U https://doi.org/10.2196/63400 %U http://www.ncbi.nlm.nih.gov/pubmed/39977012 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 8 %N %P e63335 %T Examining the Role of AI in Changing the Role of Nurses in Patient Care: Systematic Review %A Al Khatib,Inas %A Ndiaye,Malick %+ Department of Industrial Engineering, College of Engineering, American University of Sharjah, University City, Sharjah, 26666, United Arab Emirates, 971 65155555, g00091914@aus.edu %K artificial intelligence %K AI %K nursing practice %K technology %K health care %K PRISMA %D 2025 %7 19.2.2025 %9 Review %J JMIR Nursing %G English %X Background: This review investigates the relationship between artificial intelligence (AI) use and the role of nurses in patient care. AI exists in health care for clinical decision support, disease management, patient engagement, and operational improvement and will continue to grow in popularity, especially in the nursing field. Objective: We aim to examine whether AI integration into nursing practice may have led to a change in the role of nurses in patient care. Methods: To compile pertinent data on AI and nursing and their relationship, we conducted a thorough systematic review literature analysis using secondary data sources, including academic literature from the Scopus database, industry reports, and government publications. A total of 401 resources were reviewed, and 53 sources were ultimately included in the paper, comprising 50 peer-reviewed journal articles, 1 conference proceeding, and 2 reports. To categorize and find patterns in the data, we used thematic analysis to categorize the systematic literature review findings into 3 primary themes and 9 secondary themes. To demonstrate whether a role change existed or was forecasted to exist, case studies of AI applications and examples were also relied on. 
Results: The research shows that all health care practitioners will be impacted by the revolutionary technology known as AI. Nurses should be at the forefront of this technology and be empowered throughout the implementation process of any of its tools that may accelerate innovation, improve decision-making, automate and speed up processes, and save overall costs in nursing practice. Conclusions: This study adds to the existing body of knowledge about the applications of AI in nursing and its consequences in changing the role of nurses in patient care. To further investigate the connection between AI and the role of nurses in patient care, future studies can use quantitative techniques based on recruiting nurses who have been involved in AI tool deployment—whether from a design aspect or operational use—and gathering empirical data for that purpose. %M 39970436 %R 10.2196/63335 %U https://nursing.jmir.org/2025/1/e63335 %U https://doi.org/10.2196/63335 %U http://www.ncbi.nlm.nih.gov/pubmed/39970436 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e56306 %T Finding Consensus on Trust in AI in Health Care: Recommendations From a Panel of International Experts %A Starke,Georg %A Gille,Felix %A Termine,Alberto %A Aquino,Yves Saint James %A Chavarriaga,Ricardo %A Ferrario,Andrea %A Hastings,Janna %A Jongsma,Karin %A Kellmeyer,Philipp %A Kulynych,Bogdan %A Postan,Emily %A Racine,Elise %A Sahin,Derya %A Tomaszewska,Paulina %A Vold,Karina %A Webb,Jamie %A Facchini,Alessandro %A Ienca,Marcello %+ Institute for History and Ethics of Medicine, Technical University of Munich, Ismaninger Str. 
22, Munich, 81675, Germany, 49 8941404041, georg.starke@tum.de %K expert consensus %K trust %K artificial intelligence %K clinical decision support %K assistive technologies %K public health surveillance %K framework analysis %D 2025 %7 19.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The integration of artificial intelligence (AI) into health care has become a crucial element in the digital transformation of health systems worldwide. Despite the potential benefits across diverse medical domains, a significant barrier to the successful adoption of AI systems in health care applications remains the prevailing low user trust in these technologies. Crucially, this challenge is exacerbated by the lack of consensus among experts from different disciplines on the definition of trust in AI within the health care sector. Objective: We aimed to provide the first consensus-based analysis of trust in AI in health care based on an interdisciplinary panel of experts from different domains. Our findings can be used to address the problem of defining trust in AI in health care applications, fostering the discussion of concrete real-world health care scenarios in which humans interact with AI systems explicitly. Methods: We used a combination of framework analysis and a 3-step consensus process involving 18 international experts from the fields of computer science, medicine, philosophy of technology, ethics, and social sciences. Our process consisted of a synchronous phase during an expert workshop where we discussed the notion of trust in AI in health care applications, defined an initial framework of important elements of trust to guide our analysis, and agreed on 5 case studies. This was followed by a 2-step iterative, asynchronous process in which the authors further developed, discussed, and refined notions of trust with respect to these specific cases. 
Results: Our consensus process identified key contextual factors of trust, namely, an AI system’s environment, the actors involved, and framing factors, and analyzed causes and effects of trust in AI in health care. Our findings revealed that certain factors were applicable across all discussed cases yet also pointed to the need for a fine-grained, multidisciplinary analysis bridging human-centered and technology-centered approaches. While regulatory boundaries and technological design features are critical to successful AI implementation in health care, ultimately, communication and positive lived experiences with AI systems will be at the forefront of user trust. Our expert consensus allowed us to formulate concrete recommendations for future research on trust in AI in health care applications. Conclusions: This paper advocates for a more refined and nuanced conceptual understanding of trust in the context of AI in health care. By synthesizing insights into commonalities and differences among specific case studies, this paper establishes a foundational basis for future debates and discussions on trusting AI in health care. 
%M 39969962 %R 10.2196/56306 %U https://www.jmir.org/2025/1/e56306 %U https://doi.org/10.2196/56306 %U http://www.ncbi.nlm.nih.gov/pubmed/39969962 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e72007 %T Authors’ Reply: Enhancing the Clinical Relevance of AI Research for Medication Decision-Making %A Vordenberg,Sarah E %A Nichols,Julianna %A Marshall,Vincent D %A Weir,Kristie Rebecca %A Dorsch,Michael P %+ College of Pharmacy, University of Michigan, 428 Church St, Ann Arbor, MI, 48109, United States, 1 734 763 6691, skelling@med.umich.edu %K older adults %K artificial intelligence %K vignette %K pharmacology %K medication %K decision-making %K aging %K attitude %K perception %K perspective %K electronic health record %D 2025 %7 18.2.2025 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 39964740 %R 10.2196/72007 %U https://www.jmir.org/2025/1/e72007 %U https://doi.org/10.2196/72007 %U http://www.ncbi.nlm.nih.gov/pubmed/39964740 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e70657 %T Enhancing the Clinical Relevance of AI Research for Medication Decision-Making %A Wang,Qi %A Chen,Mingxian %+ Department of Gastroenterology, Tongde Hospital of Zhejiang Province, 234 Gucui Street, Xihu Region, Hangzhou, 310012, China, 86 151 576 82797, chenmingxian2005@126.com %K older adults %K artificial intelligence %K medication %K decision-making %K data security %K patient trust %D 2025 %7 18.2.2025 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 39964744 %R 10.2196/70657 %U https://www.jmir.org/2025/1/e70657 %U https://doi.org/10.2196/70657 %U http://www.ncbi.nlm.nih.gov/pubmed/39964744 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e62851 %T Artificial Intelligence in Lymphoma Histopathology: Systematic Review %A Fu,Yao %A Huang,Zongyao %A Deng,Xudong %A Xu,Linna %A Liu,Yang %A Zhang,Mingxing %A Liu,Jinyi %A Huang,Bin %+ Department of Pathology, Sichuan Clinical Research Center for Cancer, 
Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, University of Electronic Science and Technology of China, South Renmin Road, Chengdu, 610041, China, 86 18236170185, 18236170185@163.com %K lymphoma %K artificial intelligence %K bias %K histopathology %K tumor %K hematological %K lymphatic disease %K public health %K pathologists %K pathology %K immunohistochemistry %K diagnosis %K prognosis %D 2025 %7 14.2.2025 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) shows considerable promise in the areas of lymphoma diagnosis, prognosis, and gene prediction. However, a comprehensive assessment of potential biases and the clinical utility of AI models is still needed. Objective: Our goal was to evaluate the biases of published studies using AI models for lymphoma histopathology and assess the clinical utility of comprehensive AI models for diagnosis or prognosis. Methods: This study adhered to the Systematic Review Reporting Standards. A comprehensive literature search was conducted across PubMed, Cochrane Library, and Web of Science from their inception until August 30, 2024. The search criteria included the use of AI for prognosis involving human lymphoma tissue pathology images, diagnosis, gene mutation prediction, etc. The risk of bias was evaluated using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Information for each AI model was systematically tabulated, and summary statistics were reported. The study is registered with PROSPERO (CRD42024537394) and follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 reporting guidelines. Results: The search identified 3565 records, with 41 articles ultimately meeting the inclusion criteria. A total of 41 AI models were included in the analysis, comprising 17 diagnostic models, 10 prognostic models, 2 models for detecting ectopic gene expression, and 12 additional models related to diagnosis. 
All studies exhibited a high or unclear risk of bias, primarily due to limited analysis and incomplete reporting of participant recruitment. Most high-risk models (10/41) predominantly assigned high-risk classifications to participants. Almost all the articles presented an unclear risk of bias in at least one domain, with the most frequent being participant selection (16/41) and statistical analysis (37/41). The primary reasons for this were insufficient analysis of participant recruitment and a lack of interpretability in outcome analyses. In the diagnostic models, the most frequently studied lymphoma subtypes were diffuse large B-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia, and mantle cell lymphoma, while in the prognostic models, the most common subtypes were diffuse large B-cell lymphoma, follicular lymphoma, chronic lymphocytic leukemia, and Hodgkin lymphoma. In the internal validation results of all models, the area under the receiver operating characteristic curve (AUC) ranged from 0.75 to 0.99 and accuracy ranged from 68.3% to 100%. In models with external validation results, the AUC ranged from 0.93 to 0.99. Conclusions: From a methodological perspective, all models exhibited biases. The enhancement of the accuracy of AI models and the acceleration of their clinical translation hinge on several critical aspects. These include the comprehensive reporting of data sources, the diversity of datasets, the study design, the transparency and interpretability of AI models, the use of cross-validation and external validation, and adherence to regulatory guidance and standardized processes in the field of medical AI. 
%M 39951716 %R 10.2196/62851 %U https://www.jmir.org/2025/1/e62851 %U https://doi.org/10.2196/62851 %U http://www.ncbi.nlm.nih.gov/pubmed/39951716 %0 Journal Article %@ 2291-9279 %I JMIR Publications %V 13 %N %P e68272 %T Enhancing Immersion in Virtual Reality–Based Advanced Life Support Training: Randomized Controlled Trial %A Kitapcioglu,Dilek %A Aksoy,Mehmet Emin %A Ozkan,Arun Ekin %A Usseli,Tuba %A Cabuk Colak,Dilan %A Torun,Tugrul %+ Center of Advanced Simulation and Education, Acibadem Mehmet Ali Aydinlar University, Kayisdagi cad No 32 Atasehir, Istanbul, 34752, Turkey, 90 05052685158, emin.aksoy@acibadem.edu.tr %K artificial intelligence %K voice recognition %K serious gaming %K immersion %K virtual reality %D 2025 %7 14.2.2025 %9 Original Paper %J JMIR Serious Games %G English %X Background: Serious game–based training modules are pivotal for simulation-based health care training. With advancements in artificial intelligence (AI) and natural language processing, voice command interfaces offer an intuitive alternative to traditional virtual reality (VR) controllers in VR applications. Objective: This study aims to compare AI-supported voice command interfaces and traditional VR controllers in terms of user performance, exam scores, presence, and confidence in advanced cardiac life support (ACLS) training. Methods: A total of 62 volunteer students from Acibadem Mehmet Ali Aydinlar University Vocational School for Anesthesiology, aged 20-22 years, participated in the study. All the participants completed a pretest consisting of 10 multiple-choice questions about ACLS. Following the pretest, participants were randomly divided into 2 groups: the voice command group (n=31) and the VR controller group (n=31). The voice command group members completed the VR-based ACLS serious game in training mode twice, using an AI-supported voice command as the game interface. 
The VR controller group members also completed the VR-based ACLS serious game in training mode twice, but they used VR controllers as the game interface. The participants completed a survey to assess their level of presence and confidence during gameplay. Following the survey, participants completed the exam module of the VR-based serious gaming module. At the final stage of the study, participants completed a posttest, which had the same content as the pretest. VR-based exam scores of the voice command and VR controller groups were compared using a 2-tailed, independent-samples t test, and linear regression analysis was conducted to examine the effect of presence and confidence rating. Results: Both groups showed an improvement in performance from pretest to posttest, with no significant difference in the magnitude of improvement between the 2 groups (P=.83). When comparing presence ratings, there was no significant difference between the voice command group (mean 5.18, SD 0.83) and VR controller group (mean 5.42, SD 0.75; P=.25). However, when comparing VR-based exam scores, the VR controller group (mean 80.47, SD 13.12) significantly outperformed the voice command group (mean 66.70, SD 21.65; P=.005), despite both groups having similar time allocations for the exam (voice command group: mean 18.59, SD 5.28 minutes and VR controller group: mean 17.3, SD 4.83 minutes). Confidence levels were similar between the groups (voice command group: mean 3.79, SD 0.77 and VR controller group: mean 3.60, SD 0.72), but the voice command group displayed a significant overconfidence bias (voice command group: mean 0.09, SD 0.24 and VR controller group: mean –0.09, SD 0.18; P=.002). Conclusions: VR-based ACLS training demonstrated effectiveness; however, the use of voice commands did not result in improved performance. Further research should explore ways to optimize AI’s role in education through VR. 
Trial Registration: ClinicalTrials.gov NCT06458452; https://clinicaltrials.gov/ct2/show/NCT06458452 %M 39951703 %R 10.2196/68272 %U https://games.jmir.org/2025/1/e68272 %U https://doi.org/10.2196/68272 %U http://www.ncbi.nlm.nih.gov/pubmed/39951703 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 11 %N %P e65699 %T The Promise and Perils of Artificial Intelligence in Advancing Participatory Science and Health Equity in Public Health %A King,Abby C %A Doueiri,Zakaria N %A Kaulberg,Ankita %A Goldman Rosas,Lisa %K digital health %K artificial intelligence %K community-based participatory research %K citizen science %K health equity %K societal trends %K public health %K viewpoint %K policy makers %K public participation %K information technology %K micro-level data %K macro-level data %K LLM %K natural language processing %K machine learning %K language model %K Our Voice %D 2025 %7 14.2.2025 %9 %J JMIR Public Health Surveill %G English %X Current societal trends reflect an increased mistrust in science and a lowered civic engagement that threaten to impair research that is foundational for ensuring public health and advancing health equity. One effective countermeasure to these trends lies in community-facing citizen science applications to increase public participation in scientific research, making this field an important target for artificial intelligence (AI) exploration. We highlight potentially promising citizen science AI applications that extend beyond individual use to the community level, including conversational large language models, text-to-image generative AI tools, descriptive analytics for analyzing integrated macro- and micro-level data, and predictive analytics. The novel adaptations of AI technologies for community-engaged participatory research also bring an array of potential risks. We highlight possible negative externalities and mitigations for some of the potential ethical and societal challenges in this field. 
%R 10.2196/65699 %U https://publichealth.jmir.org/2025/1/e65699 %U https://doi.org/10.2196/65699 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 12 %N %P e68135 %T Leveraging Large Language Models and Agent-Based Systems for Scientific Data Analysis: Validation Study %A Peasley,Dale %A Kuplicki,Rayus %A Sen,Sandip %A Paulus,Martin %K LLM %K agent-based systems %K scientific data analysis %K data contextualization %K AI-driven research tools %K large language model %K scientific data %K analysis %K contextualization %K AI %K artificial intelligence %K research tool %D 2025 %7 13.2.2025 %9 %J JMIR Ment Health %G English %X Background: Large language models have shown promise in transforming how complex scientific data are analyzed and communicated, yet their application to scientific domains remains challenged by issues of factual accuracy and domain-specific precision. The Laureate Institute for Brain Research–Tulsa University (LIBR-TU) Research Agent (LITURAt) leverages a sophisticated agent-based architecture to mitigate these limitations, using external data retrieval and analysis tools to ensure reliable, context-aware outputs that make scientific information accessible to both experts and nonexperts. Objective: The objective of this study was to develop and evaluate LITURAt to enable efficient analysis and contextualization of complex scientific datasets for diverse user expertise levels. Methods: An agent-based system based on large language models was designed to analyze and contextualize complex scientific datasets using a “plan-and-solve” framework. The system dynamically retrieves local data and relevant PubMed literature, performs statistical analyses, and generates comprehensive, context-aware summaries to answer user queries with high accuracy and consistency. Results: Our experiments demonstrated that LITURAt achieved an internal consistency rate of 94.8% and an external consistency rate of 91.9% across repeated and rephrased queries. 
Additionally, GPT-4 evaluations rated 80.3% (171/213) of the system’s answers as accurate and comprehensive, with 23.5% (50/213) receiving the highest rating of 5 for completeness and precision. Conclusions: These findings highlight the potential of LITURAt to significantly enhance the accessibility and accuracy of scientific data analysis, achieving high consistency and strong performance in complex query resolution. Despite existing limitations, such as model stability for highly variable queries, LITURAt demonstrates promise as a robust tool for democratizing data-driven insights across diverse scientific domains. %R 10.2196/68135 %U https://mental.jmir.org/2025/1/e68135 %U https://doi.org/10.2196/68135 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 12 %N %P e65923 %T An Explainable AI Application (AF’fective) to Support Monitoring of Patients With Atrial Fibrillation After Catheter Ablation: Qualitative Focus Group, Design Session, and Interview Study %A She,Wan Jou %A Siriaraya,Panote %A Iwakoshi,Hibiki %A Kuwahara,Noriaki %A Senoo,Keitaro %+ Department of Cardiac Arrhythmia Research and Innovation, Graduate School of Medical Science, Kyoto Prefectural University of Medicine, 465 Kajii-cho Hirokoji, Kawaramachi-dori, Kamigyo-ku Kyoto, Kyoto, 602-8566, Japan, 81 75 251 5511, k-senoo@koto.kpu-m.ac.jp %K atrial fibrillation %K explainable artificial intelligence %K explainable AI %K user-centered design %K prevention %K postablation monitoring %D 2025 %7 13.2.2025 %9 Original Paper %J JMIR Hum Factors %G English %X Background: The opaque nature of artificial intelligence (AI) algorithms has led to distrust in medical contexts, particularly in the treatment and monitoring of atrial fibrillation. Although previous studies in explainable AI have demonstrated potential to address this issue, they often focus solely on electrocardiography graphs and lack real-world field insights. 
Objective: We addressed this gap by incorporating standardized clinical interpretation of electrocardiography graphs into the system and collaborating with cardiologists to co-design and evaluate this approach using real-world patient cases and data. Methods: We conducted a 3-stage iterative design process with 23 cardiologists to co-design, evaluate, and pilot an explainable AI application. In the first stage, we identified 4 physician personas and 7 explainability strategies, which were reviewed in the second stage. A total of 4 strategies were deemed highly effective and feasible for pilot deployment. On the basis of these strategies, we developed a progressive web application and tested it with cardiologists in the third stage. Results: The final progressive web application prototype received above-average user experience evaluations and effectively motivated physicians to adopt it owing to its ease of use, reliable information, and explainable functionality. In addition, we gathered in-depth field insights from cardiologists who used the system in clinical contexts. Conclusions: Our study identified effective explainability strategies, emphasized the importance of curating actionable features and setting accurate expectations, and suggested that many of these insights could apply to other disease care contexts, paving the way for future real-world clinical evaluations. 
%M 39946707 %R 10.2196/65923 %U https://humanfactors.jmir.org/2025/1/e65923 %U https://doi.org/10.2196/65923 %U http://www.ncbi.nlm.nih.gov/pubmed/39946707 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e64290 %T Laypeople’s Use of and Attitudes Toward Large Language Models and Search Engines for Health Queries: Survey Study %A Mendel,Tamir %A Singh,Nina %A Mann,Devin M %A Wiesenfeld,Batia %A Nov,Oded %+ Department of Technology Management and Innovation, Tandon School of Engineering, New York University, 2 Metrotech Center, Brooklyn, New York, NY, 11201, United States, 1 8287348968, tamir.mendel@nyu.edu %K large language model %K artificial intelligence %K LLMs %K search engine %K Google %K internet %K online health information %K United States %K survey %K mobile phone %D 2025 %7 13.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Laypeople have easy access to health information through large language models (LLMs), such as ChatGPT, and search engines, such as Google. Search engines transformed health information access, and LLMs offer a new avenue for answering laypeople’s questions. Objective: We aimed to compare the frequency of use and attitudes toward LLMs and search engines as well as their comparative relevance, usefulness, ease of use, and trustworthiness in responding to health queries. Methods: We conducted a screening survey to compare the demographics of LLM users and nonusers seeking health information, analyzing results with logistic regression. LLM users from the screening survey were invited to a follow-up survey to report the types of health information they sought. We compared the frequency of use of LLMs and search engines using ANOVA and Tukey post hoc tests. Lastly, paired-sample Wilcoxon tests compared LLMs and search engines on perceived usefulness, ease of use, trustworthiness, feelings, bias, and anthropomorphism. 
Results: In total, 2002 US participants recruited on Prolific participated in the screening survey about the use of LLMs and search engines. Of them, 52% (n=1045) of the participants were female, with a mean age of 39 (SD 13) years. Participants were 9.7% (n=194) Asian, 12.1% (n=242) Black, 73.3% (n=1467) White, 1.1% (n=22) Hispanic, and 3.8% (n=77) were of other races and ethnicities. Further, 1913 (95.6%) used search engines to look up health queries versus 642 (32.6%) for LLMs. Men had higher odds (odds ratio [OR] 1.63, 95% CI 1.34-1.99; P<.001) of using LLMs for health questions than women. Black (OR 1.90, 95% CI 1.42-2.54; P<.001) and Asian (OR 1.66, 95% CI 1.19-2.30; P<.01) individuals had higher odds than White individuals. Those with excellent perceived health (OR 1.46, 95% CI 1.1-1.93; P=.01) were more likely to use LLMs than those with good health. Higher technical proficiency increased the likelihood of LLM use (OR 1.26, 95% CI 1.14-1.39; P<.001). In a follow-up survey of 281 LLM users for health, most participants used search engines first (n=174, 62%) to answer health questions, but the second most common first source consulted was LLMs (n=39, 14%). LLMs were perceived as less useful (P<.01) and less relevant (P=.07), but elicited fewer negative feelings (P<.001), appeared more human (LLM: n=160, vs search: n=32), and were seen as less biased (P<.001). Trust (P=.56) and ease of use (P=.27) showed no differences. Conclusions: Search engines are the primary source of health information; yet, positive perceptions of LLMs suggest growing use. Future work could explore whether LLM trust and usefulness are enhanced by supplementing answers with external references and limiting persuasive language to curb overreliance. Collaboration with health organizations can help improve the quality of LLMs’ health output. 
%M 39946180 %R 10.2196/64290 %U https://www.jmir.org/2025/1/e64290 %U https://doi.org/10.2196/64290 %U http://www.ncbi.nlm.nih.gov/pubmed/39946180 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e48328 %T Large Language Models–Supported Thrombectomy Decision-Making in Acute Ischemic Stroke Based on Radiology Reports: Feasibility Qualitative Study %A Kottlors,Jonathan %A Hahnfeldt,Robert %A Görtz,Lukas %A Iuga,Andra-Iza %A Fervers,Philipp %A Bremm,Johannes %A Zopfs,David %A Laukamp,Kai R %A Onur,Oezguer A %A Lennartz,Simon %A Schönfeld,Michael %A Maintz,David %A Kabbasch,Christoph %A Persigehl,Thorsten %A Schlamann,Marc %+ Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Kerpener Straße 62, Cologne, 50937, Germany, 49 221 47896063, jonathan.kottlors@uk-koeln.de %K artificial intelligence %K radiology %K report %K large language model %K text-based augmented supporting system %K mechanical thrombectomy %K GPT %K stroke %K decision-making %K thrombectomy %K imaging %K model %K machine learning %K ischemia %D 2025 %7 13.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The latest advancement of artificial intelligence (AI) is generative pretrained transformer large language models (LLMs). They have been trained on massive amounts of text, enabling humanlike and semantical responses to text-based inputs and requests. Foreshadowing numerous possible applications in various fields, the potential of such tools for medical data integration and clinical decision-making is not yet clear. Objective: In this study, we investigate the potential of LLMs in report-based medical decision-making on the example of acute ischemic stroke (AIS), where clinical and image-based information may indicate an immediate need for mechanical thrombectomy (MT). 
The purpose was to elucidate the feasibility of integrating radiology report data and other clinical information in the context of therapy decision-making using LLMs. Methods: A hundred patients with AIS were retrospectively included, for which 50% (50/100) was indicated for MT, whereas the other 50% (50/100) was not. The LLM was provided with the computed tomography report, information on neurological symptoms and onset, and patients’ age. The performance of the AI decision-making model was compared with an expert consensus regarding the binary determination of MT indication, for which sensitivity, specificity, and accuracy were calculated. Results: The AI model had an overall accuracy of 88%, with a specificity of 96% and a sensitivity of 80%. The area under the curve for the report-based MT decision was 0.92. Conclusions: The LLM achieved promising accuracy in determining the eligibility of patients with AIS for MT based on radiology reports and clinical information. Our results underscore the potential of LLMs for radiological and medical data integration. This investigation should serve as a stimulus for further clinical applications of LLMs, in which this AI should be used as an augmented supporting system for human decision-making. 
%M 39946168 %R 10.2196/48328 %U https://www.jmir.org/2025/1/e48328 %U https://doi.org/10.2196/48328 %U http://www.ncbi.nlm.nih.gov/pubmed/39946168 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e64318 %T Performance Assessment of Large Language Models in Medical Consultation: Comparative Study %A Seo,Sujeong %A Kim,Kyuli %A Yang,Heyoung %+ Future Technology Analysis Center, Korea Institute of Science and Technology Information, Hoegi-ro 66, Dongdaemun-gu, Seoul, 92456, Republic of Korea, 82 10 9265 5661, hyyang@kisti.re.kr %K artificial intelligence %K biomedical %K large language model %K depression %K similarity measurement %K text validity %D 2025 %7 12.2.2025 %9 Original Paper %J JMIR Med Inform %G English %X Background: The recent introduction of generative artificial intelligence (AI) as an interactive consultant has sparked interest in evaluating its applicability in medical discussions and consultations, particularly within the domain of depression. Objective: This study evaluates the capability of large language models (LLMs) in AI to generate responses to depression-related queries. Methods: Using the PubMedQA and QuoraQA data sets, we compared various LLMs, including BioGPT, PMC-LLaMA, GPT-3.5, and Llama2, and measured the similarity between the generated and original answers. Results: The latest general LLMs, GPT-3.5 and Llama2, exhibited superior performance, particularly in generating responses to medical inquiries from the PubMedQA data set. Conclusions: Considering the rapid advancements in LLM development in recent years, it is hypothesized that version upgrades of general LLMs offer greater potential for enhancing their ability to generate “knowledge text” in the biomedical domain compared with fine-tuning for the biomedical field. These findings are expected to contribute significantly to the evolution of AI-based medical counseling systems. 
%M 39763114 %R 10.2196/64318 %U https://medinform.jmir.org/2025/1/e64318 %U https://doi.org/10.2196/64318 %U http://www.ncbi.nlm.nih.gov/pubmed/39763114 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 9 %N %P e59380 %T Predicting Atrial Fibrillation Relapse Using Bayesian Networks: Explainable AI Approach %A Alves,João Miguel %A Matos,Daniel %A Martins,Tiago %A Cavaco,Diogo %A Carmo,Pedro %A Galvão,Pedro %A Costa,Francisco Moscoso %A Morgado,Francisco %A Ferreira,António Miguel %A Freitas,Pedro %A Dias,Cláudia Camila %A Rodrigues,Pedro Pereira %A Adragão,Pedro %K artificial intelligence %K atrial fibrillation %K Bayesian networks %K clinical decision-making %K machine learning %K prognostic models %D 2025 %7 11.2.2025 %9 %J JMIR Cardio %G English %X Background: Atrial fibrillation (AF) is a prevalent arrhythmia associated with significant morbidity and mortality. Despite advancements in ablation techniques, predicting recurrence of AF remains a challenge, necessitating reliable models to identify patients at risk of relapse. Traditional scoring systems often lack applicability in diverse clinical settings and may not incorporate the latest evidence-based factors influencing AF outcomes. This study aims to develop an explainable artificial intelligence model using Bayesian networks to predict AF relapse postablation, leveraging easily obtainable clinical variables. Objective: This study aims to investigate the effectiveness of Bayesian networks as a predictive tool for AF relapse following a percutaneous pulmonary vein isolation (PVI) procedure. The objectives include evaluating the model’s performance using various clinical predictors, assessing its adaptability to incorporate new risk factors, and determining its potential to enhance clinical decision-making in the management of AF. Methods: This study analyzed data from 480 patients with symptomatic drug-refractory AF who underwent percutaneous PVI. 
To predict AF relapse following the procedure, an explainable artificial intelligence model based on Bayesian networks was developed. The model used a variable number of clinical predictors, including age, sex, smoking status, preablation AF type, left atrial volume, epicardial fat, obstructive sleep apnea, and BMI. The predictive performance of the model was evaluated using the area under the receiver operating characteristic curve (AUC-ROC) metrics across different configurations of predictors (5, 6, and 7 variables). Validation was conducted through four distinct sampling techniques to ensure robustness and reliability of the predictions. Results: The Bayesian network model demonstrated promising predictive performance for AF relapse. Using 5 predictors (age, sex, smoking, preablation AF type, and obstructive sleep apnea), the model achieved an AUC-ROC of 0.661 (95% CI 0.603‐0.718). Incorporating additional predictors improved performance, with a 6-predictor model (adding BMI) achieving an AUC-ROC of 0.703 (95% CI 0.652‐0.753) and a 7-predictor model (adding left atrial volume and epicardial fat) achieving an AUC-ROC of 0.752 (95% CI 0.701‐0.800). These results indicate that the model can effectively estimate the risk of AF relapse using readily available clinical variables. Notably, the model maintained acceptable diagnostic accuracy even in scenarios where some predictive features were missing, highlighting its adaptability and potential use in real-world clinical settings. Conclusions: The developed Bayesian network model provides a reliable and interpretable tool for predicting AF relapse in patients undergoing percutaneous PVI. By using easily accessible clinical variables, presenting acceptable diagnostic accuracy, and showing adaptability to incorporate new medical knowledge over time, the model demonstrates a flexibility and robustness that makes it suitable for real-world clinical scenarios. 
%R 10.2196/59380 %U https://cardio.jmir.org/2025/1/e59380 %U https://doi.org/10.2196/59380 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 12 %N %P e60273 %T The Effects of Presenting AI Uncertainty Information on Pharmacists’ Trust in Automated Pill Recognition Technology: Exploratory Mixed Subjects Study %A Kim,Jin Yong %A Marshall,Vincent D %A Rowell,Brigid %A Chen,Qiyuan %A Zheng,Yifan %A Lee,John D %A Kontar,Raed Al %A Lester,Corey %A Yang,Xi Jessie %+ , Industrial and Operations Engineering, University of Michigan, 1640 IOE, 1205 Beal Avenue, Ann Arbor, MI, 48105, United States, 1 7347630541, xijyang@umich.edu %K artificial intelligence %K human-computer interaction %K uncertainty communication %K visualization %K medication errors %K safety %K artificial intelligence aid %K pharmacists %K pill verification %K automation %D 2025 %7 11.2.2025 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Dispensing errors significantly contribute to adverse drug events, resulting in substantial health care costs and patient harm. Automated pill verification technologies have been developed to aid pharmacists with medication dispensing. However, pharmacists’ trust in such automated technologies remains unexplored. Objective: This study aims to investigate pharmacists’ trust in automated pill verification technology designed to support medication dispensing. Methods: Thirty licensed pharmacists in the United States performed a web-based simulated pill verification task to determine whether an image of a filled medication bottle matched a known reference image. Participants completed a block of 100 verification trials without any help, and another block of 100 trials with the help of an imperfect artificial intelligence (AI) aid recommending acceptance or rejection of a filled medication bottle. The experiment used a mixed subjects design. The between-subjects factor was the AI aid type, with or without an AI uncertainty plot. 
The within-subjects factor was the four potential verification outcomes: (1) the AI rejects the incorrect drug, (2) the AI rejects the correct drug, (3) the AI approves the incorrect drug, and (4) the AI approves the correct drug. Participants’ trust in the AI system was measured. Mixed model (generalized linear models) tests were conducted with 2-tailed t tests to compare the means between the 2 AI aid types for each verification outcome. Results: Participants had an average trust propensity score of 72 (SD 18.08) out of 100, indicating a positive attitude toward trusting automated technologies. The introduction of an uncertainty plot to the AI aid significantly enhanced pharmacists’ end trust (t28=–1.854; P=.04). Trust dynamics were influenced by AI aid type and verification outcome. Specifically, pharmacists using the AI aid with the uncertainty plot had a significantly larger trust increment when the AI approved the correct drug (t78.98=3.93; P<.001) and a significantly larger trust decrement when the AI approved the incorrect drug (t2939.72=–4.78; P<.001). Intriguingly, the absence of the uncertainty plot led to an increase in trust when the AI correctly rejected an incorrect drug, whereas the presence of the plot resulted in a decrease in trust under the same circumstances (t509.77=–3.96; P<.001). A pronounced “negativity bias” was observed, where the degree of trust reduction when the AI made an error exceeded the trust gain when the AI made a correct decision (z=–11.30; P<.001). Conclusions: To the best of our knowledge, this study is the first attempt to examine pharmacists’ trust in automated pill verification technology. Our findings reveal that pharmacists have a favorable disposition toward trusting automation. Moreover, providing uncertainty information about the AI’s recommendation significantly boosts pharmacists’ trust in AI aid, highlighting the importance of developing transparent AI systems within health care. 
%M 39932773 %R 10.2196/60273 %U https://humanfactors.jmir.org/2025/1/e60273 %U https://doi.org/10.2196/60273 %U http://www.ncbi.nlm.nih.gov/pubmed/39932773 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e63824 %T Understanding Citizens’ Response to Social Activities on Twitter in US Metropolises During the COVID-19 Recovery Phase Using a Fine-Tuned Large Language Model: Application of AI %A Saito,Ryuichi %A Tsugawa,Sho %+ , Institute of Systems and Information Engineering, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8577, Japan, 81 08055751714, saito.ryuichi.tkb_gw@u.tsukuba.ac.jp %K COVID-19 %K restriction %K United States %K X %K Twitter %K sentiment analysis %K large language model %K LLM %K GPT-3.5 %K fine-tuning %D 2025 %7 11.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The COVID-19 pandemic continues to hold an important place in the collective memory as of 2024. As of March 2024, >676 million cases, 6 million deaths, and 13 billion vaccine doses have been reported. It is crucial to evaluate sociopsychological impacts as well as public health indicators such as these to understand the effects of the COVID-19 pandemic. Objective: This study aimed to explore the sentiments of residents of major US cities toward restrictions on social activities in 2022 during the transitional phase of the COVID-19 pandemic, from the peak of the pandemic to its gradual decline. By illuminating people’s susceptibility to COVID-19, we provide insights into the general sentiment trends during the recovery phase of the pandemic. Methods: To analyze these trends, we collected posts (N=119,437) on the social media platform Twitter (now X) created by people living in New York City, Los Angeles, and Chicago from December 2021 to December 2022, which were impacted by the COVID-19 pandemic in similar ways. A total of 47,111 unique users authored these posts. 
In addition, for privacy considerations, any identifiable information, such as author IDs and usernames, was excluded, retaining only the text for analysis. Then, we developed a sentiment estimation model by fine-tuning a large language model on the collected data and used it to analyze how citizens’ sentiments evolved throughout the pandemic. Results: In the evaluation of models, GPT-3.5 Turbo with fine-tuning outperformed GPT-3.5 Turbo without fine-tuning and Robustly Optimized Bidirectional Encoder Representations from Transformers Pretraining Approach (RoBERTa)–large with fine-tuning, demonstrating significant accuracy (0.80), recall (0.79), precision (0.79), and F1-score (0.79). The findings using GPT-3.5 Turbo with fine-tuning reveal a significant relationship between sentiment levels and actual cases in all 3 cities. Specifically, the correlation coefficient for New York City is 0.89 (95% CI 0.81-0.93), for Los Angeles is 0.39 (95% CI 0.14-0.60), and for Chicago is 0.65 (95% CI 0.47-0.78). Furthermore, feature words analysis showed that COVID-19–related keywords were replaced with non–COVID-19-related keywords in New York City and Los Angeles from January 2022 onward and Chicago from March 2022 onward. Conclusions: The results show a gradual decline in sentiment and interest in restrictions across all 3 cities as the pandemic approached its conclusion. These results are also supported by a sentiment estimation model fine-tuned on actual Twitter posts. This study represents the first attempt from a macro perspective to depict sentiment using a classification model created with actual data from the period when COVID-19 was prevalent. This approach can be applied to the spread of other infectious diseases by adjusting search keywords for observational data. 
%M 39932775 %R 10.2196/63824 %U https://www.jmir.org/2025/1/e63824 %U https://doi.org/10.2196/63824 %U http://www.ncbi.nlm.nih.gov/pubmed/39932775 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e54156 %T Understanding Providers’ Attitude Toward AI in India’s Informal Health Care Sector: Survey Study %A Kumar,Sumeet %A Rayal,Snehil %A Bommaraju,Raghuram %A Varasala,Navya Pratyusha %A Papineni,Sirisha %A Deo,Sarang %K artificial intelligence %K tuberculosis %K health care providers %K cross-sectional studies %K trust %K x-rays %K India %D 2025 %7 10.2.2025 %9 %J JMIR Form Res %G English %X Background: Tuberculosis (TB) is a major global health concern, causing 1.5 million deaths in 2020. Diagnostic tests for TB are often inaccurate, expensive, and inaccessible, making chest x-rays augmented with artificial intelligence (AI) a promising solution. However, whether providers are willing to adopt AI is not apparent. Objective: The study seeks to understand the attitude of Ayurveda, Yoga and Naturopathy, Unani, Siddha, and Homoeopathy (AYUSH) and informal health care providers, whom we jointly call AIPs, toward adopting AI for TB diagnosis. We chose to study these providers as they are the first point of contact for a majority of TB patients in India. Methods: We conducted a cross-sectional survey of 406 AIPs across the states of Jharkhand (162 participants) and Gujarat (244 participants) in India. We designed the survey questionnaire to assess the AIPs’ confidence in treating presumptive TB patients, their trust in local radiologists’ reading of the chest x-ray images, their beliefs regarding the diagnostic capabilities of AI, and their willingness to adopt AI for TB diagnosis. Results: We found that 93.7% (270/288) of AIPs believed that AI could improve the accuracy of TB diagnosis, and for those who believed in AI, 71.9% (194/270) were willing to try AI. Among all AIPs, 69.4% (200/288) were willing to try AI. 
However, we found significant differences in AIPs’ willingness to try AI across the 2 states. Specifically, in Gujarat, a state with better and more accessible health care infrastructure, 73.4% (155/211) were willing to try AI, and in Jharkhand, 58.4% (45/77) were willing to try AI. Moreover, AIPs in Gujarat who showed higher trust in the local radiologists were less likely to try AI (odds ratio [OR] 0.15, 95% CI 0.03‐0.69; P=.02). In contrast, in Jharkhand, those who showed higher trust in the local radiologists were more likely to try AI (OR 2.11, 95% CI 0.9‐4.93; P=.09). Conclusions: While most AIPs believed in the potential benefits of AI-based TB diagnoses, many did not intend to try AI, indicating that the expected benefits of AI measured in terms of technological superiority may not directly translate to impact on the ground. Improving beliefs among AIPs with poor access to radiology services or those who are less confident of diagnosing TB is likely to result in a greater impact of AI on the ground. Additionally, tailored interventions addressing regional and infrastructural differences may facilitate AI adoption in India’s informal health care sector. 
%R 10.2196/54156 %U https://formative.jmir.org/2025/1/e54156 %U https://doi.org/10.2196/54156 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 11 %N %P e66269 %T Interpretable Machine Learning to Predict the Malignancy Risk of Follicular Thyroid Neoplasms in Extremely Unbalanced Data: Retrospective Cohort Study and Literature Review %A Shan,Rui %A Li,Xin %A Chen,Jing %A Chen,Zheng %A Cheng,Yuan-Jia %A Han,Bo %A Hu,Run-Ze %A Huang,Jiu-Ping %A Kong,Gui-Lan %A Liu,Hui %A Mei,Fang %A Song,Shi-Bing %A Sun,Bang-Kai %A Tian,Hui %A Wang,Yang %A Xiao,Wu-Cai %A Yao,Xiang-Yun %A Ye,Jing-Ming %A Yu,Bo %A Yuan,Chun-Hui %A Zhang,Fan %A Liu,Zheng %K follicular thyroid neoplasm %K machine learning %K prediction model %K malignancy %K unbalanced data %K literature review %D 2025 %7 10.2.2025 %9 %J JMIR Cancer %G English %X Background: Diagnosing and managing follicular thyroid neoplasms (FTNs) remains a significant challenge, as the malignancy risk cannot be determined until after diagnostic surgery. Objective: We aimed to use interpretable machine learning to predict the malignancy risk of FTNs preoperatively in a real-world setting. Methods: We conducted a retrospective cohort study at the Peking University Third Hospital in Beijing, China. Patients with postoperative pathological diagnoses of follicular thyroid adenoma (FTA) or follicular thyroid carcinoma (FTC) were included, excluding those without preoperative thyroid ultrasonography. We used 22 predictors involving demographic characteristics, thyroid sonography, and hormones to train 5 machine learning models: logistic regression, least absolute shrinkage and selection operator regression, random forest, extreme gradient boosting, and support vector machine. The optimal model was selected based on discrimination, calibration, interpretability, and parsimony. 
To address the highly imbalanced data (FTA:FTC ratio>5:1), model discrimination was assessed using both the area under the receiver operating characteristic curve and the area under the precision-recall curve (AUPRC). To interpret the model, we used Shapley Additive Explanations values and partial dependence and individual conditional expectation plots. Additionally, a systematic review was performed to synthesize existing evidence and validate the discrimination ability of the previously developed Thyroid Imaging Reporting and Data System for Follicular Neoplasm scoring criteria to differentiate between benign and malignant FTNs using our data. Results: The cohort included 1539 patients (mean age 47.98, SD 14.15 years; female: n=1126, 73.16%) with 1672 FTN tumors (FTA: n=1414; FTC: n=258; FTA:FTC ratio=5.5). The random forest model emerged as optimal, identifying mean thyroid-stimulating hormone (TSH) score, mean tumor diameter, mean TSH, TSH instability, and TSH measurement levels as the top 5 predictors in discriminating FTA from FTC, with the area under the receiver operating characteristic curve of 0.79 (95% CI 0.77‐0.81) and AUPRC of 0.40 (95% CI 0.37-0.44). Malignancy risk increased nonlinearly with larger tumor diameters and higher TSH instability but decreased nonlinearly with higher mean TSH scores or mean TSH levels. FTCs with small sizes (mean diameter 2.88, SD 1.38 cm) were more likely to be misclassified as FTAs compared to larger ones (mean diameter 3.71, SD 1.36 cm). The systematic review of the 7 included studies revealed that (1) the FTA:FTC ratio varied from 0.6 to 4.0, lower than the natural distribution of 5.0; (2) no studies assessed prediction performance using AUPRC in unbalanced datasets; and (3) external validations of Thyroid Imaging Reporting and Data System for Follicular Neoplasm scoring criteria underperformed relative to the original study. 
Conclusions: Tumor size and TSH measurements were important in screening FTN malignancy risk preoperatively, but accurately predicting the risk of small-sized FTNs remains challenging. Future research should address the limitations posed by the extreme imbalance in FTA and FTC distributions in real-world data. %R 10.2196/66269 %U https://cancer.jmir.org/2025/1/e66269 %U https://doi.org/10.2196/66269 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e63881 %T InfectA-Chat, an Arabic Large Language Model for Infectious Diseases: Comparative Analysis %A Selcuk,Yesim %A Kim,Eunhui %A Ahn,Insung %+ , Department of Data-Centric Problem Solving Research, Korea Institute of Science and Technology Information, 245 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea, 82 42 869 1053, isahn@kisti.re.kr %K large language model %K Arabic large language models %K AceGPT %K multilingual large language model %K infectious disease monitoring %K public health %D 2025 %7 10.2.2025 %9 Original Paper %J JMIR Med Inform %G English %X Background: Infectious diseases have consistently been a significant concern in public health, requiring proactive measures to safeguard societal well-being. In this regard, regular monitoring activities play a crucial role in mitigating the adverse effects of diseases on society. To monitor disease trends, various organizations, such as the World Health Organization (WHO) and the European Centre for Disease Prevention and Control (ECDC), collect diverse surveillance data and make them publicly accessible. However, these platforms primarily present surveillance data in English, which creates language barriers for non–English-speaking individuals and global public health efforts to accurately observe disease trends. This challenge is particularly noticeable in regions such as the Middle East, where specific infectious diseases, such as Middle East respiratory syndrome coronavirus (MERS-CoV), have seen a dramatic increase. 
For such regions, it is essential to develop tools that can overcome language barriers and reach more individuals to alleviate the negative impacts of these diseases. Objective: This study aims to address these issues; therefore, we propose InfectA-Chat, a cutting-edge large language model (LLM) specifically designed for the Arabic language but also incorporating English for question and answer (Q&A) tasks. InfectA-Chat leverages its deep understanding of the language to provide users with information on the latest trends in infectious diseases based on their queries. Methods: This comprehensive study was achieved by instruction tuning the AceGPT-7B and AceGPT-7B-Chat models on a Q&A task, using a dataset of 55,400 Arabic and English domain–specific instruction–following data. The performance of these fine-tuned models was evaluated using 2770 domain-specific Arabic and English instruction–following data, using the GPT-4 evaluation method. A comparative analysis was then performed against Arabic LLMs and state-of-the-art models, including AceGPT-13B-Chat, Jais-13B-Chat, Gemini, GPT-3.5, and GPT-4. Furthermore, to ensure the model had access to the latest information on infectious diseases by regularly updating the data without additional fine-tuning, we used the retrieval-augmented generation (RAG) method. Results: InfectA-Chat demonstrated good performance in answering questions about infectious diseases by the GPT-4 evaluation method. Our comparative analysis revealed that it outperforms the AceGPT-7B-Chat and InfectA-Chat (based on AceGPT-7B) models by a margin of 43.52%. It also surpassed other Arabic LLMs such as AceGPT-13B-Chat and Jais-13B-Chat by 48.61%. Among the state-of-the-art models, InfectA-Chat achieved a leading performance of 23.78%, competing closely with the GPT-4 model. Furthermore, the RAG method in InfectA-Chat significantly improved document retrieval accuracy. 
Notably, RAG retrieved more accurate documents based on queries when the top-k parameter value was increased. Conclusions: Our findings highlight the shortcomings of general Arabic LLMs in providing up-to-date information about infectious diseases. With this study, we aim to empower individuals and public health efforts by offering a bilingual Q&A system for infectious disease monitoring. %M 39928922 %R 10.2196/63881 %U https://medinform.jmir.org/2025/1/e63881 %U https://doi.org/10.2196/63881 %U http://www.ncbi.nlm.nih.gov/pubmed/39928922 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e60888 %T Machine Learning in the Management of Patients Undergoing Catheter Ablation for Atrial Fibrillation: Scoping Review %A Luo,Aijing %A Chen,Wei %A Zhu,Hongtao %A Xie,Wenzhao %A Chen,Xi %A Liu,Zhenjiang %A Xin,Zirui %+ The Second Xiangya Hospital, Central South University, No. 139 Renmin Middle Road, Changsha, 410011, China, 86 15211017166, xinzirui@csu.edu.cn %K atrial fibrillation %K catheter ablation %K deep learning %K patient management %K prognosis %K quality assessment tools %K cardiac arrhythmia %K public health %K quality of life %K severe medical condition %K electrocardiogram %K electronic health record %K morbidity %K mortality %K thromboembolism %K clinical intervention %D 2025 %7 10.2.2025 %9 Review %J J Med Internet Res %G English %X Background: Although catheter ablation (CA) is currently the most effective clinical treatment for atrial fibrillation, its variable therapeutic effects among different patients present numerous problems. Machine learning (ML) shows promising potential in optimizing the management and clinical outcomes of patients undergoing atrial fibrillation CA (AFCA). 
Objective: This scoping review aimed to evaluate the current scientific evidence on the application of ML for managing patients undergoing AFCA, compare the performance of various models across specific clinical tasks within AFCA, and summarize the strengths and limitations of ML in this field. Methods: Adhering to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, relevant studies published up to October 7, 2023, were searched from PubMed, Web of Science, Embase, the Cochrane Library, and ScienceDirect. The final included studies were confirmed based on inclusion and exclusion criteria and manual review. The PROBAST (Prediction model Risk Of Bias Assessment Tool) and QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies-2) methodological quality assessment tools were used to review the included studies, and narrative data synthesis was performed on the modeled results provided by these studies. Results: The analysis of 23 included studies showcased the contributions of ML in identifying potential ablation targets, improving ablation strategies, and predicting patient prognosis. The patient data used in these studies comprised demographics, clinical characteristics, various types of imaging (9/23, 39%), and electrophysiological signals (7/23, 30%). In terms of model type, deep learning, represented by convolutional neural networks, was most frequently applied (14/23, 61%). Compared with traditional clinical scoring models or human clinicians, the model performance reported in the included studies was generally satisfactory, but most models (14/23, 61%) showed a high risk of bias due to lack of external validation. Conclusions: Our evidence-based findings suggest that ML is a promising tool for improving the effectiveness and efficiency of managing patients undergoing AFCA. 
While guiding data preparation and model selection for future studies, this review highlights the need to address prevalent limitations, including lack of external validation, and to further explore model generalization and interpretability. %M 39928932 %R 10.2196/60888 %U https://www.jmir.org/2025/1/e60888 %U https://doi.org/10.2196/60888 %U http://www.ncbi.nlm.nih.gov/pubmed/39928932 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e58434 %T Assessing the Uses, Benefits, and Limitations of Digital Technologies Used by Health Professionals in Supporting Obesity and Mental Health Communication: Scoping Review %A Kearns,Amanda %A Moorhead,Anne %A Mulvenna,Maurice %A Bond,Raymond %+ Life and Health Sciences, Institute for Nursing and Health Research, Ulster University, 2-24 York Street, Belfast, BT15 1AP, United Kingdom, 44 7706848477, kearns-a7@ulster.ac.uk %K digital communication %K digital technology %K digital transformation %K health professional %K mental health %K obesity %K complex needs %K artificial intelligence %K AI %K PRISMA %D 2025 %7 10.2.2025 %9 Review %J J Med Internet Res %G English %X Background: Obesity and mental health issues present interconnected public health challenges that impair physical, social, and mental well-being. Digital technologies offer potential for enhancing health care communication between health professionals (HPs) and individuals living with obesity and mental health issues, but their effectiveness is not fully understood. Objective: This scoping review aims to identify and understand the different types of technologies used by HPs in supporting obesity and mental health communication. Methods: A comprehensive scoping review, which followed a validated methodology, analyzed studies published between 2013 and 2023 across 8 databases. The data extraction focused on HPs’ use of communication technologies, intervention types, biopsychosocial considerations, and perceptions of technology use. 
The review was guided by the following research question: “What are the uses, benefits, and limitations of digital technologies in supporting communication between HPs and persons living with obesity and mental health issues?” Results: In total, 8 studies—featuring web-based platforms, social media, synchronous video calls, telephone calls, automated SMS text messaging, and email—met the inclusion criteria. Technologies such as virtual learning collaborative dashboards and videoconferencing, supported by automated SMS text messaging and social media (Facebook and WhatsApp groups), were commonly used. Psychologists, dietitians, social workers, and health coaches used digital tools to facilitate virtual appointments, diet and mental health monitoring, and motivational and educational support through group therapy, 1-on-1 sessions, and hybrid models. Benefits included enhanced access to care and engagement, personalized digital cognitive behavioral therapy, perceived stigma reduction, privacy, and improved physical health outcomes in weight reduction. However, improvements in mental health outcomes were not statistically significant in studies reporting P values (P≥.05). The limitations included engagement difficulties due to conflicting personal family and work commitments; variable communication mode preferences, with some preferring in-person sessions; and misinterpretations of SMS text messaging prompts. Conflicts arose from cultural and individual differences, weight stigma, and confusion over HP roles in obesity and mental health care. Conclusions: Digital technologies have diversified the approaches HPs can take in delivering education, counseling, and motivation to individuals with obesity and mental health issues, facilitating private, stigma-reduced environments for personalized care. While the interventions were effective in obesity management, the review revealed a shortfall in addressing mental health needs. 
This highlights an urgent need for digital tools to serve as media for a deeper engagement with individuals’ complex biopsychosocial needs. The integration of data science and technological advancements offers promising avenues for tailored digital solutions. The findings advocate the importance of continued innovation and adaptation in digital health care communication strategies, with clearer HP roles and an interdisciplinary, empathetic approach focused on individual needs. %M 39928923 %R 10.2196/58434 %U https://www.jmir.org/2025/1/e58434 %U https://doi.org/10.2196/58434 %U http://www.ncbi.nlm.nih.gov/pubmed/39928923 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 12 %N %P e64414 %T Physician Perspectives on the Potential Benefits and Risks of Applying Artificial Intelligence in Psychiatric Medicine: Qualitative Study %A Stroud,Austin M %A Curtis,Susan H %A Weir,Isabel B %A Stout,Jeremiah J %A Barry,Barbara A %A Bobo,William V %A Athreya,Arjun P %A Sharp,Richard R %+ Biomedical Ethics Program, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, United States, 1 507 538 6502, sharp.richard@mayo.edu %K artificial intelligence %K machine learning %K digital health %K mental health %K psychiatry %K depression %K interviews %K family medicine %K physicians %K qualitative %K providers %K attitudes %K opinions %K perspectives %K ethics %D 2025 %7 10.2.2025 %9 Original Paper %J JMIR Ment Health %G English %X Background: As artificial intelligence (AI) tools are integrated more widely in psychiatric medicine, it is important to consider the impact these tools will have on clinical practice. Objective: This study aimed to characterize physician perspectives on the potential impact AI tools will have in psychiatric medicine. Methods: We interviewed 42 physicians (21 psychiatrists and 21 family medicine practitioners). 
These interviews used detailed clinical case scenarios involving the use of AI technologies in the evaluation, diagnosis, and treatment of psychiatric conditions. Interviews were transcribed and subsequently analyzed using qualitative analysis methods. Results: Physicians highlighted multiple potential benefits of AI tools, including potential support for optimizing pharmaceutical efficacy, reducing administrative burden, aiding shared decision-making, and increasing access to health services, and were optimistic about the long-term impact of these technologies. This optimism was tempered by concerns about potential near-term risks to both patients and themselves including misguiding clinical judgment, increasing clinical burden, introducing patient harms, and creating legal liability. Conclusions: Our results highlight the importance of considering specialist perspectives when deploying AI tools in psychiatric medicine. %M 39928397 %R 10.2196/64414 %U https://mental.jmir.org/2025/1/e64414 %U https://doi.org/10.2196/64414 %U http://www.ncbi.nlm.nih.gov/pubmed/39928397 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 12 %N %P e64396 %T The Efficacy of Conversational AI in Rectifying the Theory-of-Mind and Autonomy Biases: Comparative Analysis %A Rządeczka,Marcin %A Sterna,Anna %A Stolińska,Julia %A Kaczyńska,Paulina %A Moskalewicz,Marcin %+ Institute of Philosophy, Maria Curie-Skłodowska University, Pl. Marii Curie-Skłodowskiej 4, pok. 
204, Lublin, 20-031, Poland, 48 815375481, marcin.rzadeczka@umcs.pl %K cognitive bias %K conversational artificial intelligence %K artificial intelligence %K AI %K chatbots %K digital mental health %K bias rectification %K affect recognition %D 2025 %7 7.2.2025 %9 Original Paper %J JMIR Ment Health %G English %X Background: The increasing deployment of conversational artificial intelligence (AI) in mental health interventions necessitates an evaluation of their efficacy in rectifying cognitive biases and recognizing affect in human-AI interactions. These biases are particularly relevant in mental health contexts as they can exacerbate conditions such as depression and anxiety by reinforcing maladaptive thought patterns or unrealistic expectations in human-AI interactions. Objective: This study aimed to assess the effectiveness of therapeutic chatbots (Wysa and Youper) versus general-purpose language models (GPT-3.5, GPT-4, and Gemini Pro) in identifying and rectifying cognitive biases and recognizing affect in user interactions. Methods: This study used constructed case scenarios simulating typical user-bot interactions to examine how effectively chatbots address selected cognitive biases. The cognitive biases assessed included theory-of-mind biases (anthropomorphism, overtrust, and attribution) and autonomy biases (illusion of control, fundamental attribution error, and just-world hypothesis). Each chatbot response was evaluated based on accuracy, therapeutic quality, and adherence to cognitive behavioral therapy principles using an ordinal scale to ensure consistency in scoring. To enhance reliability, responses underwent a double review process by 2 cognitive scientists, followed by a secondary review by a clinical psychologist specializing in cognitive behavioral therapy, ensuring a robust assessment across interdisciplinary perspectives. 
Results: This study revealed that general-purpose chatbots outperformed therapeutic chatbots in rectifying cognitive biases, particularly in overtrust bias, fundamental attribution error, and just-world hypothesis. GPT-4 achieved the highest scores across all biases, whereas the therapeutic bot Wysa scored the lowest. Notably, general-purpose bots showed more consistent accuracy and adaptability in recognizing and addressing bias-related cues across different contexts, suggesting a broader flexibility in handling complex cognitive patterns. In addition, in affect recognition tasks, general-purpose chatbots not only excelled but also demonstrated quicker adaptation to subtle emotional nuances, outperforming therapeutic bots in 67% (4/6) of the tested biases. Conclusions: This study shows that, while therapeutic chatbots hold promise for mental health support and cognitive bias intervention, their current capabilities are limited. Addressing cognitive biases in AI-human interactions requires systems that can both rectify and analyze biases as integral to human cognition, promoting precision and simulating empathy. The findings reveal the need for improved simulated emotional intelligence in chatbot design to provide adaptive, personalized responses that reduce overreliance and encourage independent coping skills. Future research should focus on enhancing affective response mechanisms and addressing ethical concerns such as bias mitigation and data privacy to ensure safe, effective AI-based mental health support. 
%M 39919295 %R 10.2196/64396 %U https://mental.jmir.org/2025/1/e64396 %U https://doi.org/10.2196/64396 %U http://www.ncbi.nlm.nih.gov/pubmed/39919295 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e59524 %T Unraveling Online Mental Health Through the Lens of Early Maladaptive Schemas: AI-Enabled Content Analysis of Online Mental Health Communities %A Ang,Beng Heng %A Gollapalli,Sujatha Das %A Du,Mingzhe %A Ng,See-Kiong %+ Integrative Sciences and Engineering Programme, NUS Graduate School, National University of Singapore, University Hall, Tan Chin Tuan Wing Level 5, #05-03 21 Lower Kent Ridge Road, Singapore, 119077, Singapore, 65 92983451, bengheng.ang@u.nus.edu %K early maladaptive schemas %K large language models %K online mental health communities %K case conceptualization %K prompt engineering %K artificial intelligence %K AI %D 2025 %7 7.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Early maladaptive schemas (EMSs) are pervasive, self-defeating patterns of thoughts and emotions underlying most mental health problems and are central in schema therapy. However, the characteristics of EMSs vary across demographics, and despite the growing use of online mental health communities (OMHCs), how EMSs manifest in these online support-seeking environments remains unclear. Understanding these characteristics could inform the design of more effective interventions powered by artificial intelligence to address online support seekers’ unique therapeutic needs. Objective: We aimed to uncover associations between EMSs and mental health problems within OMHCs and examine features of EMSs as they are reflected in OMHCs. Methods: We curated a dataset of 29,329 posts from widely accessed OMHCs, labeling each with relevant schemas and mental health problems. To identify associations, we conducted chi-square tests of independence and calculated odds ratios (ORs) with the dataset. 
In addition, we developed a novel group-level case conceptualization technique, leveraging GPT-4 to extract features of EMSs from OMHC texts across key schema therapy dimensions, such as schema triggers and coping responses. Results: Several associations were identified between EMSs and mental health problems, reflecting how EMSs manifest in online support-seeking contexts. Anxiety-related problems typically highlighted vulnerability to harm or illness (OR 5.64, 95% CI 5.34-5.96; P<.001), while depression-related problems emphasized unmet interpersonal needs, such as social isolation (OR 3.18, 95% CI 3.02-3.34; P<.001). Conversely, problems with eating disorders mostly exemplified negative self-perception and emotional inhibition (OR 1.89, 95% CI 1.45-2.46; P<.001). Personality disorders reflected themes of subjugation (OR 2.51, 95% CI 1.86-3.39; P<.001), while posttraumatic stress disorder problems involved distressing experiences and mistrust (OR 5.04, 95% CI 4.49-5.66; P<.001). Substance use disorder problems reflected negative self-perception of failure to achieve (OR 1.83, 95% CI 1.35-2.49; P<.001). Depression, personality disorders, and posttraumatic stress disorder were also associated with 12, 9, and 7 EMSs, respectively, emphasizing their complexities and the need for more comprehensive interventions. In contrast, anxiety, eating disorder, and substance use disorder were related to only 2 to 3 EMSs, suggesting that these problems are better addressed through targeted interventions. In addition, the EMS features extracted from our dataset averaged 13.27 (SD 3.05) negative features per schema, with 2.65 (SD 1.07) features per dimension, as supported by existing literature. Conclusions: We uncovered various associations between EMSs and mental health problems among online support seekers, highlighting the prominence of specific EMSs in each problem and the unique complexities of each problem in terms of EMSs. 
We also identified EMS features as expressed by support seekers in OMHCs, reinforcing the relevance of EMSs in these online support-seeking contexts. These insights are valuable for understanding how EMSs are characterized in OMHCs and can inform the development of more effective artificial intelligence–powered tools to enhance support on these platforms. %M 39919286 %R 10.2196/59524 %U https://www.jmir.org/2025/1/e59524 %U https://doi.org/10.2196/59524 %U http://www.ncbi.nlm.nih.gov/pubmed/39919286 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e63550 %T ChatGPT for Univariate Statistics: Validation of AI-Assisted Data Analysis in Healthcare Research %A Ruta,Michael R %A Gaidici,Tony %A Irwin,Chase %A Lifshitz,Jonathan %+ University of Arizona College of Medicine – Phoenix, 475 N 5th St, Phoenix, AZ, 85004, United States, 1 602 827 2002, mruta@arizona.edu %K ChatGPT %K data analysis %K statistics %K chatbot %K artificial intelligence %K biomedical research %K programmers %K bioinformatics %K data processing %D 2025 %7 7.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: ChatGPT, a conversational artificial intelligence developed by OpenAI, has rapidly become an invaluable tool for researchers. With the recent integration of Python code interpretation into the ChatGPT environment, there has been a significant increase in the potential utility of ChatGPT as a research tool, particularly in terms of data analysis applications. Objective: This study aimed to assess ChatGPT as a data analysis tool and provide researchers with a framework for applying ChatGPT to data management tasks, descriptive statistics, and inferential statistics. Methods: A subset of the National Inpatient Sample was extracted. Data analysis trials were divided into data processing, categorization, and tabulation, as well as descriptive and inferential statistics.
For data processing, categorization, and tabulation assessments, ChatGPT was prompted to reclassify variables, subset variables, and present data, respectively. Descriptive statistics assessments included mean, SD, median, and IQR calculations. Inferential statistics assessments were conducted at varying levels of prompt specificity (“Basic,” “Intermediate,” and “Advanced”). Specific tests included chi-square, Pearson correlation, independent 2-sample t test, 1-way ANOVA, Fisher exact, Spearman correlation, Mann-Whitney U test, and Kruskal-Wallis H test. Outcomes from consecutive prompt-based trials were assessed against expected statistical values calculated in Python (Python Software Foundation), SAS (SAS Institute), and RStudio (Posit PBC). Results: ChatGPT accurately performed data processing, categorization, and tabulation across all trials. For descriptive statistics, it provided accurate means, SDs, medians, and IQRs across all trials. Inferential statistics accuracy against expected statistical values varied with prompt specificity: 32.5% accuracy for “Basic” prompts, 81.3% for “Intermediate” prompts, and 92.5% for “Advanced” prompts. Conclusions: ChatGPT shows promise as a tool for exploratory data analysis, particularly for researchers with some statistical knowledge and limited programming expertise. However, its application requires careful prompt construction and human oversight to ensure accuracy. As a supplementary tool, ChatGPT can enhance data analysis efficiency and broaden research accessibility. 
%M 39919289 %R 10.2196/63550 %U https://www.jmir.org/2025/1/e63550 %U https://doi.org/10.2196/63550 %U http://www.ncbi.nlm.nih.gov/pubmed/39919289 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e65146 %T Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study %A Yang,Zhichao %A Yao,Zonghai %A Tasmin,Mahbuba %A Vashisht,Parth %A Jang,Won Seok %A Ouyang,Feiyun %A Wang,Beining %A McManus,David %A Berlowitz,Dan %A Yu,Hong %+ , Miner School of Computer & Information Sciences, University of Massachusetts Lowell, 1 University Ave, Lowell, MA, 01854, United States, 1 508 612 7292, Hong_Yu@uml.edu %K artificial intelligence %K natural language processing %K large language model %K LLM %K ChatGPT %K GPT %K GPT-4V %K USMLE %K Medical License Exam %K medical image interpretation %K United States Medical Licensing Examination %K NLP %D 2025 %7 7.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Recent advancements in artificial intelligence, such as GPT-3.5 Turbo (OpenAI) and GPT-4, have demonstrated significant potential by achieving good scores on text-only United States Medical Licensing Examination (USMLE) exams and effectively answering questions from physicians. However, the ability of these models to interpret medical images remains underexplored. Objective: This study aimed to comprehensively evaluate the performance, interpretability, and limitations of GPT-3.5 Turbo, GPT-4, and its successor, GPT-4 Vision (GPT-4V), specifically focusing on GPT-4V’s newly introduced image-understanding feature. By assessing the models on medical licensing examination questions that require image interpretation, we sought to highlight the strengths and weaknesses of GPT-4V in handling complex multimodal clinical information, thereby exposing hidden flaws and providing insights into its readiness for integration into clinical settings. 
Methods: This cross-sectional study tested GPT-4V, GPT-4, and ChatGPT-3.5 Turbo on a total of 227 multiple-choice questions with images from USMLE Step 1 (n=19), Step 2 clinical knowledge (n=14), Step 3 (n=18), the Diagnostic Radiology Qualifying Core Exam (DRQCE) (n=26), and AMBOSS question banks (n=150). AMBOSS provided expert-written hints and question difficulty levels. GPT-4V’s accuracy was compared with 2 state-of-the-art large language models, GPT-3.5 Turbo and GPT-4. The quality of the explanations was evaluated by choosing human preference between an explanation by GPT-4V (without hint), an explanation by an expert, or a tie, using 3 qualitative metrics: comprehensive explanation, question information, and image interpretation. To better understand GPT-4V’s explanation ability, we modified a patient case report to resemble a typical “curbside consultation” between physicians. Results: For questions with images, GPT-4V achieved an accuracy of 84.2%, 85.7%, 88.9%, and 73.1% in Step 1, Step 2 clinical knowledge, Step 3 of USMLE, and DRQCE, respectively. It outperformed GPT-3.5 Turbo (42.1%, 50%, 50%, 19.2%) and GPT-4 (63.2%, 64.3%, 66.7%, 26.9%). When GPT-4V answered correctly, its explanations were nearly as good as those provided by domain experts from AMBOSS. However, incorrect answers often had poor explanation quality: 18.2% (10/55) contained inaccurate text, 45.5% (25/55) had inference errors, and 76.3% (42/55) demonstrated image misunderstandings. With human expert assistance, GPT-4V reduced errors by an average of 40% (22/55). GPT-4V accuracy improved with hints, maintaining stable performance across difficulty levels, while medical student performance declined as difficulty increased. In a simulated curbside consultation scenario, GPT-4V required multiple specific prompts to interpret complex case data accurately. Conclusions: GPT-4V achieved high accuracy on multiple-choice questions with images, highlighting its potential in medical assessments. 
However, significant shortcomings were observed in the quality of explanations when questions were answered incorrectly, particularly in the interpretation of images, which could not be efficiently resolved through expert interaction. These findings reveal hidden flaws in the image interpretation capabilities of GPT-4V, underscoring the need for more comprehensive evaluations beyond multiple-choice questions before integrating GPT-4V into clinical settings. %M 39919278 %R 10.2196/65146 %U https://www.jmir.org/2025/1/e65146 %U https://doi.org/10.2196/65146 %U http://www.ncbi.nlm.nih.gov/pubmed/39919278 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e55825 %T Diagnosis of Chronic Kidney Disease Using Retinal Imaging and Urine Dipstick Data: Multimodal Deep Learning Approach %A Bhak,Youngmin %A Lee,Yu Ho %A Kim,Joonhyung %A Lee,Kiwon %A Lee,Daehwan %A Jang,Eun Chan %A Jang,Eunjeong %A Lee,Christopher Seungkyu %A Kang,Eun Seok %A Park,Sehee %A Han,Hyun Wook %A Nam,Sang Min %K multimodal deep learning %K chronic kidney disease %K fundus image %K saliency map %K urine dipstick %D 2025 %7 7.2.2025 %9 %J JMIR Med Inform %G English %X Background: Chronic kidney disease (CKD) is a prevalent condition with significant global health implications. Early detection and management are critical to prevent disease progression and complications. Deep learning (DL) models using retinal images have emerged as potential noninvasive screening tools for CKD, though their performance may be limited, especially in identifying individuals with proteinuria and in specific subgroups. Objective: We aim to evaluate the efficacy of integrating retinal images and urine dipstick data into DL models for enhanced CKD diagnosis. 
Methods: Three models were developed and validated: eGFR-RIDL (estimated glomerular filtration rate–retinal image deep learning), eGFR-UDLR (logistic regression using urine dipstick data), and eGFR-MMDL (multimodal deep learning combining retinal images and urine dipstick data). All models were trained to predict an eGFR<60 mL/min/1.73 m², a key indicator of CKD, calculated using the 2009 CKD-EPI (Chronic Kidney Disease Epidemiology Collaboration) equation. This study used a multicenter dataset of participants aged 20‐79 years, including a development set (65,082 people) and an external validation set (58,284 people). Wide Residual Networks were used for DL, and saliency maps were used to visualize model attention. Sensitivity analyses assessed the impact of numerical variables. Results: eGFR-MMDL outperformed eGFR-RIDL in both the test and external validation sets, with areas under the curve of 0.94 versus 0.90 and 0.88 versus 0.77 (P<.001 for both, DeLong test). eGFR-UDLR outperformed eGFR-RIDL and was comparable to eGFR-MMDL, particularly in the external validation. However, in the subgroup analysis, eGFR-MMDL showed improvement across all subgroups, while eGFR-UDLR demonstrated no such gains. This suggested that the enhanced performance of eGFR-MMDL was not due to urine data alone, but rather from the synergistic integration of both retinal images and urine data. The eGFR-MMDL model demonstrated the best performance in individuals younger than 65 years or those with proteinuria. Age and proteinuria were identified as critical factors influencing model performance. Saliency maps indicated that urine data and retinal images provide complementary information, with urine offering insights into retinal abnormalities and retinal images, particularly the arcade vessels, being key for predicting kidney function.
Conclusions: The MMDL model integrating retinal images and urine dipstick data shows significant promise for noninvasive CKD screening, outperforming the retinal image–only model. However, routine blood tests are still recommended for individuals aged 65 years and older due to the model’s limited performance in this age group. %R 10.2196/55825 %U https://medinform.jmir.org/2025/1/e55825 %U https://doi.org/10.2196/55825 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e62935 %T Assessment of Digital Capabilities by 9 Countries in the Alliance for Healthy Cities Using AI: Cross-Sectional Analysis %A Lee, Hocheol %K digital capabilities %K digital health cities %K digital transformation %K Asian Forum of Healthy Cities %K assessment %K digital health %K artificial intelligence %K AI %K World Health Organization %K WHO %K healthy city %K data %K health management %K digital era %K qualitative analysis %K cross-sectional survey %K database %K digital health database %K effectiveness %K digital literacy %D 2025 %7 7.2.2025 %9 %J JMIR Form Res %G English %X Background: The Alma-Ata Declaration of 1978 initiated a global focus on universal health, supported by the World Health Organization (WHO) through healthy cities policies. The concept emerged at the 1984 Toronto “Beyond Health Care” conference, leading to WHO’s first pilot project in Lisbon in 1986. The WHO continues to support regional healthy city networks, emphasizing digital transformation and data-driven health management in the digital era. Objective: This study explored the capabilities of digital healthy cities within the framework of digital transformation, focusing on member countries of the Asian Forum of Healthy Cities. It examined the cities’ preparedness and policy needs for transitioning to digital health.
Methods: A cross-sectional survey of 9 countries—Australia, Cambodia, China, Japan, South Korea, Malaysia, Mongolia, the Philippines, and Vietnam—was conducted from August 1 to September 21, 2023. The 6-section SPIRIT (setting approach and sustainability; political commitment, policy, and community participation; information and innovation; resources and research; infrastructure and intersectoral; and training) checklist was modified to assess healthy cities’ digital capabilities. With input from 3 healthy city experts, the checklist was revised for digital capabilities, renaming “healthy city” to “digital healthy city.” The revised tool comprises 8 sections with 33 items. The survey leveraged ChatGPT (version 4.0; OpenAI), accessed via the Python (Python Software Foundation) application programming interface: the openai library was installed, an application programming interface key was entered, and the “GPT-4 Turbo” model was specified. A qualitative analysis of the collected data was conducted by 5 healthy city experts through in-depth group discussions. Results: The results indicate that these countries should establish networks and committees for sustainable digital healthy cities. Cambodia showed the lowest access to electricity (70%) and significant digital infrastructure disparities. Efforts to sustain digital health initiatives varied, with countries such as Korea focusing on telemedicine, while China aimed to build a comprehensive digital health database, highlighting the need for tailored strategies in promoting digital healthy cities. Life expectancy was the highest in the Republic of Korea and Japan (both 84 y). Access to electricity was the lowest in Cambodia (70%), with the remaining countries having 95% or higher access. The internet use rate was the highest in Malaysia (97.4%), followed by the Republic of Korea (97.2%), Australia (96.2%), and Japan (82.9%).
Conclusions: This study highlights the importance of big data-driven policies and personal information protection systems, as well as collaborative efforts across sectors for effective implementation of digital healthy cities. The findings suggest that the effectiveness of digital healthy cities is diminished without adequate digital literacy among managers and users, underscoring the need for policies to improve digital literacy. %R 10.2196/62935 %U https://formative.jmir.org/2025/1/e62935 %U https://doi.org/10.2196/62935 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e57319 %T Investigating the Classification of Living Kidney Donation Experiences on Reddit and Understanding the Sensitivity of ChatGPT to Prompt Engineering: Content Analysis %A Nielsen,Joshua %A Chen,Xiaoyu %A Davis,LaShara %A Waterman,Amy %A Gentili,Monica %+ Department of Industrial Engineering, JB Speed School of Engineering, University of Louisville, 220 Eastern Parkway, Louisville, KY, 40292, United States, 1 5024891335, joshua.nielsen@louisville.edu %K prompt engineering %K generative artificial intelligence %K kidney donation %K transplant %K living donor %D 2025 %7 7.2.2025 %9 Original Paper %J JMIR AI %G English %X Background: Living kidney donation (LKD), where individuals donate one kidney while alive, plays a critical role in increasing the number of kidneys available for those experiencing kidney failure. Previous studies show that many generous people are interested in becoming living donors; however, a huge gap exists between the number of patients on the waiting list and the number of living donors yearly. Objective: To bridge this gap, we aimed to investigate how to identify potential living donors from discussions on public social media forums so that educational interventions could later be directed to them.
Methods: Using Reddit forums as an example, this study described the classification of Reddit content shared about LKD into three classes: (1) present (presently dealing with LKD personally), (2) past (dealt with LKD personally in the past), and (3) other (LKD general comments). An evaluation was conducted comparing a fine-tuned distilled version of the Bidirectional Encoder Representations from Transformers (BERT) model with inference using GPT-3.5 (ChatGPT). To systematically evaluate ChatGPT’s sensitivity to distinguishing between the 3 prompt categories, we used a comprehensive prompt engineering strategy encompassing a full factorial analysis in 48 runs. A novel prompt engineering approach, dialogue until classification consensus, was introduced to simulate a deliberation between 2 domain experts until a consensus on classification was achieved. Results: BERT and GPT-3.5 exhibited classification accuracies of approximately 75% and 78%, respectively. Recognizing the inherent ambiguity between classes, a post hoc analysis of incorrect predictions revealed sensible reasoning and acceptable errors in the predictive models. Considering these acceptable mismatched predictions, the accuracy improved to 89.3% for BERT and 90.7% for GPT-3.5. Conclusions: Large language models, such as GPT-3.5, are highly capable of detecting and categorizing LKD-targeted content on social media forums. They are sensitive to instructions, and the introduced dialogue until classification consensus method exhibited superior performance over stand-alone reasoning, highlighting the merit in advancing prompt engineering methodologies. The models can produce appropriate contextual reasoning, even when final conclusions differ from their human counterparts. 
%M 39918869 %R 10.2196/57319 %U https://ai.jmir.org/2025/1/e57319 %U https://doi.org/10.2196/57319 %U http://www.ncbi.nlm.nih.gov/pubmed/39918869 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 14 %N %P e63887 %T ChatGPT-4 Performance on German Continuing Medical Education—Friend or Foe (Trick or Treat)? Protocol for a Randomized Controlled Trial %A Burisch,Christian %A Bellary,Abhav %A Breuckmann,Frank %A Ehlers,Jan %A Thal,Serge C %A Sellmann,Timur %A Gödde,Daniel %+ State of North Rhine-Westphalia, Regional Government Düsseldorf, Leibniz-Gymnasium, Stankeitstraße 22, Essen, 45326, Germany, 49 201 79938720, christian.burisch@rub.de %K ChatGPT %K artificial intelligence %K large language model %K postgraduate education %K continuing medical education %K self-assessment program %D 2025 %7 6.2.2025 %9 Protocol %J JMIR Res Protoc %G English %X Background: The increasing development and spread of artificial and assistive intelligence is opening up new areas of application not only in applied medicine but also in related fields such as continuing medical education (CME), which is part of the mandatory training program for medical doctors in Germany. This study aimed to determine whether medical laypersons can successfully conduct training courses specifically for physicians with the help of a large language model (LLM) such as ChatGPT-4. This study aims to qualitatively and quantitatively investigate the impact of using artificial intelligence (AI; specifically ChatGPT) on the acquisition of credit points in German postgraduate medical education. Objective: Using this approach, we wanted to test further possible applications of AI in the postgraduate medical education setting and obtain results for practical use. Depending on the results, the potential influence of LLMs such as ChatGPT-4 on CME will be discussed, for example, as part of a SWOT (strengths, weaknesses, opportunities, threats) analysis. 
Methods: We designed a randomized controlled trial in which adult high school students attempt to solve CME tests across six medical specialties, in three study arms of 18 CME training courses each, under different interventional conditions with varying amounts of permitted ChatGPT-4 use. Sample size calculation was performed including guess probability (20% correct answers, SD=40%; confidence level of 1–α=.95/α=.05; test power of 1–β=.95; P<.05). The study was registered at the Open Science Framework. Results: As of October 2024, recruitment of participating students and data acquisition are ongoing. We anticipate that our findings will be ready for publication in early 2025. Conclusions: We aim to demonstrate that advances in AI, especially LLMs such as ChatGPT-4, have considerable effects on medical laypersons’ ability to successfully pass CME tests. The implications this holds for how the concept of continuing medical education may require reevaluation are yet to be contemplated.
Trial Registration: OSF Registries 10.17605/OSF.IO/MZNUF; https://osf.io/mznuf International Registered Report Identifier (IRRID): PRR1-10.2196/63887 %M 39913914 %R 10.2196/63887 %U https://www.researchprotocols.org/2025/1/e63887 %U https://doi.org/10.2196/63887 %U http://www.ncbi.nlm.nih.gov/pubmed/39913914 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e53741 %T Mapping and Summarizing the Research on AI Systems for Automating Medical History Taking and Triage: Scoping Review %A Siira,Elin %A Johansson,Hanna %A Nygren,Jens %+ School of Health and Welfare, Halmstad University, Box 823, Halmstad, 301 18, Sweden, 46 70 692 46 13, elin.siira@hh.se %K scoping review %K artificial intelligence %K AI %K medical history taking %K triage %K health care %K automation %D 2025 %7 6.2.2025 %9 Review %J J Med Internet Res %G English %X Background: The integration of artificial intelligence (AI) systems for automating medical history taking and triage can significantly enhance patient flow in health care systems. Despite the promising performance of numerous AI studies, only a limited number of these systems have been successfully integrated into routine health care practice. To elucidate how AI systems can create value in this context, it is crucial to identify the current state of knowledge, including the readiness of these systems, the facilitators of and barriers to their implementation, and the perspectives of various stakeholders involved in their development and deployment. Objective: This study aims to map and summarize empirical research on AI systems designed for automating medical history taking and triage in health care settings. Methods: The study was conducted following the framework proposed by Arksey and O’Malley and adhered to the PRISMA-ScR (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) guidelines. 
A comprehensive search of 5 databases—PubMed, CINAHL, PsycINFO, Scopus, and Web of Science—was performed. A detailed protocol was established before the review to ensure methodological rigor. Results: A total of 1248 research publications were identified and screened. Of these, 86 (6.89%) met the eligibility criteria. Notably, most (n=63, 73%) studies were published between 2020 and 2022, with a significant concentration on emergency care (n=32, 37%). Other clinical contexts included radiology (n=12, 14%) and primary care (n=6, 7%). Many (n=15, 17%) studies did not specify a clinical context. Retrospective designs were the most commonly used (n=31, 36%), while other studies (n=34, 40%) did not specify their methodologies. The predominant type of AI system identified was the hybrid model (n=68, 79%), with forecasting (n=40, 47%) and recognition (n=36, 42%) being the most common tasks performed. While most (n=70, 81%) studies included patient populations, only 1 (1%) study investigated patients’ views on AI-based medical history taking and triage, and 2 (2%) studies considered health care professionals’ perspectives. Furthermore, only 6 (7%) studies validated or demonstrated AI systems in relevant clinical settings through real-time model testing, workflow implementation, clinical outcome evaluation, or integration into practice. Most (n=76, 88%) studies were concerned with the prototyping, development, or validation of AI systems. In total, 4 (5%) studies were reviews of several empirical studies conducted in different clinical settings. The facilitators and barriers to AI system implementation were categorized into 4 themes: technical aspects, contextual and cultural considerations, end-user engagement, and evaluation processes. Conclusions: This review highlights current trends, stakeholder perspectives, stages of innovation development, and key influencing factors related to implementing AI systems in health care.
The identified literature gaps regarding stakeholder perspectives and the limited research on AI systems for automating medical history taking and triage indicate significant opportunities for further investigation and development in this evolving field. %M 39913918 %R 10.2196/53741 %U https://www.jmir.org/2025/1/e53741 %U https://doi.org/10.2196/53741 %U http://www.ncbi.nlm.nih.gov/pubmed/39913918 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e60847 %T Advancing Privacy-Preserving Health Care Analytics and Implementation of the Personal Health Train: Federated Deep Learning Study %A Choudhury,Ananya %A Volmer,Leroy %A Martin,Frank %A Fijten,Rianne %A Wee,Leonard %A Dekker,Andre %A Soest,Johan van %+ , GROW Research Institute for Oncology and Reproduction, Maastricht University Medical Center+, Paul Henri Spakalaan 1, Maastricht, 6229EN, Netherlands, 31 0686008485, ananya.aus@gmail.com %K gross tumor volume segmentation %K federated learning infrastructure %K privacy-preserving technology %K cancer %K deep learning %K artificial intelligence %K lung cancer %K oncology %K radiotherapy %K imaging %K data protection %K data privacy %D 2025 %7 6.2.2025 %9 Original Paper %J JMIR AI %G English %X Background: The rapid advancement of deep learning in health care presents significant opportunities for automating complex medical tasks and improving clinical workflows. However, widespread adoption is impeded by data privacy concerns and the necessity for large, diverse datasets across multiple institutions. Federated learning (FL) has emerged as a viable solution, enabling collaborative artificial intelligence model development without sharing individual patient data. To effectively implement FL in health care, robust and secure infrastructures are essential. Developing such federated deep learning frameworks is crucial to harnessing the full potential of artificial intelligence while ensuring patient data privacy and regulatory compliance. 
Objective: The objective is to introduce an innovative FL infrastructure called the Personal Health Train (PHT) that includes the procedural, technical, and governance components needed to implement FL on real-world health care data, including training deep learning neural networks. The study aims to apply this federated deep learning infrastructure to the use case of gross tumor volume segmentation on chest computed tomography images of patients with lung cancer and present the results from a proof-of-concept experiment. Methods: The PHT framework addresses the challenges of data privacy when sharing data, by keeping data close to the source and instead bringing the analysis to the data. Technologically, PHT requires 3 interdependent components: “tracks” (protected communication channels), “trains” (containerized software apps), and “stations” (institutional data repositories), which are supported by the open source “Vantage6” software. The study applies this federated deep learning infrastructure to the use case of gross tumor volume segmentation on chest computed tomography images of patients with lung cancer, with the introduction of an additional component called the secure aggregation server, where the model averaging is done in a trusted and inaccessible environment. Results: We demonstrated the feasibility of executing deep learning algorithms in a federated manner using PHT and presented the results from a proof-of-concept study. The infrastructure linked 12 hospitals across 8 nations, covering 4 continents, demonstrating the scalability and global reach of the proposed approach. During the execution and training of the deep learning algorithm, no data were shared outside the hospital. Conclusions: The findings of the proof-of-concept study, as well as the implications and limitations of the infrastructure and the results, are discussed. 
The application of federated deep learning to unstructured medical imaging data, facilitated by the PHT framework and Vantage6 platform, represents a significant advancement in the field. The proposed infrastructure addresses the challenges of data privacy and enables collaborative model development, paving the way for the widespread adoption of deep learning–based tools in the medical domain and beyond. The introduction of the secure aggregation server implied that data leakage problems in FL can be prevented by careful design decisions of the infrastructure. Trial Registration: ClinicalTrials.gov NCT05775068; https://clinicaltrials.gov/study/NCT05775068 %M 39912580 %R 10.2196/60847 %U https://ai.jmir.org/2025/1/e60847 %U https://doi.org/10.2196/60847 %U http://www.ncbi.nlm.nih.gov/pubmed/39912580 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e59817 %T Perspectives of Hispanic and Latinx Community Members on AI-Enabled mHealth Tools: Qualitative Focus Group Study %A Kraft,Stephanie A %A Chopra,Shaan %A Duran,Miriana C %A Rojina,Janet A %A Beretta,Abril %A López,Katherine I %A Javan,Russell %A Wilfond,Benjamin S %A Rosenfeld,Margaret %A Fogarty,James %A Ko,Linda K %+ Department of Bioethics and Decision Sciences, Geisinger College of Health Sciences, 100 N. 
Academy Ave., Danville, PA, 17822, United States, 1 5702140506, skraft1@geisinger.edu %K wearable electronic devices %K qualitative research %K mobile health %K mHealth %K digital health %K privacy %K data sharing %K artificial intelligence %K AI %K community %K chronic conditions %K chronic disease %D 2025 %7 6.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Mobile health (mHealth) tools have the potential to reduce the burden of chronic conditions that disproportionately affect Hispanic and Latinx communities; however, digital divides in the access to and use of health technology suggest that mHealth has the potential to exacerbate, rather than reduce, these disparities. Objective: A key step toward developing health technology that is accessible and usable is to understand community member perspectives and needs so that technology is culturally relevant and appropriately contextualized. In this study, we aimed to examine the perspectives of Hispanic and Latinx community members in Washington State about mHealth. Methods: We recruited English- and Spanish-speaking Hispanic or Latinx adults to participate in web-based focus groups through existing community-based networks across rural and urban regions of Washington State. Focus groups included a presentation of narrative slideshow materials developed by the research team depicting mHealth use case examples of asthma in children and fall risk in older adults. Focus group questions asked participants to respond to the case examples and to further explore mHealth use preferences, benefits, barriers, and concerns. Focus group recordings were professionally transcribed, and Spanish transcripts were translated into English. We developed a qualitative codebook using deductive and inductive methods and then coded deidentified transcripts using the constant comparison method. 
The analysis team proposed themes based on review of coded data, which were validated through member checking with a community advisory board serving Latino individuals in the region and finalized through discussion with the entire research team. Results: Between May and September 2023, we conducted 8 focus groups in English or Spanish with 48 participants. Focus groups were stratified by language and region and included the following: 3 (n=18, 38% participants) Spanish urban groups, 2 (n=14, 29% participants) Spanish rural groups, 1 (n=6, 13% participants) English urban group, and 2 (n=10, 21% participants) English rural groups. We identified the following seven themes: (1) mHealth is seen as beneficial for promoting health and peace of mind; (2) some are unaware of, unfamiliar with, or uncomfortable with technology and may benefit from individualized support; (3) financial barriers limit access to mHealth; (4) practical considerations create barriers to using mHealth in daily life; (5) mHealth raises concern for overreliance on technology; (6) automated mHealth features are perceived as valuable but fallible, requiring human input to ensure accuracy; and (7) data sharing is seen as valuable for limited uses but raises privacy concerns. These themes illustrate key barriers to the benefits of mHealth that communities may face, provide insights into the role of mHealth within families, and examine the appropriate balance of data sharing and privacy protections. Conclusions: These findings offer important insights that can help advance the development of mHealth that responds to community values and priorities. 
%M 39912577 %R 10.2196/59817 %U https://www.jmir.org/2025/1/e59817 %U https://doi.org/10.2196/59817 %U http://www.ncbi.nlm.nih.gov/pubmed/39912577 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67485 %T AI for IMPACTS Framework for Evaluating the Long-Term Real-World Impacts of AI-Powered Clinician Tools: Systematic Review and Narrative Synthesis %A Jacob,Christine %A Brasier,Noé %A Laurenzi,Emanuele %A Heuss,Sabina %A Mougiakakou,Stavroula-Georgia %A Cöltekin,Arzu %A Peter,Marc K %+ FHNW, University of Applied Sciences and Arts Northwestern Switzerland, Bahnhofstrasse 6, Windisch, 5210, Switzerland, 41 62 957 29 78, christine.k.jacob@gmail.com %K eHealth %K assessment %K adoption %K implementation %K artificial intelligence %K clinician %K efficiency %K health technology assessment %K clinical practice %D 2025 %7 5.2.2025 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) has the potential to revolutionize health care by enhancing both clinical outcomes and operational efficiency. However, its clinical adoption has been slower than anticipated, largely due to the absence of comprehensive evaluation frameworks. Existing frameworks remain insufficient and tend to emphasize technical metrics such as accuracy and validation, while overlooking critical real-world factors such as clinical impact, integration, and economic sustainability. This narrow focus prevents AI tools from being effectively implemented, limiting their broader impact and long-term viability in clinical practice. Objective: This study aimed to create a framework for assessing AI in health care, extending beyond technical metrics to incorporate social and organizational dimensions. The framework was developed by systematically reviewing, analyzing, and synthesizing the evaluation criteria necessary for successful implementation, focusing on the long-term real-world impact of AI in clinical practice. 
Methods: A search was performed in July 2024 across the PubMed, Cochrane, Scopus, and IEEE Xplore databases to identify relevant studies published in English between January 2019 and mid-July 2024, yielding 3528 results, among which 44 studies met the inclusion criteria. The systematic review followed PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) guidelines and the Cochrane Handbook for Systematic Reviews. Data were analyzed using NVivo through thematic analysis and narrative synthesis to identify key emergent themes in the studies. Results: By synthesizing the included studies, we developed a framework that goes beyond the traditional focus on technical metrics or study-level methodologies. It integrates clinical context and real-world implementation factors, offering a more comprehensive approach to evaluating AI tools. With our focus on assessing the long-term real-world impact of AI technologies in health care, we named the framework AI for IMPACTS. The criteria are organized into seven key clusters, each corresponding to a letter in the acronym: (1) I—integration, interoperability, and workflow; (2) M—monitoring, governance, and accountability; (3) P—performance and quality metrics; (4) A—acceptability, trust, and training; (5) C—cost and economic evaluation; (6) T—technological safety and transparency; and (7) S—scalability and impact. These are further broken down into 28 specific subcriteria. Conclusions: The AI for IMPACTS framework offers a holistic approach to evaluate the long-term real-world impact of AI tools in the heterogeneous and challenging health care context and lays the groundwork for further validation through expert consensus and testing of the framework in real-world health care settings. It is important to emphasize that multidisciplinary expertise is essential for assessment, yet many assessors lack the necessary training. 
In addition, traditional evaluation methods struggle to keep pace with AI’s rapid development. To ensure successful AI integration, flexible, fast-tracked assessment processes and proper assessor training are needed to maintain rigorous standards while adapting to AI’s dynamic evolution. Trial Registration: reviewregistry1859; https://tinyurl.com/ysn2d7sh %M 39909417 %R 10.2196/67485 %U https://www.jmir.org/2025/1/e67485 %U https://doi.org/10.2196/67485 %U http://www.ncbi.nlm.nih.gov/pubmed/39909417 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e56126 %T Proficiency, Clarity, and Objectivity of Large Language Models Versus Specialists’ Knowledge on COVID-19's Impacts in Pregnancy: Cross-Sectional Pilot Study %A Bragazzi,Nicola Luigi %A Buchinger,Michèle %A Atwan,Hisham %A Tuma,Ruba %A Chirico,Francesco %A Szarpak,Lukasz %A Farah,Raymond %A Khamisy-Farah,Rola %+ Laboratory for Industrial and Applied Mathematics, Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, ON, M3J 1P3, Canada, 1 416 736 2100, robertobragazzi@gmail.com %K COVID-19 %K vaccine %K reproductive health %K generative artificial intelligence %K large language model %K chatGPT %K google bard %K microsoft copilot %K vaccination %K natural language processing %K obstetric %K gynecology %K women %K text mining %K sentiment %K accuracy %K zero shot %K pregnancy %K readability %K infectious %D 2025 %7 5.2.2025 %9 Original Paper %J JMIR Form Res %G English %X Background: The COVID-19 pandemic has significantly strained health care systems globally, leading to an overwhelming influx of patients and exacerbating resource limitations. Concurrently, an “infodemic” of misinformation, particularly prevalent in women’s health, has emerged. This challenge has been pivotal for health care providers, especially gynecologists and obstetricians, in managing pregnant women’s health. 
The pandemic heightened risks for pregnant women from COVID-19, necessitating balanced advice from specialists on vaccine safety versus known risks. In addition, the advent of generative artificial intelligence (AI), such as large language models (LLMs), offers promising support in health care. However, they necessitate rigorous testing. Objective: This study aimed to assess LLMs’ proficiency, clarity, and objectivity regarding COVID-19’s impacts on pregnancy. Methods: This study evaluates 4 major AI prototypes (ChatGPT-3.5, ChatGPT-4, Microsoft Copilot, and Google Bard) using zero-shot prompts in a questionnaire validated among 159 Israeli gynecologists and obstetricians. The questionnaire assesses proficiency in providing accurate information on COVID-19 in relation to pregnancy. Text-mining, sentiment analysis, and readability (Flesch-Kincaid grade level and Flesch Reading Ease Score) were also conducted. Results: In terms of LLMs’ knowledge, ChatGPT-4 and Microsoft Copilot each scored 97% (32/33), Google Bard 94% (31/33), and ChatGPT-3.5 82% (27/33). ChatGPT-4 incorrectly stated an increased risk of miscarriage due to COVID-19. Google Bard and Microsoft Copilot had minor inaccuracies concerning COVID-19 transmission and complications. In the sentiment analysis, Microsoft Copilot achieved the least negative score (–4), followed by ChatGPT-4 (–6) and Google Bard (–7), while ChatGPT-3.5 obtained the most negative score (–12). Finally, concerning the readability analysis, Flesch-Kincaid Grade Level and Flesch Reading Ease Score showed that Microsoft Copilot was the most accessible at 9.9 and 49, followed by ChatGPT-4 at 12.4 and 37.1, while ChatGPT-3.5 (12.9 and 35.6) and Google Bard (12.9 and 35.8) generated particularly complex responses. Conclusions: The study highlights varying knowledge levels of LLMs in relation to COVID-19 and pregnancy. ChatGPT-3.5 showed the least knowledge and alignment with scientific evidence. 
Readability and complexity analyses suggest that each AI’s approach was tailored to specific audiences, with ChatGPT versions being more suitable for specialized readers and Microsoft Copilot for the general public. Sentiment analysis revealed notable variations in the way LLMs communicated critical information, underscoring the essential role of neutral and objective health care communication in ensuring that pregnant women, particularly vulnerable during the COVID-19 pandemic, receive accurate and reassuring guidance. Overall, ChatGPT-4, Microsoft Copilot, and Google Bard generally provided accurate, updated information on COVID-19 and vaccines in maternal and fetal health, aligning with health guidelines. The study demonstrated the potential role of AI in supplementing health care knowledge, with a need for continuous updating and verification of AI knowledge bases. The choice of AI tool should consider the target audience and required information detail level. %M 39794312 %R 10.2196/56126 %U https://formative.jmir.org/2025/1/e56126 %U https://doi.org/10.2196/56126 %U http://www.ncbi.nlm.nih.gov/pubmed/39794312 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 12 %N %P e56880 %T Capturing Requirements for a Data Annotation Tool for Intensive Care: Experimental User-Centered Design Study %A Wac,Marceli %A Santos-Rodriguez,Raul %A McWilliams,Chris %A Bourdeaux,Christopher %+ , Faculty of Engineering, University of Bristol, Queen's Building, University Walk, Bristol, BS8 1TR, United Kingdom, 44 1173315830, m.wac@bristol.ac.uk %K ICU %K intensive care %K machine learning %K data annotation %K data labeling %K annotation software %K capturing software requirements %D 2025 %7 5.2.2025 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Increasing use of computational methods in health care provides opportunities to address previously unsolvable problems. 
Machine learning techniques applied to routinely collected data can enhance clinical tools and improve patient outcomes, but their effective deployment comes with significant challenges. While some tasks can be addressed by training machine learning models directly on the collected data, more complex problems require additional input in the form of data annotations. Data annotation is a complex and time-consuming problem that requires domain expertise and, frequently, technical proficiency. With clinicians’ time being an extremely limited resource, existing tools fail to provide an effective workflow for deployment in health care. Objective: This paper investigates the approach of intensive care unit staff to the task of data annotation. Specifically, it aims to (1) understand how clinicians approach data annotation and (2) capture the requirements for a digital annotation tool for the health care setting. Methods: We conducted an experimental activity involving annotation of the printed excerpts of real time-series admission data with 7 intensive care unit clinicians. Each participant annotated an identical set of admissions with the periods of weaning from mechanical ventilation during a single 45-minute workshop. Participants were observed during task completion and their actions were analyzed within Norman’s Interaction Cycle model to identify the software requirements. Results: Clinicians followed a cyclic process of investigation, annotation, data reevaluation, and label refinement. A variety of techniques were used to investigate data and create annotations. We identified 11 requirements for the digital tool across 4 domains: annotation of individual admissions (n=5), semiautomated annotation (n=3), operational constraints (n=2), and use of labels in machine learning (n=1). Conclusions: Effective data annotation in a clinical setting relies on flexibility in analysis and label creation and workflow continuity across multiple admissions.
There is a need to ensure a seamless transition between data investigation, annotation, and refinement of the labels. %M 39908549 %R 10.2196/56880 %U https://humanfactors.jmir.org/2025/1/e56880 %U https://doi.org/10.2196/56880 %U http://www.ncbi.nlm.nih.gov/pubmed/39908549 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 11 %N %P e58161 %T AI in the Health Sector: Systematic Review of Key Skills for Future Health Professionals %A Gazquez-Garcia,Javier %A Sánchez-Bocanegra,Carlos Luis %A Sevillano,Jose Luis %K artificial intelligence %K healthcare competencies %K systematic review %K healthcare education %K AI regulation %D 2025 %7 5.2.2025 %9 %J JMIR Med Educ %G English %X Background: Technological advancements have significantly reshaped health care, introducing digital solutions that enhance diagnostics and patient care. Artificial intelligence (AI) stands out, offering unprecedented capabilities in data analysis, diagnostic support, and personalized medicine. However, effectively integrating AI into health care necessitates specialized competencies among professionals, an area still in its infancy in terms of comprehensive literature and formalized training programs. Objective: This systematic review aims to consolidate the essential skills and knowledge health care professionals need to integrate AI into their clinical practice effectively, according to the published literature. Methods: We conducted a systematic review, across databases PubMed, Scopus, and Web of Science, of peer-reviewed literature that directly explored the required skills for health care professionals to integrate AI into their practice, published in English or Spanish from 2018 onward. Studies that did not refer to specific skills or training in digital health were not included, discarding those that did not directly contribute to understanding the competencies necessary to integrate AI into health care practice. 
Bias in the examined works was evaluated following Cochrane’s domain-based recommendations. Results: The initial database search yielded a total of 2457 articles. After deleting duplicates and screening titles and abstracts, 37 articles were selected for full-text review. Out of these, only 7 met all the inclusion criteria for this systematic review. The review identified a diverse range of skills and competencies, that we categorized into 14 key areas classified based on their frequency of appearance in the selected studies, including AI fundamentals, data analytics and management, and ethical considerations. Conclusions: Despite the broadening of search criteria to capture the evolving nature of AI in health care, the review underscores a significant gap in focused studies on the required competencies. Moreover, the review highlights the critical role of regulatory bodies such as the US Food and Drug Administration in facilitating the adoption of AI technologies by establishing trust and standardizing algorithms. Key areas were identified for developing competencies among health care professionals for the implementation of AI, including: AI fundamentals knowledge (more focused on assessing the accuracy, reliability, and validity of AI algorithms than on more technical abilities such as programming or mathematics), data analysis skills (including data acquisition, cleaning, visualization, management, and governance), and ethical and legal considerations. In an AI-enhanced health care landscape, the ability to humanize patient care through effective communication is paramount. This balance ensures that while AI streamlines tasks and potentially increases patient interaction time, health care professionals maintain a focus on compassionate care, thereby leveraging AI to enhance, rather than detract from, the patient experience.  
%R 10.2196/58161 %U https://mededu.jmir.org/2025/1/e58161 %U https://doi.org/10.2196/58161 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e58338 %T Challenges and Opportunities for Data Sharing Related to Artificial Intelligence Tools in Health Care in Low- and Middle-Income Countries: Systematic Review and Case Study From Thailand %A Kaushik,Aprajita %A Barcellona,Capucine %A Mandyam,Nikita Kanumoory %A Tan,Si Ying %A Tromp,Jasper %+ Saw Swee Hock School of Public Health, National University of Singapore, Tahir Foundation Building, 12 Science Drive 2, #10-01, Singapore, 117549, Singapore, 65 6516 4988, jasper_tromp@nus.edu.sg %K artificial intelligence %K data sharing %K health care %K low- and middle-income countries %K AI tools %K systematic review %K case study %K Thailand %K computing machinery %K academic experts %K technology developers %K health care providers %K internet connectivity %K data systems %K low health data literacy %K cybersecurity %K standardized data formats %K AI development %K PRISMA %D 2025 %7 4.2.2025 %9 Review %J J Med Internet Res %G English %X Background: Health care systems in low- and middle-income countries (LMICs) can greatly benefit from artificial intelligence (AI) interventions in various use cases such as diagnostics, treatment, and public health monitoring but face significant challenges in sharing data for developing and deploying AI in health care. Objective: This study aimed to identify barriers and enablers to data sharing for AI in health care in LMICs and to test the relevance of these in a local context. Methods: First, we conducted a systematic literature search using PubMed, SCOPUS, Embase, Web of Science, and ACM using controlled vocabulary. Primary research studies, perspectives, policy landscape analyses, and commentaries performed in or involving an LMIC context were included. 
Studies that lacked a clear connection to health information exchange systems or were not reported in English were excluded from the review. Two reviewers independently screened titles and abstracts of the included articles and critically appraised each study. All identified barriers and enablers were classified according to 7 categories as per the predefined framework—technical, motivational, economic, political, legal and policy, ethical, social, organisational, and managerial. Second, we tested the local relevance of barriers and enablers in Thailand through stakeholder interviews with 15 academic experts, technology developers, regulators, policy makers, and health care providers. The interviewers took notes and analyzed data using framework analysis. Coding procedures were standardized to enhance the reliability of our approach. Coded data were reverified and themes were readjusted where necessary to avoid researcher bias. Results: We identified 22 studies, the majority of which were conducted across Africa (n=12, 55%) and Asia (n=6, 27%). The most important data-sharing challenges were unreliable internet connectivity, lack of equipment, poor staff and management motivation, uneven resource distribution, and ethical concerns. Possible solutions included improving IT infrastructure, enhancing funding, introducing user-friendly software, and incentivizing health care organizations and personnel to share data for AI-related tools. In Thailand, inconsistent data systems, limited staff time, low health data literacy, complex and unclear policies, and cybersecurity issues were important data-sharing challenges. Key solutions included building a conducive digital ecosystem—having shared data input platforms for health facilities to ensure data uniformity and to develop easy-to-understand consent forms, having standardized guidelines for data sharing, and having compensation policies for data breach victims. 
Conclusions: Although AI in LMICs has the potential to overcome health inequalities, these countries face technical, political, legal, policy, and organizational barriers to sharing data, which impede effective AI development and deployment. When tested in a local context, most of these barriers were relevant. Although our findings might not be generalizable to other contexts, this study can be used by LMICs as a framework to identify barriers and strengths within their health care systems and devise localized solutions for enhanced data sharing. Trial Registration: PROSPERO CRD42022360644; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=360644 %M 39903508 %R 10.2196/58338 %U https://www.jmir.org/2025/1/e58338 %U https://doi.org/10.2196/58338 %U http://www.ncbi.nlm.nih.gov/pubmed/39903508 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e54167 %T Summer Research Internship Curriculum to Promote Self-Efficacy, Researcher Identity, and Peer-to-Peer Learning: Retrospective Cohort Study %A Levites Strekalova,Yulia A %A Liu-Galvin,Rachel %A Border,Samuel %A Midence,Sara %A Khan,Mishal %A VanZanten,Maya %A Tomaszewski,John %A Jain,Sanjay %A Sarder,Pinaki %K artificial intelligence %K biomedical research %K curriculum %K training programs %K workforce %D 2025 %7 3.2.2025 %9 %J JMIR Form Res %G English %X Background: Common barriers to students’ persistence in research include experiencing feelings of exclusion and a lack of belonging, difficulties developing a robust researcher identity, perceptions of racial and social stigma directed toward them, and perceived gaps in research skills, which are particularly pronounced among trainees from groups traditionally underrepresented in research. To address these known barriers, summer research programs have been shown to increase the participation and retention of undergraduate students in research. 
However, previous programs have focused predominantly on technical knowledge and skills, without integrating an academic enrichment curriculum that promotes professional development by improving students’ academic and research communication skills. Objective: This retrospective pre-then-post study aimed to evaluate changes in self-reported ratings of research abilities among a cohort of undergraduate students who participated in a summer research program. Methods: The Human BioMolecular Atlas Program (HuBMAP) piloted the implementation of a web-based academic enrichment curriculum for the Summer 2023 Research Internship cohort, which was composed of students from groups underrepresented in biomedical artificial intelligence research. HuBMAP, a 400-member research consortium funded by the Common Fund at the National Institutes of Health, offered a 10-week summer research internship that included an academic enrichment curriculum delivered synchronously via the web to all students across multiple sites. The curriculum is intended to support intern self-efficacy, researcher identity development, and peer-to-peer learning. At the end of the internship, students were invited to participate in a web-based survey in which they were asked to rate their academic and research abilities before the internship and as a result of the internship using a modified Entering Research Learning Assessment instrument. A Wilcoxon matched-pairs signed rank test was performed to assess the difference in the mean scores per respondent before and after participating in the internship. Results: A total of 14 of the 22 undergraduate students who participated in the internship responded to the survey.
The results of the retrospective pre-then-post survey indicated that there was a significant increase in students’ self-rated research abilities, evidenced by a significant improvement in the mean scores of the respondents when comparing reported skills self-assessment before and after the internship (improvement: median 1.09, IQR 0.88-1.65; W=52.5, P<.001). After participating in the HuBMAP web-based academic enrichment curriculum, students’ self-reported research abilities, including their confidence, their communication and collaboration skills, their self-efficacy in research, and their abilities to set research career goals, increased. Conclusions: Summer internship programs can incorporate an academic enrichment curriculum with small-group peer learning in addition to a laboratory-based experience to facilitate increased student engagement, self-efficacy, and a sense of belonging in the research community. Future research should investigate the impact of academic enrichment curricula and peer mentoring on the long-term retention of students in biomedical research careers, particularly retention of students underrepresented in biomedical fields. %R 10.2196/54167 %U https://formative.jmir.org/2025/1/e54167 %U https://doi.org/10.2196/54167 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e63377 %T Problems and Barriers Related to the Use of AI-Based Clinical Decision Support Systems: Interview Study %A Giebel,Godwin Denk %A Raszke,Pascal %A Nowak,Hartmuth %A Palmowski,Lars %A Adamzik,Michael %A Heinz,Philipp %A Tokic,Marianne %A Timmesfeld,Nina %A Brunkhorst,Frank %A Wasem,Jürgen %A Blase,Nikola %+ Institute for Healthcare Management and Research, University of Duisburg-Essen, Thea-Leymann-Str. 
9, Essen, Germany, 49 201 18 331, godwin.giebel@medman.uni-due.de %K decision support %K artificial intelligence %K machine learning %K clinical decision support system %K digitalization %K health care %K technology %K innovation %K semistructured interview %K qualitative %K quality assurance %K web-based %K digital health %K health informatics %D 2025 %7 3.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Digitalization is currently revolutionizing health care worldwide. A promising technology in this context is artificial intelligence (AI). The application of AI can support health care providers in their daily work in various ways. The integration of AI is particularly promising in clinical decision support systems (CDSSs). While the opportunities of this technology are numerous, the problems should not be overlooked. Objective: This study aimed to identify challenges and barriers in the context of AI-based CDSSs from the perspectives of experts across various disciplines. Methods: Semistructured expert interviews were conducted with different stakeholders. These included representatives of patients, physicians and caregivers, developers of AI-based CDSSs, researchers (studying AI in health care and social and health law), quality management and quality assurance representatives, a representative of an ethics committee, a representative of a health insurance fund, and medical product consultants. The interviews took place on the web and were recorded, transcribed, and subsequently subjected to a qualitative content analysis based on the method by Kuckartz. The analysis was conducted using MAXQDA software. Initially, the problems were separated into “general,” “development,” and “clinical use.” Finally, a workshop within the project consortium served to systematize the identified problems. 
Results: A total of 15 expert interviews were conducted, and 309 expert statements with reference to problems and barriers in the context of AI-based CDSSs were identified. These emerged in 7 problem categories: technology (46/309, 14.9%), data (59/309, 19.1%), user (102/309, 33%), studies (17/309, 5.5%), ethics (20/309, 6.5%), law (33/309, 10.7%), and general (32/309, 10.4%). The problem categories were further divided into problem areas, which in turn comprised the respective problems. Conclusions: A large number of problems and barriers were identified in the context of AI-based CDSSs. These can be systematized according to the point at which they occur (“general,” “development,” and “clinical use”) or according to the problem category (“technology,” “data,” “user,” “studies,” “ethics,” “law,” and “general”). The problems identified in this work should be further investigated. They can be used as a basis for deriving solutions to optimize development, acceptance, and use of AI-based CDSSs. International Registered Report Identifier (IRRID): RR2-10.2196/preprints.62704 %M 39899342 %R 10.2196/63377 %U https://www.jmir.org/2025/1/e63377 %U https://doi.org/10.2196/63377 %U http://www.ncbi.nlm.nih.gov/pubmed/39899342 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e51785 %T Why AI Monitoring Faces Resistance and What Healthcare Organizations Can Do About It: An Emotion-Based Perspective %A Werder,Karl %A Cao,Lan %A Park,Eun Hee %A Ramesh,Balasubramaniam %+ , Digital Business Innovation, IT University of Copenhagen, Rued Langgaards Vej 7, Copenhagen, 2300, Denmark, 45 72185386, karw@itu.dk %K artificial intelligence %K AI monitoring %K emotion %K resistance %K health care %D 2025 %7 31.1.2025 %9 Viewpoint %J J Med Internet Res %G English %X Continuous monitoring of patients’ health facilitated by artificial intelligence (AI) has enhanced the quality of health care, that is, the ability to access effective care. 
However, AI monitoring often encounters resistance to adoption by decision makers. Healthcare organizations frequently assume that the resistance stems from patients’ rational evaluation of the technology’s costs and benefits. Recent research challenges this assumption and suggests that the resistance to AI monitoring is influenced by the emotional experiences of patients and their surrogate decision makers. We develop a framework from an emotional perspective, provide important implications for healthcare organizations, and offer recommendations to help reduce resistance to AI monitoring. %M 39889282 %R 10.2196/51785 %U https://www.jmir.org/2025/1/e51785 %U https://doi.org/10.2196/51785 %U http://www.ncbi.nlm.nih.gov/pubmed/39889282 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e59946 %T Effect of Artificial Intelligence Helpfulness and Uncertainty on Cognitive Interactions with Pharmacists: Randomized Controlled Trial %A Tsai,Chuan-Ching %A Kim,Jin Yong %A Chen,Qiyuan %A Rowell,Brigid %A Yang,X Jessie %A Kontar,Raed %A Whitaker,Megan %A Lester,Corey %+ Department of Clinical Pharmacy, College of Pharmacy, University of Michigan, 428 Church Street, Ann Arbor, MI, 48109, United States, 1 7346478849, lesterca@umich.edu %K CDSS %K eye-tracking %K medication verification %K uncertainty visualization %K AI helpfulness and accuracy %K artificial intelligence %K cognitive interactions %K clinical decision support system %K cognition %K pharmacists %K medication %K interaction %K decision-making %K cognitive processing %D 2025 %7 31.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Clinical decision support systems leveraging artificial intelligence (AI) are increasingly integrated into health care practices, including pharmacy medication verification. Communicating uncertainty in an AI prediction is viewed as an important mechanism for boosting human collaboration and trust. 
Yet, little is known about the effects on human cognition as a result of interacting with such types of AI advice. Objective: This study aimed to evaluate the cognitive interaction patterns of pharmacists during medication product verification when using an AI prototype. Moreover, we examine the impact of AI’s assistance, both helpful and unhelpful, and the communication of uncertainty of AI-generated results on pharmacists’ cognitive interaction with the prototype. Methods: In a randomized controlled trial, 30 pharmacists from professional networks each performed 200 medication verification tasks while their eye movements were recorded using an online eye tracker. Participants completed 100 verifications without AI assistance and 100 with AI assistance (either with black box help without uncertainty information or uncertainty-aware help, which displays AI uncertainty). Fixation patterns (first and last areas fixated, number of fixations, fixation duration, and dwell times) were analyzed in relation to AI help type and helpfulness. Results: Pharmacists shifted 19%-26% of their total fixations to AI-generated regions when these were available, suggesting the integration of AI advice in decision-making. AI assistance did not reduce the number of fixations on fill images, which remained the primary focus area. Unhelpful AI advice led to longer dwell times on reference and fill images, indicating increased cognitive processing. Displaying AI uncertainty led to longer cognitive processing times as measured by dwell times in original images. Conclusions: Unhelpful AI increases cognitive processing time in the original images. Transparency in AI is needed in “black box” systems, but showing more information can add a cognitive burden. Therefore, the communication of uncertainty should be optimized and integrated into clinical workflows using user-centered design to avoid increasing cognitive load or impeding clinicians’ original workflow. 
Trial Registration: ClinicalTrials.gov NCT06795477; https://clinicaltrials.gov/study/NCT06795477 %M 39888668 %R 10.2196/59946 %U https://www.jmir.org/2025/1/e59946 %U https://doi.org/10.2196/59946 %U http://www.ncbi.nlm.nih.gov/pubmed/39888668 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66896 %T Assessing the Adherence of ChatGPT Chatbots to Public Health Guidelines for Smoking Cessation: Content Analysis %A Abroms,Lorien C %A Yousefi,Artin %A Wysota,Christina N %A Wu,Tien-Chin %A Broniatowski,David A %+ Department of Prevention & Community Health, Milken Institute School of Public Health, George Washington University, 950 New Hampshire Avenue NW, Washington, DC, 20052, United States, 1 202 9943518, lorien@gwu.edu %K ChatGPT %K large language models %K chatbots %K tobacco %K smoking cessation %K cigarettes %K artificial intelligence %D 2025 %7 30.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Large language model (LLM) artificial intelligence chatbots using generative language can offer smoking cessation information and advice. However, little is known about the reliability of the information provided to users. Objective: This study aims to examine whether 3 ChatGPT chatbots—the World Health Organization’s Sarah, BeFreeGPT, and BasicGPT—provide reliable information on how to quit smoking. Methods: A list of quit smoking queries was generated from frequent quit smoking searches on Google related to “how to quit smoking” (n=12). Each query was given to each chatbot, and responses were analyzed for their adherence to an index developed from the US Preventive Services Task Force public health guidelines for quitting smoking and counseling principles. Responses were independently coded by 2 reviewers, and differences were resolved by a third coder. Results: Across chatbots and queries, on average, chatbot responses were rated as being adherent to 57.1% of the items on the adherence index. 
Sarah’s adherence (72.2%) was significantly higher than that of BeFreeGPT (50%) and BasicGPT (47.8%; P<.001). The majority of chatbot responses had clear language (97.3%) and included a recommendation to seek out professional counseling (80.3%). About half of the responses included the recommendation to consider using nicotine replacement therapy (52.7%), the recommendation to seek out social support from friends and family (55.6%), and information on how to deal with cravings when quitting smoking (44.4%). The least common was information about considering the use of non–nicotine replacement therapy prescription drugs (14.1%). Finally, some types of misinformation were present in 22% of responses. Specific queries that were most challenging for the chatbots included queries on “how to quit smoking cold turkey,” “...with vapes,” “...with gummies,” “...with a necklace,” and “...with hypnosis.” All chatbots showed resilience to adversarial attacks that were intended to derail the conversation. Conclusions: LLM chatbots varied in their adherence to quit-smoking guidelines and counseling principles. While chatbots reliably provided some types of information, they omitted other types and occasionally provided misinformation, especially for queries about less evidence-based methods of quitting. LLM chatbot instructions can be revised to compensate for these weaknesses. 
%M 39883917 %R 10.2196/66896 %U https://www.jmir.org/2025/1/e66896 %U https://doi.org/10.2196/66896 %U http://www.ncbi.nlm.nih.gov/pubmed/39883917 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67346 %T Risk Factors for Gastrointestinal Bleeding in Patients With Acute Myocardial Infarction: Multicenter Retrospective Cohort Study %A Kou,Yanqi %A Ye,Shicai %A Tian,Yuan %A Yang,Ke %A Qin,Ling %A Huang,Zhe %A Luo,Botao %A Ha,Yanping %A Zhan,Liping %A Ye,Ruyin %A Huang,Yujie %A Zhang,Qing %A He,Kun %A Liang,Mouji %A Zheng,Jieming %A Huang,Haoyuan %A Wu,Chunyi %A Ge,Lei %A Yang,Yuping %+ Department of Gastroenterology, Affiliated Hospital of Guangdong Medical University, No. 2 Wenming East Road, Xiashan, Zhanjiang, Zhanjiang, 524000, China, 1 13106629993, yangyupingchn@163.com %K acute myocardial infarction %K gastrointestinal bleeding %K machine learning %K in-hospital %K prediction model %D 2025 %7 30.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Gastrointestinal bleeding (GIB) is a severe and potentially life-threatening complication in patients with acute myocardial infarction (AMI), significantly affecting prognosis during hospitalization. Early identification of high-risk patients is essential to reduce complications, improve outcomes, and guide clinical decision-making. Objective: This study aimed to develop and validate a machine learning (ML)–based model for predicting in-hospital GIB in patients with AMI, identify key risk factors, and evaluate the clinical applicability of the model for risk stratification and decision support. Methods: A multicenter retrospective cohort study was conducted, including 1910 patients with AMI from the Affiliated Hospital of Guangdong Medical University (2005-2024). Patients were divided into training (n=1575) and testing (n=335) cohorts based on admission dates. 
For external validation, 1746 patients with AMI were included in the publicly available MIMIC-IV (Medical Information Mart for Intensive Care IV) database. Propensity score matching was adjusted for demographics, and the Boruta algorithm identified key predictors. A total of 7 ML algorithms—logistic regression, k-nearest neighbors, support vector machine, decision tree, random forest (RF), extreme gradient boosting, and neural networks—were trained using 10-fold cross-validation. The models were evaluated for the area under the receiver operating characteristic curve, accuracy, sensitivity, specificity, recall, F1-score, and decision curve analysis. Shapley additive explanations analysis ranked variable importance. Kaplan-Meier survival analysis evaluated the impact of GIB on short-term survival. Multivariate logistic regression assessed the relationship between coronary heart disease (CHD) and in-hospital GIB after adjusting for clinical variables. Results: The RF model outperformed other ML models, achieving an area under the receiver operating characteristic curve of 0.77 in the training cohort, 0.77 in the testing cohort, and 0.75 in the validation cohort. Key predictors included red blood cell count, hemoglobin, maximal myoglobin, hematocrit, CHD, and other variables, all of which were strongly associated with GIB risk. Decision curve analysis demonstrated the clinical use of the RF model for early risk stratification. Kaplan-Meier survival analysis showed no significant differences in 7- and 15-day survival rates between patients with AMI with and without GIB (P=.83 for 7-day survival and P=.87 for 15-day survival). Multivariate logistic regression showed that CHD was an independent risk factor for in-hospital GIB (odds ratio 2.79, 95% CI 2.09-3.74). Stratified analyses by sex, age, occupation, marital status, and other subgroups consistently showed that the association between CHD and GIB remained robust across all subgroups. 
Conclusions: The ML-based RF model provides a robust and clinically applicable tool for predicting in-hospital GIB in patients with AMI. By leveraging routinely available clinical and laboratory data, the model supports early risk stratification and personalized preventive strategies. %M 39883922 %R 10.2196/67346 %U https://www.jmir.org/2025/1/e67346 %U https://doi.org/10.2196/67346 %U http://www.ncbi.nlm.nih.gov/pubmed/39883922 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 14 %N %P e62704 %T User-Oriented Requirements for Artificial Intelligence–Based Clinical Decision Support Systems in Sepsis: Protocol for a Multimethod Research Project %A Raszke,Pascal %A Giebel,Godwin Denk %A Abels,Carina %A Wasem,Jürgen %A Adamzik,Michael %A Nowak,Hartmuth %A Palmowski,Lars %A Heinz,Philipp %A Mreyen,Silke %A Timmesfeld,Nina %A Tokic,Marianne %A Brunkhorst,Frank Martin %A Blase,Nikola %+ Institute for Health Care Management and Research, University of Duisburg-Essen, Thea-Leymann-Str. 9, Essen, 45127, Germany, 49 201 183 4395, Pascal.Raszke@medman.uni-due.de %K medical informatics %K artificial intelligence %K machine learning %K computational intelligence %K clinical decision support systems %K CDSS %K decision support %K sepsis %K bloodstream infection %D 2025 %7 30.1.2025 %9 Protocol %J JMIR Res Protoc %G English %X Background: Artificial intelligence (AI)–based clinical decision support systems (CDSS) have been developed for several diseases. However, despite the potential to improve the quality of care and thereby positively impact patient-relevant outcomes, the majority of AI-based CDSS have not been adopted in standard care. Possible reasons for this include barriers in the implementation and a nonuser-oriented development approach, resulting in reduced user acceptance. Objective: This research project has 2 objectives. First, problems and corresponding solutions that hinder or support the development and implementation of AI-based CDSS are identified. 
Second, the research project aims to increase user acceptance by creating a user-oriented requirement profile, using the example of sepsis. Methods: The research project is based on a multimethod approach combining (1) a scoping review, (2) focus groups with physicians and professional caregivers, and (3) semistructured interviews with relevant stakeholders. The research modules mentioned provide the basis for the development of a (4) survey, including a discrete choice experiment (DCE) with physicians. A minimum of 6667 physicians with expertise in the clinical picture of sepsis are contacted for this purpose. The survey is followed by the development of a requirement profile for AI-based CDSS and the derivation of policy recommendations for action, which are evaluated in a (5) expert roundtable discussion. Results: The multimethod research project started in November 2022. It provides an overview of the barriers and corresponding solutions related to the development and implementation of AI-based CDSS. Using sepsis as an example, a user-oriented requirement profile for AI-based CDSS is developed. The scoping review has been concluded and the qualitative modules have been subjected to analysis. The start of the survey, including the DCE, was at the end of July 2024. Conclusions: The results of the research project represent the first attempt to create a comprehensive user-oriented requirement profile for the development of sepsis-specific AI-based CDSS. In addition, general recommendations are derived, in order to reduce barriers in the development and implementation of AI-based CDSS. The findings of this research project have the potential to facilitate the integration of AI-based CDSS into standard care in the long term. 
International Registered Report Identifier (IRRID): DERR1-10.2196/62704 %M 39883929 %R 10.2196/62704 %U https://www.researchprotocols.org/2025/1/e62704 %U https://doi.org/10.2196/62704 %U http://www.ncbi.nlm.nih.gov/pubmed/39883929 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e60138 %T Designing Health Recommender Systems to Promote Health Equity: A Socioecological Perspective %A Figueroa,Caroline A %A Torkamaan,Helma %A Bhattacharjee,Ananya %A Hauptmann,Hanna %A Guan,Kathleen W %A Sedrakyan,Gayane %+ Faculty of Technology, Policy and Management, Delft University of Technology, Jaffalaan 5, Delft, 2628, Netherlands, 31 621378688, c.figueroa@tudelft.nl %K digital health %K health promotion %K health recommender systems %K artificial intelligence %K health equity %K AI %K digital devices %K socioecological %K health inequities %K health behavior %K health behaviors %K patient centric %K digital health intervention %D 2025 %7 30.1.2025 %9 Viewpoint %J J Med Internet Res %G English %X Health recommender systems (HRS) have the capability to improve human-centered care and prevention by personalizing content, such as health interventions or health information. HRS, an emerging and developing field, can play a unique role in the digital health field as they can offer relevant recommendations, not only based on what users themselves prefer and may be receptive to, but also using data about wider spheres of influence over human behavior, including peers, families, communities, and societies. We identify and discuss how HRS could play a unique role in decreasing health inequities. We use the socioecological model, which provides representations of how multiple, nested levels of influence (eg, community, institutional, and policy factors) interact to shape individual health. 
This perspective helps illustrate how HRS could address not just individual health factors but also the structural barriers—such as access to health care, social support, and access to healthy food—that shape health outcomes at various levels. Based on this analysis, we then discuss the challenges and future research priorities. We find that despite the potential for targeting more complex systemic challenges to obtaining good health, current HRS are still focused on individual health behaviors, often do not integrate the lived experiences of users in the design, and have had limited reach and effectiveness for individuals from low socioeconomic status and racial or ethnic minoritized backgrounds. In this viewpoint, we argue that a new design paradigm is necessary in which HRS focus on incorporating structural barriers to good health in addition to user preferences. HRS should be designed with an emphasis on health systems, which also includes incorporating decolonial perspectives of well-being that challenge prevailing medical models. Furthermore, potential lies in evaluating the health equity effects of HRS and leveraging collected data to influence policy. With changes in practices and with an intentional equity focus, HRS could play a crucial role in health promotion and decreasing health inequities. 
%M 39883934 %R 10.2196/60138 %U https://www.jmir.org/2025/1/e60138 %U https://doi.org/10.2196/60138 %U http://www.ncbi.nlm.nih.gov/pubmed/39883934 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e58760 %T Identification of Intracranial Germ Cell Tumors Based on Facial Photos: Exploratory Study on the Use of Deep Learning for Software Development %A Li,Yanong %A He,Yixuan %A Liu,Yawei %A Wang,Bingchen %A Li,Bo %A Qiu,Xiaoguang %+ Department of Radiation Oncology, Beijing Tiantan Hospital, Capital Medical University, 119 West Southern 4th Ring Road, Fengtai District, Beijing, 100070, China, 86 10 59975581, qiuxiaoguang@bjtth.org %K deep learning %K facial recognition %K intracranial germ cell tumors %K endocrine indicators %K software development %K artificial intelligence %K machine learning models %K software engineering %K neural networks %K algorithms %K cohort studies %D 2025 %7 30.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Primary intracranial germ cell tumors (iGCTs) are highly malignant brain tumors that predominantly occur in children and adolescents, with an incidence rate ranking third among primary brain tumors in East Asia (8%-15%). Due to their insidious onset and impact on critical functional areas of the brain, these tumors often result in irreversible abnormalities in growth and development, as well as cognitive and motor impairments in affected children. Therefore, early diagnosis through advanced screening techniques is vital for improving patient outcomes and quality of life. Objective: This study aimed to investigate the application of facial recognition technology in the early detection of iGCTs in children and adolescents. Early diagnosis through advanced screening techniques is vital for improving patient outcomes and quality of life. 
Methods: A multicenter, phased approach was adopted for the development and validation of a deep learning model, GVisageNet, dedicated to the screening of midline brain tumors from normal controls (NCs) and iGCTs from other midline brain tumors. The study comprised the collection and division of datasets into training (n=847, iGCTs=358, NCs=300, other midline brain tumors=189) and testing (n=212, iGCTs=79, NCs=70, other midline brain tumors=63), with an additional independent validation dataset (n=336, iGCTs=130, NCs=100, other midline brain tumors=106) sourced from 4 medical institutions. A regression model using clinically relevant, statistically significant data was developed and combined with GVisageNet outputs to create a hybrid model. This integration sought to assess the incremental value of clinical data. The model’s predictive mechanisms were explored through correlation analyses with endocrine indicators and stratified evaluations based on the degree of hypothalamic-pituitary-target axis damage. Performance metrics included area under the curve (AUC), accuracy, sensitivity, and specificity. Results: On the independent validation dataset, GVisageNet achieved an AUC of 0.938 (P<.01) in distinguishing midline brain tumors from NCs. Further, GVisageNet demonstrated significant diagnostic capability in distinguishing iGCTs from the other midline brain tumors, achieving an AUC of 0.739, which is superior to the regression model alone (AUC=0.632, P<.001) but less than the hybrid model (AUC=0.789, P=.04). Significant correlations were found between the GVisageNet’s outputs and 7 endocrine indicators. Performance varied with hypothalamic-pituitary-target axis damage, indicating a further understanding of the working mechanism of GVisageNet. 
Conclusions: GVisageNet, capable of high accuracy both independently and with clinical data, shows substantial potential for early iGCTs detection, highlighting the importance of combining deep learning with clinical insights for personalized health care. %M 39883924 %R 10.2196/58760 %U https://www.jmir.org/2025/1/e58760 %U https://doi.org/10.2196/58760 %U http://www.ncbi.nlm.nih.gov/pubmed/39883924 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 14 %N %P e63775 %T Evolution of Artificial Intelligence in Medical Education From 2000 to 2024: Bibliometric Analysis %A Li,Rui %A Wu,Tong %+ Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, 1095 Jiefang Road, Wuhan, 430030, China, 86 13018044651, tongwu66@tjh.tjmu.edu.cn %K artificial intelligence %K medical education %K bibliometric %K citation trends %K academic pattern %K VOSviewer %K Citespace %K AI %D 2025 %7 30.1.2025 %9 Original Paper %J Interact J Med Res %G English %X Background: Incorporating artificial intelligence (AI) into medical education has gained significant attention for its potential to enhance teaching and learning outcomes. However, it lacks a comprehensive study depicting the academic performance and status of AI in the medical education domain. Objective: This study aims to analyze the social patterns, productive contributors, knowledge structure, and clusters since the 21st century. Methods: Documents were retrieved from the Web of Science Core Collection database from 2000 to 2024. VOSviewer, Incites, and Citespace were used to analyze the bibliometric metrics, which were categorized by country, institution, authors, journals, and keywords. The variables analyzed encompassed counts, citations, H-index, impact factor, and collaboration metrics. Results: Altogether, 7534 publications were initially retrieved and 2775 were included for analysis. 
The annual count and citation of papers exhibited exponential trends since 2018. The United States emerged as the lead contributor due to its high productivity and recognition levels. Stanford University, Johns Hopkins University, National University of Singapore, Mayo Clinic, University of Arizona, and University of Toronto were representative institutions in their respective fields. Cureus, JMIR Medical Education, Medical Teacher, and BMC Medical Education ranked as the top four most productive journals. The resulting heat map highlighted several high-frequency keywords, including performance, education, AI, and model. The citation burst time of terms revealed that AI technologies shifted from imaging processing (2000), augmented reality (2013), and virtual reality (2016) to decision-making (2020) and model (2021). Keywords such as mortality and robotic surgery persisted into 2023, suggesting the ongoing recognition and interest in these areas. Conclusions: This study provides valuable insights and guidance for researchers who are interested in educational technology, as well as recommendations for pioneering institutions and journal submissions. Along with the rapid growth of AI, medical education is expected to gain much more benefits. 
%M 39883926 %R 10.2196/63775 %U https://www.i-jmr.org/2025/1/e63775 %U https://doi.org/10.2196/63775 %U http://www.ncbi.nlm.nih.gov/pubmed/39883926 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e54601 %T Using Large Language Models to Detect and Understand Drug Discontinuation Events in Web-Based Forums: Development and Validation Study %A Trevena,William %A Zhong,Xiang %A Alvarado,Michelle %A Semenov,Alexander %A Oktay,Alp %A Devlin,Devin %A Gohil,Aarya Yogesh %A Chittimouju,Sai Harsha %+ Department of Industrial and Systems Engineering, The University of Florida, PO BOX 115002, GAINESVILLE, FL, 32611-5002, United States, 1 3523922477, xiang.zhong@ise.ufl.edu %K natural language processing %K large language models %K ChatGPT %K drug discontinuation events %K zero-shot classification %K artificial intelligence %K AI %D 2025 %7 30.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The implementation of large language models (LLMs), such as BART (Bidirectional and Auto-Regressive Transformers) and GPT-4, has revolutionized the extraction of insights from unstructured text. These advancements have expanded into health care, allowing analysis of social media for public health insights. However, the detection of drug discontinuation events (DDEs) remains underexplored. Identifying DDEs is crucial for understanding medication adherence and patient outcomes. Objective: The aim of this study is to provide a flexible framework for investigating various clinical research questions in data-sparse environments. We provide an example of the utility of this framework by identifying DDEs and their root causes in an open-source web-based forum, MedHelp, and by releasing the first open-source DDE datasets to aid further research in this domain. 
Methods: We used several LLMs, including GPT-4 Turbo, GPT-4o, DeBERTa (Decoding-Enhanced Bidirectional Encoder Representations from Transformer with Disentangled Attention), and BART, among others, to detect and determine the root causes of DDEs in user comments posted on MedHelp. Our study design included the use of zero-shot classification, which allows these models to make predictions without task-specific training. We split user comments into sentences and applied different classification strategies to assess the performance of these models in identifying DDEs and their root causes. Results: Among the selected models, GPT-4o performed the best at determining the root causes of DDEs, predicting only 12.9% of root causes incorrectly (hamming loss). Among the open-source models tested, BART demonstrated the best performance in detecting DDEs, achieving an F1-score of 0.86, a false positive rate of 2.8%, and a false negative rate of 6.5%, all without any fine-tuning. The dataset included 10.7% (107/1000) DDEs, emphasizing the models’ robustness in an imbalanced data context. Conclusions: This study demonstrated the effectiveness of open- and closed-source LLMs, such as GPT-4o and BART, for detecting DDEs and their root causes from publicly accessible data through zero-shot classification. The robust and scalable framework we propose can aid researchers in addressing data-sparse clinical research questions. The launch of open-access DDE datasets has the potential to stimulate further research and novel discoveries in this field. 
%M 39883487 %R 10.2196/54601 %U https://www.jmir.org/2025/1/e54601 %U https://doi.org/10.2196/54601 %U http://www.ncbi.nlm.nih.gov/pubmed/39883487 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 13 %N %P e60521 %T The Use of Artificial Intelligence and Wearable Inertial Measurement Units in Medicine: Systematic Review %A Smits Serena,Ricardo %A Hinterwimmer,Florian %A Burgkart,Rainer %A von Eisenhart-Rothe,Rudiger %A Rueckert,Daniel %+ Department of Orthopaedics and Sports Orthopaedics, Klinikum rechts der Isar, Technical University of Munich, Ismaninger Strasse 22, Munich, 81675, Germany, 49 8941402271, ricardo.smits@tum.de %K artificial intelligence %K accelerometer %K gyroscope %K IMUs %K time series data %K wearable %K systematic review %K patient care %K machine learning %K data collection %D 2025 %7 29.1.2025 %9 Review %J JMIR Mhealth Uhealth %G English %X Background: Artificial intelligence (AI) has already revolutionized the analysis of image, text, and tabular data, bringing significant advances across many medical sectors. Now, by combining with wearable inertial measurement units (IMUs), AI could transform health care again by opening new opportunities in patient care and medical research. Objective: This systematic review aims to evaluate the integration of AI models with wearable IMUs in health care, identifying current applications, challenges, and future opportunities. The focus will be on the types of models used, the characteristics of the datasets, and the potential for expanding and enhancing the use of this technology to improve patient care and advance medical research. Methods: This study examines this synergy of AI models and IMU data by using a systematic methodology, following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, to explore 3 core questions: (1) Which medical fields are most actively researching AI and IMU data? 
(2) Which models are being used in the analysis of IMU data within these medical fields? (3) What are the characteristics of the datasets used in these fields? Results: The median dataset size is 50 participants, which poses significant limitations for AI models given their dependency on large datasets for effective training and generalization. Furthermore, our analysis reveals the current dominance of machine learning models in 76% of the surveyed studies, suggesting a preference for traditional models like linear regression, support vector machine, and random forest, but also indicating significant growth potential for deep learning models in this area. Impressively, 93% of the studies used supervised learning, revealing an underuse of unsupervised learning and indicating an important area for future exploration in discovering hidden patterns and insights without predefined labels or outcomes. In addition, there was a preference for conducting studies in clinical settings (77%), rather than in real-life scenarios, a choice that, along with the underapplication of the full potential of wearable IMUs, is recognized as a limitation in terms of practical applicability. Furthermore, the focus of 65% of the studies on neurological issues suggests an opportunity to broaden the research scope to other clinical areas such as musculoskeletal applications, where AI could have significant impacts. Conclusions: The review calls for a collaborative effort to address the highlighted challenges, including improvements in data collection, increasing dataset sizes, a move that inherently pushes the field toward the adoption of more complex deep learning models, and the expansion of the application of AI models to IMU data across various medical fields. This approach aims to enhance the reliability, generalizability, and clinical applicability of research findings, ultimately improving patient outcomes and advancing medical research. 
%M 39880389 %R 10.2196/60521 %U https://mhealth.jmir.org/2025/1/e60521 %U https://doi.org/10.2196/60521 %U http://www.ncbi.nlm.nih.gov/pubmed/39880389 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e66330 %T Estimating the Prevalence of Schizophrenia in the General Population of Japan Using an Artificial Neural Network–Based Schizophrenia Classifier: Web-Based Cross-Sectional Survey %A Choomung,Pichsinee %A He,Yupeng %A Matsunaga,Masaaki %A Sakuma,Kenji %A Kishi,Taro %A Li,Yuanying %A Tanihara,Shinichi %A Iwata,Nakao %A Ota,Atsuhiko %K schizophrenia %K schizophrenic %K prevalence %K artificial neural network %K neural network %K neural networks %K ANN %K deep learning %K machine learning %K SZ classifier %K web-based survey %K epidemiology %K epidemiological %K Japan %K classifiers %K mental illness %K mental disorder %K mental health %D 2025 %7 29.1.2025 %9 %J JMIR Form Res %G English %X Background: Estimating the prevalence of schizophrenia in the general population remains a challenge worldwide, as well as in Japan. Few studies have estimated schizophrenia prevalence in the Japanese population and have often relied on reports from hospitals and self-reported physician diagnoses or typical schizophrenia symptoms. These approaches are likely to underestimate the true prevalence owing to stigma, poor insight, or lack of access to health care among respondents. To address these issues, we previously developed an artificial neural network (ANN)–based schizophrenia classification model (SZ classifier) using data from a large-scale Japanese web-based survey to enhance the comprehensiveness of schizophrenia case identification in the general population. In addition, we also plan to introduce a population-based survey to collect general information and sample participants matching the population’s demographic structure, thereby achieving a precise estimate of the prevalence of schizophrenia in Japan. 
Objective: This study aimed to estimate the prevalence of schizophrenia by applying the SZ classifier to random samples from the Japanese population. Methods: We randomly selected a sample of 750 participants where the age, sex, and regional distributions were similar to Japan’s demographic structure from a large-scale Japanese web-based survey. Demographic data, health-related backgrounds, physical comorbidities, psychiatric comorbidities, and social comorbidities were collected and applied to the SZ classifier, as this information was also used for developing the SZ classifier. The crude prevalence of schizophrenia was calculated through the proportion of positive cases detected by the SZ classifier. The crude estimate was further refined by excluding false-positive cases and including false-negative cases to determine the actual prevalence of schizophrenia. Results: Out of 750 participants, 62 were classified as schizophrenia cases by the SZ classifier, resulting in a crude prevalence of schizophrenia in the general population of Japan of 8.3% (95% CI 6.6%-10.1%). Among these 62 cases, 53 were presumed to be false positives, and 3 were presumed to be false negatives. After adjustment, the actual prevalence of schizophrenia in the general population was estimated to be 1.6% (95% CI 0.7%-2.5%). Conclusions: This estimated prevalence was slightly higher than that reported in previous studies, possibly due to a more comprehensive disease classification methodology or, conversely, model limitations. This study demonstrates the capability of an ANN-based model to improve the estimation of schizophrenia prevalence in the general population, offering a novel approach to public health analysis. 
%R 10.2196/66330 %U https://formative.jmir.org/2025/1/e66330 %U https://doi.org/10.2196/66330 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 14 %N %P e62865 %T Exploring the Credibility of Large Language Models for Mental Health Support: Protocol for a Scoping Review %A Gautam,Dipak %A Kellmeyer,Philipp %+ Data and Web Science Group, School of Business Informatics and Mathematics, University of Mannheim, B6, 26, Mannheim, D-68159, Germany, 49 621181 ext 2422, philipp.kellmeyer@uni-mannheim.de %K large language model %K LLM %K mental health %K explainability %K credibility %K mobile phone %D 2025 %7 29.1.2025 %9 Protocol %J JMIR Res Protoc %G English %X Background: The rapid evolution of large language models (LLMs), such as Bidirectional Encoder Representations from Transformers (BERT; Google) and GPT (OpenAI), has introduced significant advancements in natural language processing. These models are increasingly integrated into various applications, including mental health support. However, the credibility of LLMs in providing reliable and explainable mental health information and support remains underexplored. Objective: This scoping review systematically maps the factors influencing the credibility of LLMs in mental health support, including reliability, explainability, and ethical considerations. The review is expected to offer critical insights for practitioners, researchers, and policy makers, guiding future research and policy development. These findings will contribute to the responsible integration of LLMs into mental health care, with a focus on maintaining ethical standards and user trust. Methods: This review follows PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines and the Joanna Briggs Institute (JBI) methodology. Eligibility criteria include studies that apply transformer-based generative language models in mental health support, such as BERT and GPT. 
Sources include PsycINFO, MEDLINE via PubMed, Web of Science, IEEE Xplore, and ACM Digital Library. A systematic search of studies from 2019 onward will be conducted and updated until October 2024. Data will be synthesized qualitatively. The Population, Concept, and Context framework will guide the inclusion criteria. Two independent reviewers will screen and extract data, resolving discrepancies through discussion. Data will be synthesized and presented descriptively. Results: As of September 2024, this study is currently in progress, with the systematic search completed and the screening phase ongoing. We expect to complete data extraction by early November 2024 and synthesis by late November 2024. Conclusions: This scoping review will map the current evidence on the credibility of LLMs in mental health support. It will identify factors influencing the reliability, explainability, and ethical considerations of these models, providing insights for practitioners, researchers, policy makers, and users. These findings will fill a critical gap in the literature and inform future research, practice, and policy development, ensuring the responsible integration of LLMs in mental health services. 
International Registered Report Identifier (IRRID): DERR1-10.2196/62865 %M 39879615 %R 10.2196/62865 %U https://www.researchprotocols.org/2025/1/e62865 %U https://doi.org/10.2196/62865 %U http://www.ncbi.nlm.nih.gov/pubmed/39879615 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 11 %N %P e63809 %T An Explainable Artificial Intelligence Text Classifier for Suicidality Prediction in Youth Crisis Text Line Users: Development and Validation Study %A Thomas,Julia %A Lucht,Antonia %A Segler,Jacob %A Wundrack,Richard %A Miché,Marcel %A Lieb,Roselind %A Kuchinke,Lars %A Meinlschmidt,Gunther %+ Division of Clinical Psychology and Epidemiology, Faculty of Psychology, University of Basel, Missionsstrasse 60/62, Basel, 4055, Switzerland, 49 30 57714627, julia.thomas@krisenchat.de %K deep learning %K explainable artificial intelligence (XAI) %K large language model (LLM) %K machine learning %K neural network %K prevention %K risk monitoring %K suicide %K transformer model %K suicidality %K suicidal ideation %K self-murder %K self-harm %K youth %K adolescent %K adolescents %K public health %K language model %K language models %K chat protocols %K crisis helpline %K help-seeking behaviors %K German %K Shapley %K decision-making %K mental health %K health informatics %K mobile phone %D 2025 %7 29.1.2025 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Suicide represents a critical public health concern, and machine learning (ML) models offer the potential for identifying at-risk individuals. Recent studies using benchmark datasets and real-world social media data have demonstrated the capability of pretrained large language models in predicting suicidal ideation and behaviors (SIB) in speech and text. 
Objective: This study aimed to (1) develop and implement ML methods for predicting SIBs in a real-world crisis helpline dataset, using transformer-based pretrained models as a foundation; (2) evaluate, cross-validate, and benchmark the model against traditional text classification approaches; and (3) train an explainable model to highlight relevant risk-associated features. Methods: We analyzed chat protocols from adolescents and young adults (aged 14-25 years) seeking assistance from a German crisis helpline. An ML model was developed using a transformer-based language model architecture with pretrained weights and long short-term memory layers. The model predicted suicidal ideation (SI) and advanced suicidal engagement (ASE), as indicated by composite Columbia-Suicide Severity Rating Scale scores. We compared model performance against a classical word-vector-based ML model. We subsequently computed discrimination, calibration, clinical utility, and explainability information using a Shapley Additive Explanations value-based post hoc estimation model. Results: The dataset comprised 1348 help-seeking encounters (1011 for training and 337 for testing). The transformer-based classifier achieved a macroaveraged area under the curve (AUC) receiver operating characteristic (ROC) of 0.89 (95% CI 0.81-0.91) and an overall accuracy of 0.79 (95% CI 0.73-0.99). This performance surpassed the word-vector-based baseline model (AUC-ROC=0.77, 95% CI 0.64-0.90; accuracy=0.61, 95% CI 0.61-0.80). The transformer model demonstrated excellent prediction for nonsuicidal sessions (AUC-ROC=0.96, 95% CI 0.96-0.99) and good prediction for SI and ASE, with AUC-ROCs of 0.85 (95% CI 0.97-0.86) and 0.87 (95% CI 0.81-0.88), respectively. The Brier Skill Score indicated a 44% improvement in classification performance over the baseline model. 
The Shapley Additive Explanations model identified language features predictive of SIBs, including self-reference, negation, expressions of low self-esteem, and absolutist language. Conclusions: Neural networks using large language model–based transfer learning can accurately identify SI and ASE. The post hoc explainer model revealed language features associated with SI and ASE. Such models may potentially support clinical decision-making in suicide prevention services. Future research should explore multimodal input features and temporal aspects of suicide risk. %M 39879608 %R 10.2196/63809 %U https://publichealth.jmir.org/2025/1/e63809 %U https://doi.org/10.2196/63809 %U http://www.ncbi.nlm.nih.gov/pubmed/39879608 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e57723 %T Transformers for Neuroimage Segmentation: Scoping Review %A Iratni,Maya %A Abdullah,Amira %A Aldhaheri,Mariam %A Elharrouss,Omar %A Abd-alrazaq,Alaa %A Rustamov,Zahiriddin %A Zaki,Nazar %A Damseh,Rafat %+ Department of Computer Science and Software Engineering, United Arab Emirates University, Sheik Khalifa Bin Zayed St - 'Asharij - Shiebat Al Oud - Abu Dhabi, Al Ain, 15551, United Arab Emirates, 971 37135586, rdamseh@uaeu.ac.ae %K 3D segmentation %K brain tumor segmentation %K deep learning %K neuroimaging %K transformer %D 2025 %7 29.1.2025 %9 Review %J J Med Internet Res %G English %X Background: Neuroimaging segmentation is increasingly important for diagnosing and planning treatments for neurological diseases. Manual segmentation is time-consuming, apart from being prone to human error and variability. Transformers are a promising deep learning approach for automated medical image segmentation. Objective: This scoping review will synthesize current literature and assess the use of various transformer models for neuroimaging segmentation. 
Methods: A systematic search in major databases, including Scopus, IEEE Xplore, PubMed, and ACM Digital Library, was carried out for studies applying transformers to neuroimaging segmentation problems from 2019 through 2023. The inclusion criteria allowed only peer-reviewed journal papers and conference papers focused on transformer-based segmentation of human brain imaging data. Studies dealing with nonneuroimaging data, raw brain signals, or electroencephalogram data were excluded. Data extraction was performed to identify key study details, including image modalities, datasets, neurological conditions, transformer models, and evaluation metrics. Results were synthesized using a narrative approach. Results: Of the 1246 publications identified, 67 (5.38%) met the inclusion criteria. Half of all included studies were published in 2022, and more than two-thirds used transformers for segmenting brain tumors. The most common imaging modality was magnetic resonance imaging (n=59, 88.06%), while the most frequently used dataset was the brain tumor segmentation dataset (n=39, 58.21%). 3D transformer models (n=42, 62.69%) were more prevalent than their 2D counterparts. Hybrid convolutional neural network-transformer architectures were the most commonly developed (n=57, 85.07%), with the vision transformer being the most frequently used type of transformer (n=37, 55.22%). The most frequent evaluation metric was the Dice score (n=63, 94.03%). Studies generally reported increased segmentation accuracy and the ability to model both local and global features in brain images. Conclusions: This review documents the recent increase in the adoption of transformers for neuroimaging segmentation, particularly for brain tumor detection. Currently, hybrid convolutional neural network-transformer architectures achieve state-of-the-art performance on benchmark datasets over standalone models. 
Nevertheless, their applicability remains highly limited by high computational costs and potential overfitting on small datasets. The field’s heavy reliance on the brain tumor segmentation dataset points to the need for a more diverse set of datasets to validate model performance on a variety of neurological diseases. Further research is needed to define the optimal transformer architectures and training methods for clinical applications. Continuing development may make transformers the state-of-the-art for fast, accurate, and reliable brain magnetic resonance imaging segmentation, which could lead to improved clinical tools for diagnosing and evaluating neurological disorders. %M 39879621 %R 10.2196/57723 %U https://www.jmir.org/2025/1/e57723 %U https://doi.org/10.2196/57723 %U http://www.ncbi.nlm.nih.gov/pubmed/39879621 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e64188 %T Urgency Prediction for Medical Laboratory Tests Through Optimal Sparse Decision Tree: Case Study With Echocardiograms %A Jiang,Yiqun %A Li,Qing %A Huang,Yu-Li %A Zhang,Wenli %+ Department of Information Systems and Business Analytics, Iowa State University, 2167 Union Drive, Ames, IA, 50011-2027, United States, 1 5152942469, wlzhang@iastate.edu %K interpretable machine learning %K urgency prediction %K appointment scheduling %K echocardiogram %K health care management %D 2025 %7 29.1.2025 %9 Original Paper %J JMIR AI %G English %X Background: In the contemporary realm of health care, laboratory tests stand as cornerstone components, driving the advancement of precision medicine. These tests offer intricate insights into a variety of medical conditions, thereby facilitating diagnosis, prognosis, and treatments. However, the accessibility of certain tests is hindered by factors such as high costs, a shortage of specialized personnel, or geographic disparities, posing obstacles to achieving equitable health care. 
For example, an echocardiogram is a type of laboratory test that is extremely important and not easily accessible. The increasing demand for echocardiograms underscores the imperative for more efficient scheduling protocols. Despite this pressing need, limited research has been conducted in this area. Objective: The study aims to develop an interpretable machine learning model for determining the urgency of patients requiring echocardiograms, thereby aiding in the prioritization of scheduling procedures. Furthermore, this study aims to glean insights into the pivotal attributes influencing the prioritization of echocardiogram appointments, leveraging the high interpretability of the machine learning model. Methods: Empirical and predictive analyses have been conducted to assess the urgency of patients based on a large real-world echocardiogram appointment dataset (ie, 34,293 appointments) sourced from electronic health records encompassing administrative information, referral diagnosis, and underlying patient conditions. We used a state-of-the-art interpretable machine learning algorithm, the optimal sparse decision tree (OSDT), renowned for its high accuracy and interpretability, to investigate the attributes pertinent to echocardiogram appointments. Results: The method demonstrated satisfactory performance (F1-score=36.18%, an improvement of 1.7%, and F2-score=28.18%, an improvement of 0.79%, over the best-performing baseline model). Moreover, due to its high interpretability, the results provide valuable medical insights regarding the identification of urgent patients for tests through the extraction of decision rules from the OSDT model. Conclusions: The method demonstrated state-of-the-art predictive performance, affirming its effectiveness. Furthermore, we validate the decision rules derived from the OSDT model by comparing them with established medical knowledge. 
These interpretable results (eg, attribute importance and decision rules from the OSDT model) underscore the potential of our approach in prioritizing patient urgency for echocardiogram appointments and can be extended to prioritize other laboratory test appointments using electronic health record data. %M 39879091 %R 10.2196/64188 %U https://ai.jmir.org/2025/1/e64188 %U https://doi.org/10.2196/64188 %U http://www.ncbi.nlm.nih.gov/pubmed/39879091 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e63109 %T Diagnostic Decision-Making Variability Between Novice and Expert Optometrists for Glaucoma: Comparative Analysis to Inform AI System Design %A Ghaffar,Faisal %A Furtado,Nadine M. %A Ali,Imad %A Burns,Catherine %+ Department of Systems Design Engineering, Faculty of Engineering, University of Waterloo, EC4 2121, 295 Phillip St, Waterloo, ON, N2L 3W8, Canada, 1 519 888 4567 ext 33903, catherine.burns@uwaterloo.ca %K decision-making %K human-centered AI design %K human factors %K experts versus novices differences %K optometry %K glaucoma diagnosis %K experts versus novices %K glaucoma %K eye disease %K vision %K vision impairment %K comparative analysis %K methodology %K optometrist %K artificial intelligence %K AI %K diagnostic accuracy %K consistency %K clinical data %K risk assessment %K progression analysis %D 2025 %7 29.1.2025 %9 Original Paper %J JMIR Med Inform %G English %X Background: While expert optometrists tend to rely on a deep understanding of the disease and intuitive pattern recognition, those with less experience may depend more on extensive data, comparisons, and external guidance. Understanding these variations is important for developing artificial intelligence (AI) systems that can effectively support optometrists with varying degrees of experience and minimize decision inconsistencies. 
Objective: The main objective of this study is to identify and analyze the variations in diagnostic decision-making approaches between novice and expert optometrists. By understanding these variations, we aim to provide guidelines for the development of AI systems that can support optometrists with varying levels of expertise. These guidelines will assist in developing AI systems for glaucoma diagnosis, ultimately enhancing the diagnostic accuracy of optometrists and minimizing inconsistencies in their decisions. Methods: We conducted in-depth interviews with 14 optometrists using within-subject design, including both novices and experts, focusing on their approaches to glaucoma diagnosis. The responses were coded and analyzed using a mixed method approach incorporating both qualitative and quantitative analysis. Statistical tests such as Mann-Whitney U and chi-square tests were used to find significance in intergroup variations. These findings were further supported by themes extracted through qualitative analysis, which helped to identify decision-making patterns and understand variations in their approaches. Results: Both groups showed lower concordance rates with clinical diagnosis, with experts showing almost double (7/35, 20%) concordance rates with limited data in comparison to novices (7/69, 10%), highlighting the impact of experience and data availability on clinical judgment; this rate increased to nearly 40% for both groups (experts: 5/12, 42% and novices: 8/21, 42%) when they had access to complete historical data of the patient. We also found statistically significant intergroup differences between the first visits and subsequent visits with a P value of less than .05 on the Mann-Whitney U test in many assessments. 
Furthermore, approaches to the exam assessment and decision differed significantly: experts emphasized comprehensive risk assessments and progression analysis, demonstrating cognitive efficiency and intuitive decision-making, while novices relied more on structured, analytical methods and external references. Additionally, significant variations in patient follow-up times were observed, with a P value of <.001 on the chi-square test, showing a stronger influence of experience on follow-up time decisions. Conclusions: The study highlights significant variations in the decision-making process of novice and expert optometrists in glaucoma diagnosis, with experience playing a key role in accuracy, approach, and management. These findings demonstrate the critical need for AI systems tailored to varying levels of expertise. They also provide insights for the future design of AI systems aimed at enhancing the diagnostic accuracy of optometrists and consistency across different expertise levels, ultimately improving patient outcomes in optometric practice. 
%M 39879089 %R 10.2196/63109 %U https://medinform.jmir.org/2025/1/e63109 %U https://doi.org/10.2196/63109 %U http://www.ncbi.nlm.nih.gov/pubmed/39879089 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e62914 %T Preclinical Cognitive Markers of Alzheimer Disease and Early Diagnosis Using Virtual Reality and Artificial Intelligence: Literature Review %A Scribano Parada,María de la Paz %A González Palau,Fátima %A Valladares Rodríguez,Sonia %A Rincon,Mariano %A Rico Barroeta,Maria José %A García Rodriguez,Marta %A Bueno Aguado,Yolanda %A Herrero Blanco,Ana %A Díaz-López,Estela %A Bachiller Mayoral,Margarita %A Losada Durán,Raquel %K dementia %K Alzheimer disease %K mild cognitive impairment %K virtual reality %K artificial intelligence %K early detection %K qualitative review %K literature review %K AI %D 2025 %7 28.1.2025 %9 %J JMIR Med Inform %G English %X Background: This review explores the potential of virtual reality (VR) and artificial intelligence (AI) to identify preclinical cognitive markers of Alzheimer disease (AD). By synthesizing recent studies, it aims to advance early diagnostic methods to detect AD before significant symptoms occur. Objective: Research emphasizes the significance of early detection in AD during the preclinical phase, which does not involve cognitive impairment but nevertheless requires reliable biomarkers. Current biomarkers face challenges, prompting the exploration of cognitive behavior indicators beyond episodic memory. Methods: Using PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, we searched Scopus, PubMed, and Google Scholar for studies on neuropsychiatric disorders utilizing conversational data. Results: Following an analysis of 38 selected articles, we highlight verbal episodic memory as a sensitive preclinical AD marker, with supporting evidence from neuroimaging and genetic profiling. 
Executive functions precede memory decline, while processing speed is a significant correlate. The potential of VR remains underexplored, and AI algorithms offer a multidimensional approach to early neurocognitive disorder diagnosis. Conclusions: Emerging technologies like VR and AI show promise for preclinical diagnostics, but thorough validation and regulation for clinical safety and efficacy are necessary. Continued technological advancements are expected to enhance early detection and management of AD. %R 10.2196/62914 %U https://medinform.jmir.org/2025/1/e62914 %U https://doi.org/10.2196/62914 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 11 %N %P e60653 %T User and Developer Views on Using AI Technologies to Facilitate the Early Detection of Skin Cancers in Primary Care Settings: Qualitative Semistructured Interview Study %A Jones,Owain Tudor %A Calanzani,Natalia %A Scott,Suzanne E %A Matin,Rubeta N %A Emery,Jon %A Walter,Fiona M %+ Department of Public Health and Primary Care, University of Cambridge, East Forvie Building, Robinson Way, Cambridge, CB2 0SZ, United Kingdom, 44 7737204055, otj24@medschl.cam.ac.uk %K artificial intelligence %K AI %K machine learning %K ML %K primary care %K skin cancer %K melanoma %K qualitative research %K mobile phone %D 2025 %7 28.1.2025 %9 Original Paper %J JMIR Cancer %G English %X Background: Skin cancers, including melanoma and keratinocyte cancers, are among the most common cancers worldwide, and their incidence is rising in most populations. Earlier detection of skin cancer leads to better outcomes for patients. Artificial intelligence (AI) technologies have been applied to skin cancer diagnosis, but many technologies lack clinical evidence and/or the appropriate regulatory approvals. There are few qualitative studies examining the views of relevant stakeholders or evidence about the implementation and positioning of AI technologies in the skin cancer diagnostic pathway. 
Objective: This study aimed to understand the views of several stakeholder groups on the use of AI technologies to facilitate the early diagnosis of skin cancer, including patients, members of the public, general practitioners, primary care nurse practitioners, dermatologists, and AI researchers. Methods: This was a qualitative, semistructured interview study with 29 stakeholders. Participants were purposively sampled based on age, sex, and geographical location. We conducted the interviews via Zoom between September 2022 and May 2023. Transcribed recordings were analyzed using thematic framework analysis. The framework for the Nonadoption, Abandonment, and Challenges to Scale-Up, Spread, and Sustainability was used to guide the analysis to help understand the complexity of implementing diagnostic technologies in clinical settings. Results: Major themes were “the position of AI in the skin cancer diagnostic pathway” and “the aim of the AI technology”; cross-cutting themes included trust, usability and acceptability, generalizability, evaluation and regulation, implementation, and long-term use. There was no clear consensus on where AI should be placed along the skin cancer diagnostic pathway, but most participants saw the technology in the hands of either patients or primary care practitioners. Participants were concerned about the quality of the data used to develop and test AI technologies and the impact this could have on their accuracy in clinical use with patients from a range of demographics and the risk of missing skin cancers. Ease of use and not increasing the workload of already strained health care services were important considerations for participants. Health care professionals and AI researchers reported a lack of established methods of evaluating and regulating AI technologies. Conclusions: This study is one of the first to examine the views of a wide range of stakeholders on the use of AI technologies to facilitate early diagnosis of skin cancer. 
The optimal approach and position in the diagnostic pathway for these technologies have not yet been determined. AI technologies need to be developed and implemented carefully and thoughtfully, with attention paid to the quality and representativeness of the data used for development, to achieve their potential. %M 39874580 %R 10.2196/60653 %U https://cancer.jmir.org/2025/1/e60653 %U https://doi.org/10.2196/60653 %U http://www.ncbi.nlm.nih.gov/pubmed/39874580 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 14 %N %P e59823 %T The Clinicians’ Guide to Large Language Models: A General Perspective With a Focus on Hallucinations %A Roustan,Dimitri %A Bastardot,François %+ Emergency Medicine Department, Cliniques Universitaires Saint-Luc, Avenue Hippocrate 10, Brussels, 1200, Belgium, 32 477063174, dim.roustan@gmail.com %K medical informatics %K large language model %K clinical informatics %K decision-making %K computer assisted %K decision support techniques %K decision support %K decision %K AI %K artificial intelligence %K artificial intelligence tool %K LLM %K electronic data system %K hallucinations %K false information %K technical framework %D 2025 %7 28.1.2025 %9 Viewpoint %J Interact J Med Res %G English %X Large language models (LLMs) are artificial intelligence tools that have the prospect of profoundly changing how we practice all aspects of medicine. Considering the incredible potential of LLMs in medicine and the interest of many health care stakeholders for implementation into routine practice, it is therefore essential that clinicians be aware of the basic risks associated with the use of these models. Namely, a significant risk associated with the use of LLMs is their potential to create hallucinations. Hallucinations (false information) generated by LLMs arise from a multitude of causes, including both factors related to the training dataset as well as their auto-regressive nature. 
The implications for clinical practice range from the generation of inaccurate diagnostic and therapeutic information to the reinforcement of flawed diagnostic reasoning pathways, as well as a lack of reliability if not used properly. To reduce this risk, we developed a general technical framework for approaching LLMs in general clinical practice, as well as for implementation on a larger institutional scale. %M 39874574 %R 10.2196/59823 %U https://www.i-jmr.org/2025/1/e59823 %U https://doi.org/10.2196/59823 %U http://www.ncbi.nlm.nih.gov/pubmed/39874574 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e63634 %T Health and Experiences During the COVID-19 Pandemic Among Children and Young People: Analysis of Free-Text Responses From the Children and Young People With Long COVID Study %A Rojas,Natalia K %A Martin,Sam %A Cortina-Borja,Mario %A Shafran,Roz %A Fox-Smith,Lana %A Stephenson,Terence %A Ching,Brian C F %A d'Oelsnitz,Anaïs %A Norris,Tom %A Xu,Yue %A McOwat,Kelsey %A Dalrymple,Emma %A Heyman,Isobel %A Ford,Tamsin %A Chalder,Trudie %A Simmons,Ruth %A , %A Pinto Pereira,Snehal M %+ Division of Surgery & Interventional Science, Faculty of Medical Sciences, University College London, 43-45 Foley St, W1W 7TY, London, United Kingdom, 44 (0) 20 7679 200, n.rojas@ucl.ac.uk %K children and young people %K text mining %K free-text responses %K experiences %K COVID-19 %K long COVID %K InfraNodus %K sentiment analysis %K discourse analysis %K AI %K artificial intelligence %D 2025 %7 28.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The literature is equivocal as to whether the predicted negative mental health impact of the COVID-19 pandemic came to fruition. Some quantitative studies report increased emotional problems and depression; others report improved mental health and well-being. Qualitative explorations reveal heterogeneity, with themes ranging from feelings of loss to growth and development. 
Objective: This study aims to analyze free-text responses from children and young people participating in the Children and Young People With Long COVID study to get a clearer understanding of how young people were feeling during the pandemic. Methods: A total of 8224 free-text responses from children and young people were analyzed using InfraNodus, an artificial intelligence–powered text network analysis tool, to determine the most prevalent topics. A random subsample of 411 (5%) of the 8224 responses underwent a manual sentiment analysis; this was reweighted to represent the general population of children and young people in England. Results: Experiences fell into 6 main overlapping topical clusters: school, examination stress, mental health, emotional impact of the pandemic, social and family support, and physical health (including COVID-19 symptoms). Sentiment analysis showed that statements were largely negative (314/411, 76.4%), with a small proportion being positive (57/411, 13.9%). Those reporting negative sentiment were mostly female (227/314, 72.3%), while those reporting positive sentiment were mostly older (170/314, 54.1%). There were significant observed associations between sentiment and COVID-19 status as well as sex (P=.001 and P<.001, respectively) such that the majority of the responses, regardless of COVID-19 status or sex, were negative; for example, 84.1% (227/270) of the responses from female individuals and 61.7% (87/141) of those from male individuals were negative. There were no observed associations between sentiment and all other examined demographics. The results were broadly similar when reweighted to the general population of children and young people in England: 78.52% (negative), 13.23% (positive), and 8.24% (neutral). Conclusions: We used InfraNodus to analyze free-text responses from a large sample of children and young people. 
The majority of responses (314/411, 76.4%) were negative, and many of the children and young people reported experiencing distress across a range of domains related to school, social situations, and mental health. Our findings add to the literature, highlighting the importance of specific considerations for children and young people when responding to national emergencies. %M 39874576 %R 10.2196/63634 %U https://www.jmir.org/2025/1/e63634 %U https://doi.org/10.2196/63634 %U http://www.ncbi.nlm.nih.gov/pubmed/39874576 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e67969 %T Multimodal Pain Recognition in Postoperative Patients: Machine Learning Approach %A Subramanian,Ajan %A Cao,Rui %A Naeini,Emad Kasaeyan %A Aqajari,Seyed Amir Hossein %A Hughes,Thomas D %A Calderon,Michael-David %A Zheng,Kai %A Dutt,Nikil %A Liljeberg,Pasi %A Salanterä,Sanna %A Nelson,Ariana M %A Rahmani,Amir M %+ Department of Computer Science, University of California, Irvine, 3211 Donald Bren Hall, Irvine, CA, 92617, United States, 1 6506604994, ajans1@uci.edu %K pain intensity recognition %K multimodal information fusion %K signal processing %K weak supervision %K health care %K pain intensity %K pain recognition %K machine learning approach %K acute pain %K pain assessment %K behavioral pain %K pain measurement %K pain monitoring %K multimodal machine learning–based framework %K machine learning–based framework %K electrocardiogram %K electromyogram %K electrodermal activity %K self-reported pain level %K clinical pain management %D 2025 %7 27.1.2025 %9 Original Paper %J JMIR Form Res %G English %X Background: Acute pain management is critical in postoperative care, especially in vulnerable patient populations that may be unable to self-report pain levels effectively. Current methods of pain assessment often rely on subjective patient reports or behavioral pain observation tools, which can lead to inconsistencies in pain management. 
Multimodal pain assessment, integrating physiological and behavioral data, presents an opportunity to create more objective and accurate pain measurement systems. However, most previous work has focused on healthy subjects in controlled environments, with limited attention to real-world postoperative pain scenarios. This gap necessitates the development of robust, multimodal approaches capable of addressing the unique challenges associated with assessing pain in clinical settings, where factors like motion artifacts, imbalanced label distribution, and sparse data further complicate pain monitoring. Objective: This study aimed to develop and evaluate a multimodal machine learning–based framework for the objective assessment of pain in postoperative patients in real clinical settings using biosignals such as electrocardiogram, electromyogram, electrodermal activity, and respiration rate (RR) signals. Methods: The iHurt study was conducted on 25 postoperative patients at the University of California, Irvine Medical Center. The study captured multimodal biosignals during light physical activities, with concurrent self-reported pain levels using the Numerical Rating Scale. Data preprocessing involved noise filtering, feature extraction, and combining handcrafted and automatic features through convolutional and long short-term memory autoencoders. Machine learning classifiers, including support vector machine, random forest, adaptive boosting, and k-nearest neighbors, were trained using weak supervision and minority oversampling to handle sparse and imbalanced pain labels. Pain levels were categorized into baseline and 3 levels of pain intensity (1-3). Results: The multimodal pain recognition models achieved an average balanced accuracy of over 80% across the different pain levels. 
RR models consistently outperformed other single modalities, particularly for lower pain intensities, while facial muscle activity (electromyogram) was most effective for distinguishing higher pain intensities. Although single-modality models, especially RR, generally provided higher performance compared to multimodal approaches, our multimodal framework still delivered results that surpassed most previous works in terms of overall accuracy. Conclusions: This study presents a novel, multimodal machine learning framework for objective pain recognition in postoperative patients. The results highlight the potential of integrating multiple biosignal modalities for more accurate pain assessment, with particular value in real-world clinical settings. %M 39869898 %R 10.2196/67969 %U https://formative.jmir.org/2025/1/e67969 %U https://doi.org/10.2196/67969 %U http://www.ncbi.nlm.nih.gov/pubmed/39869898 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 11 %N %P e58834 %T A Machine Learning Approach Using Topic Modeling to Identify and Assess Experiences of Patients With Colorectal Cancer: Explorative Study %A Voigt,Kelly %A Sun,Yingtao %A Patandin,Ayush %A Hendriks,Johanna %A Goossens,Richard Hendrik %A Verhoef,Cornelis %A Husson,Olga %A Grünhagen,Dirk %A Jung,Jiwon %K colorectal cancer %K forum %K topic modeling %K patient journey %K patient experience %K AI %K machine learning %K cancer care %K cancer survivor %K United States %K quality of life %K post %K topic %K artificial intelligence %D 2025 %7 27.1.2025 %9 %J JMIR Cancer %G English %X Background: The rising number of cancer survivors and the shortage of health care professionals challenge the accessibility of cancer care. Health technologies are necessary for sustaining optimal patient journeys. To understand individuals’ daily lives during their patient journey, qualitative studies are crucial. However, not all patients wish to share their stories with researchers. 
Objective: This study aims to identify and assess patient experiences on a large scale using a novel machine learning–supported approach, leveraging data from patient forums. Methods: Forum posts of patients with colorectal cancer (CRC) from the Cancer Survivors Network USA were used as the data source. Topic modeling, as a part of machine learning, was used to recognize the topic patterns in the posts. Researchers read the 50 most relevant posts on each topic, dividing them into “home” or “hospital” contexts. A patient community journey map, derived from patients’ stories, was developed to visually illustrate our findings. CRC medical doctors and a quality-of-life expert evaluated the identified topics of patient experience and the map. Results: Based on 212,107 posts, 37 topics and 10 upper clusters were produced. Dominant clusters included “Daily activities while living with CRC” (38,782, 18.3%) and “Understanding treatment including alternatives and adjuvant therapy” (31,577, 14.9%). Topics related to the home context had more emotional content compared with the hospital context. The patient community journey map was constructed based on these findings. Conclusions: Our study highlighted the diverse concerns and experiences of patients with CRC. The more emotional content in home context discussions underscores the personal impact of CRC beyond clinical settings. Based on our study, we found that a machine learning–supported approach is a promising solution to analyze patients’ experiences. The innovative application of patient community journey mapping provides a unique perspective into the challenges in patients’ daily lives, which is essential for delivering appropriate support at the right moment. 
%R 10.2196/58834 %U https://cancer.jmir.org/2025/1/e58834 %U https://doi.org/10.2196/58834 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e64993 %T Accuracy and Safety of AI-Enabled Scribe Technology: Instrument Validation Study %A Biro,Joshua %A Handley,Jessica L %A Cobb,Nathan K %A Kottamasu,Varsha %A Collins,Jeffrey %A Krevat,Seth %A Ratwani,Raj M %+ , National Center for Human Factors in Healthcare, MedStar Health Research Institute, 3007 Tilden St NW, Washington, DC, 20008, United States, 1 3015423073, joshua.m.biro@medstar.net %K artificial intelligence %K AI %K patient safety %K ambient digital scribe %K AI-enabled scribe technology %K AI scribe technology %K scribe technology %K accuracy %K safety %K ambient scribe %K digital scribe %K patient-clinician %K patient-clinician communication %K doctor-patient relationship %K doctor-patient communication %K patient engagement %K patient safety %K dialogue script %K scribe %D 2025 %7 27.1.2025 %9 Research Letter %J J Med Internet Res %G English %X Artificial intelligence–enabled ambient digital scribes may have many potential benefits, yet results from our study indicate that there are errors that must be evaluated to mitigate safety risks. 
%M 39869899 %R 10.2196/64993 %U https://www.jmir.org/2025/1/e64993 %U https://doi.org/10.2196/64993 %U http://www.ncbi.nlm.nih.gov/pubmed/39869899 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e64649 %T Enhancing Diagnostic Accuracy of Lung Nodules in Chest Computed Tomography Using Artificial Intelligence: Retrospective Analysis %A Liu,Weiqi %A Wu,You %A Zheng,Zhuozhao %A Bittle,Mark %A Yu,Wei %A Kharrazi,Hadi %+ Department of Radiology, Beijing Anzhen Hospital, Capital Medical University, 2 Anzhen Road, Chaoyang District, Beijing, 100029, China, 86 10 84005287, nxyw1969@163.com %K artificial intelligence %K diagnostic accuracy %K lung nodule %K radiology %K AI system %D 2025 %7 27.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Uncertainty in the diagnosis of lung nodules is a challenge for both patients and physicians. Artificial intelligence (AI) systems are increasingly being integrated into medical imaging to assist diagnostic procedures. However, the accuracy of AI systems in identifying and measuring lung nodules on chest computed tomography (CT) scans remains unclear, which requires further evaluation. Objective: This study aimed to evaluate the impact of an AI-assisted diagnostic system on the diagnostic efficiency of radiologists. It specifically examined the report modification rates and missed and misdiagnosed rates of junior radiologists with and without AI assistance. Methods: We obtained effective data from 12,889 patients in 2 tertiary hospitals in Beijing before and after the implementation of the AI system, covering the period from April 2018 to March 2022. Diagnostic reports written by both junior and senior radiologists were included in each case. Using reports by senior radiologists as a reference, we compared the modification rates of reports written by junior radiologists with and without AI assistance. 
We further evaluated alterations in lung nodule detection capability over 3 years after the integration of the AI system. Evaluation metrics of this study include lung nodule detection rate, accuracy, false negative rate, false positive rate, and positive predictive value. The statistical analyses included descriptive statistics and chi-square, Cochran-Armitage, and Mann-Kendall tests. Results: The AI system was implemented in Beijing Anzhen Hospital (Hospital A) in January 2019 and Tsinghua Changgung Hospital (Hospital C) in June 2021. The modification rate of diagnostic reports in the detection of lung nodules increased from 4.73% to 7.23% (χ²₁=12.15; P<.001) at Hospital A. In terms of lung nodule detection rates postimplementation, Hospital C increased from 46.19% to 53.45% (χ²₁=25.48; P<.001) and Hospital A increased from 39.29% to 55.22% (χ²₁=122.55; P<.001). At Hospital A, the false negative rate decreased from 8.4% to 5.16% (χ²₁=9.85; P=.002), while the false positive rate increased from 2.36% to 9.77% (χ²₁=53.48; P<.001). The detection accuracy demonstrated a decrease from 93.33% to 92.23% for Hospital A and from 95.27% to 92.77% for Hospital C. Regarding the changes in lung nodule detection capability over a 3-year period following the integration of the AI system, the detection rates for lung nodules exhibited a modest increase from 54.6% to 55.84%, while the overall accuracy demonstrated a slight improvement from 92.79% to 93.92%. Conclusions: The AI system enhanced lung nodule detection, offering the possibility of earlier disease identification and timely intervention. Nevertheless, the initial reduction in accuracy underscores the need for standardized diagnostic criteria and comprehensive training for radiologists to maximize the effectiveness of AI-enabled diagnostic systems. 
%M 39869890 %R 10.2196/64649 %U https://www.jmir.org/2025/1/e64649 %U https://doi.org/10.2196/64649 %U http://www.ncbi.nlm.nih.gov/pubmed/39869890 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e63548 %T The Use of AI in Mental Health Services to Support Decision-Making: Scoping Review %A Auf,Hassan %A Svedberg,Petra %A Nygren,Jens %A Nair,Monika %A Lundgren,Lina E %+ Halmstad University, School of Health and Welfare, Box 823, Kristian IV:s väg 3, Halmstad, 30118, Sweden, 46 35167100, hassan.auf@hh.se %K artificial intelligence %K AI %K mental health %K decision-making %K shared decision-making %K implementation %K human-computer interaction %D 2025 %7 24.1.2025 %9 Review %J J Med Internet Res %G English %X Background: Recent advancements in artificial intelligence (AI) have changed the care processes in mental health, particularly in decision-making support for health care professionals and individuals with mental health problems. AI systems provide support in several domains of mental health, including early detection, diagnostics, treatment, and self-care. The use of AI systems in care flows faces several challenges in relation to decision-making support, stemming from technology, end-user, and organizational perspectives with the AI disruption of care processes. Objective: This study aims to explore the use of AI systems in mental health to support decision-making, focusing on 3 key areas: the characteristics of research on AI systems in mental health; the current applications, decisions, end users, and user flow of AI systems to support decision-making; and the evaluation of AI systems for the implementation of decision-making support, including elements influencing the long-term use. Methods: A scoping review of empirical evidence was conducted across 5 databases: PubMed, Scopus, PsycINFO, Web of Science, and CINAHL. The searches were restricted to peer-reviewed articles published in English after 2011. 
The initial screening at the title and abstract level was conducted by 2 reviewers, followed by full-text screening based on the inclusion criteria. Data were then charted and prepared for data analysis. Results: Of a total of 1217 articles, 12 (0.99%) met the inclusion criteria. These studies predominantly originated from high-income countries. The AI systems were used in health care, self-care, and hybrid care contexts, addressing a variety of mental health problems. Three types of AI systems were identified in terms of decision-making support: diagnostic and predictive AI, treatment selection AI, and self-help AI. The dynamics of the type of end-user interaction and system design were diverse in complexity for the integration and use of the AI systems to support decision-making in care processes. The evaluation of the use of AI systems highlighted several challenges impacting the implementation and functionality of the AI systems in care processes, including factors affecting accuracy, increase of demand, trustworthiness, patient-physician communication, and engagement with the AI systems. Conclusions: The design, development, and implementation of AI systems to support decision-making present substantial challenges for the sustainable use of this technology in care processes. The empirical evidence shows that the evaluation of the use of AI systems in mental health is still in its early stages, with need for more empirically focused research on real-world use. The key aspects requiring further investigation include the evaluation of the use of AI-supported decision-making from human-AI interaction and human-computer interaction perspectives, longitudinal implementation studies of AI systems in mental health to assess the use, and the integration of shared decision-making in AI systems. 
%M 39854710 %R 10.2196/63548 %U https://www.jmir.org/2025/1/e63548 %U https://doi.org/10.2196/63548 %U http://www.ncbi.nlm.nih.gov/pubmed/39854710 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e56155 %T AI-Driven Innovations for Early Sepsis Detection by Combining Predictive Accuracy With Blood Count Analysis in an Emergency Setting: Retrospective Study %A Lin,Tai-Han %A Chung,Hsing-Yi %A Jian,Ming-Jr %A Chang,Chih-Kai %A Lin,Hung-Hsin %A Yen,Chiung-Tzu %A Tang,Sheng-Hui %A Pan,Pin-Ching %A Perng,Cherng-Lih %A Chang,Feng-Yee %A Chen,Chien-Wen %A Shang,Hung-Sheng %+ Division of Clinical Pathology, Department of Pathology, Tri-Service General Hospital, National Defense Medical Center, No. 161, Sec. 6, Minquan E. Rd., Neihu Dist., Taipei, 11490, Taiwan, 886 920713130, iamkeith001@gmail.com %K sepsis %K artificial intelligence %K critical care %K complete blood count analysis %K CBC analysis %K artificial intelligence clinical decision support systems %K AI-CDSS %D 2025 %7 24.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Sepsis, a critical global health challenge, accounted for approximately 20% of worldwide deaths in 2017. Although the Sequential Organ Failure Assessment (SOFA) score standardizes the diagnosis of organ dysfunction, early sepsis detection remains challenging due to its insidious symptoms. Current diagnostic methods, including clinical assessments and laboratory tests, frequently lack the speed and specificity needed for timely intervention, particularly in vulnerable populations such as older adults, intensive care unit (ICU) patients, and those with compromised immune systems. While bacterial cultures remain vital, their time-consuming nature and susceptibility to false negatives limit their effectiveness. 
Even promising existing machine learning approaches are restricted by reliance on complex clinical factors that could delay results, underscoring the need for faster, simpler, and more reliable diagnostic strategies. Objective: This study introduces innovative machine learning models using complete blood count with differential (CBC+DIFF) data—a routine, minimally invasive test that assesses immune response through blood cell measurements, critical for sepsis identification. The primary objective was to implement this model within an artificial intelligence–clinical decision support system (AI-CDSS) to enhance early sepsis detection and management in critical care settings. Methods: This retrospective study at Tri-Service General Hospital (September to December 2023) analyzed 746 ICU patients with suspected pneumonia-induced sepsis (supported by radiographic evidence and a SOFA score increase of ≥2 points), alongside 746 stable outpatients as controls. Sepsis infection sources were confirmed through positive sputum, blood cultures, or FilmArray results. The dataset incorporated both basic hematological factors and advanced neutrophil characteristics (side scatter light intensity, cytoplasmic complexity, and neutrophil-to-lymphocyte ratio), with data from September to November used for training and data from December used for validation. Machine learning models, including light gradient boosting machine (LGBM), random forest classifier, and gradient boosting classifier, were developed using CBC+DIFF data and were assessed using metrics such as area under the curve, sensitivity, and specificity. The best-performing model was integrated into the AI-CDSS, with its implementation supported through workshops and training sessions. Results: Pathogen identification in ICU patients found 243 FilmArray-positive, 411 culture-positive, and 92 undetected cases, yielding a final dataset of 654 (43.8%) sepsis cases out of 1492 total cases. 
The machine learning models demonstrated high predictive accuracy, with LGBM achieving the highest area under the curve (0.90), followed by the random forest classifier (0.89) and gradient boosting classifier (0.88). The best-performing LGBM model was selected and integrated as the core of our AI-CDSS, which was built on a web interface to facilitate rapid sepsis risk assessment using CBC+DIFF data. Conclusions: This study demonstrates that by providing streamlined predictions using CBC+DIFF data without requiring extensive clinical parameters, the AI-CDSS can be seamlessly integrated into clinical workflows, enhancing rapid, accurate identification of sepsis and improving patient care and treatment timeliness. %R 10.2196/56155 %U https://www.jmir.org/2025/1/e56155 %U https://doi.org/10.2196/56155 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 11 %N %P e57275 %T Large Language Model Approach for Zero-Shot Information Extraction and Clustering of Japanese Radiology Reports: Algorithm Development and Validation %A Yamagishi,Yosuke %A Nakamura,Yuta %A Hanaoka,Shouhei %A Abe,Osamu %K radiology reports %K clustering %K large language model %K natural language processing %K information extraction %K lung cancer %K machine learning %D 2025 %7 23.1.2025 %9 %J JMIR Cancer %G English %X Background: The application of natural language processing in medicine has increased significantly, including tasks such as information extraction and classification. Natural language processing plays a crucial role in structuring free-form radiology reports, facilitating the interpretation of textual content, and enhancing data utility through clustering techniques. Clustering allows for the identification of similar lesions and disease patterns across a broad dataset, making it useful for aggregating information and discovering new insights in medical imaging. However, most publicly available medical datasets are in English, with limited resources in other languages. 
This scarcity poses a challenge for the development of models geared toward non-English downstream tasks. Objective: This study aimed to develop and evaluate an algorithm that uses large language models (LLMs) to extract information from Japanese lung cancer radiology reports and perform clustering analysis. The effectiveness of this approach was assessed and compared with previous supervised methods. Methods: This study employed the MedTxt-RR dataset, comprising 135 Japanese radiology reports from 9 radiologists who interpreted the computed tomography images of 15 lung cancer patients obtained from Radiopaedia. Previously used in the NTCIR-16 (NII Testbeds and Community for Information Access Research) shared task for clustering performance competition, this dataset was ideal for comparing the clustering ability of our algorithm with those of previous methods. The dataset was split into 8 cases for development and 7 for testing. The study’s approach involved using the LLM to extract information pertinent to lung cancer findings and transforming it into numeric features for clustering, using the K-means method. Performance was evaluated using 135 reports for information extraction accuracy and 63 test reports for clustering performance. This study focused on the accuracy of automated systems for extracting tumor size, location, and laterality from clinical reports. The clustering performance was evaluated using normalized mutual information, adjusted mutual information, and the Fowlkes-Mallows index for both the development and test data. Results: The tumor size was accurately identified in 99 out of 135 reports (73.3%), with errors in 36 reports (26.7%), primarily due to missing or incorrect size information. Tumor location and laterality were identified with greater accuracy in 112 out of 135 reports (83%); however, 23 reports (17%) contained errors mainly due to empty values or incorrect data. 
Clustering performance of the test data yielded a normalized mutual information of 0.6414, an adjusted mutual information of 0.5598, and a Fowlkes-Mallows index of 0.5354. The proposed method demonstrated superior performance across all evaluation metrics compared to previous methods. Conclusions: The unsupervised LLM approach surpassed the existing supervised methods in clustering Japanese radiology reports. These findings suggest that LLMs hold promise for extracting information from radiology reports and integrating it into disease-specific knowledge structures. %R 10.2196/57275 %U https://cancer.jmir.org/2025/1/e57275 %U https://doi.org/10.2196/57275 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 9 %N %P e60238 %T Causal Inference for Hypertension Prediction With Wearable Electrocardiogram and Photoplethysmogram Signals: Feasibility Study %A Gong,Ke %A Chen,Yifan %A Song,Xinyue %A Fu,Zhizhong %A Ding,Xiaorong %K hypertension %K causal inference %K wearable physiological signals %K electrocardiogram %K photoplethysmogram %D 2025 %7 23.1.2025 %9 %J JMIR Cardio %G English %X Background: Hypertension is a leading cause of cardiovascular disease and premature death worldwide, and it puts a heavy burden on the health care system. Therefore, it is very important to detect and evaluate hypertension and related cardiovascular events to enable early prevention, detection, and management. Hypertension can be detected in a timely manner with cardiac signals, such as through an electrocardiogram (ECG) and photoplethysmogram (PPG), which can be observed via wearable sensors. Most previous studies predicted hypertension from ECG and PPG signals with extracted features that are correlated with hypertension. However, correlation is sometimes unreliable and may be affected by confounding factors. 
Objective: The aim of this study was to investigate the feasibility of predicting the risk of hypertension by exploring features that are causally related to hypertension via causal inference methods. Additionally, we paid special attention to and verified the reliability and effectiveness of causality compared to correlation. Methods: We used a large public dataset from the Aurora Project, which was conducted by Microsoft Research. The dataset included diverse individuals who were balanced in terms of gender, age, and the condition of hypertension, with their ECG and PPG signals simultaneously acquired with wrist-worn wearable devices. We first extracted 205 features from the ECG and PPG signals, calculated 6 statistical metrics for these 205 features, and selected some valuable features out of the 205 features under each statistical metric. Then, 6 causal graphs of the selected features for each kind of statistical metric and hypertension were constructed with the greedy equivalence search algorithm. We further fused the 6 causal graphs into 1 causal graph and identified features that were causally related to hypertension from the causal graph. Finally, we used these features to detect hypertension via machine learning algorithms. Results: We validated the proposed method on 405 subjects. We identified 24 causal features that were associated with hypertension. The causal features could detect hypertension with an accuracy of 89%, precision of 92%, and recall of 82%, which outperformed detection with correlation features (accuracy of 85%, precision of 88%, and recall of 77%). Conclusions: The results indicated that the causal inference–based approach can potentially clarify the mechanism of hypertension detection with noninvasive signals and effectively detect hypertension. It also revealed that causality can be more reliable and effective than correlation for hypertension detection and other application scenarios. 
%R 10.2196/60238 %U https://cardio.jmir.org/2025/1/e60238 %U https://doi.org/10.2196/60238 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e58177 %T A Novel Artificial Intelligence–Enhanced Digital Network for Prehospital Emergency Support: Community Intervention Study %A Kim,Ji Hoon %A Kim,Min Joung %A Kim,Hyeon Chang %A Kim,Ha Yan %A Sung,Ji Min %A Chang,Hyuk-Jae %+ Department of Cardiology, Yonsei University College of Medicine, 50 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea, 82 2 2228 2460, hjchang@yuhs.ac %K emergency patient transport %K transport time %K artificial intelligence %K smartphone %K mobile phone %D 2025 %7 23.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Efficient emergency patient transport systems, which are crucial for delivering timely medical care to individuals in critical situations, face certain challenges. To address this, CONNECT-AI (CONnected Network for EMS Comprehensive Technical-Support using Artificial Intelligence), a novel digital platform, was introduced. This artificial intelligence (AI)–based network provides comprehensive technical support for the real-time sharing of medical information at the prehospital stage. Objective: This study aimed to evaluate the effectiveness of this system in reducing patient transport delays. Methods: The CONNECT-AI system provided 3 key AI services to prehospital care providers by collecting real-time patient data from the scene and hospital resource information, such as bed occupancy and the availability of emergency surgeries or procedures, using 5G communication technology and internet of things devices. These services included guidance on first aid, prediction of critically ill patients, and recommendation of the optimal transfer hospital. In addition, the platform offered emergency department medical staff real-time clinical information, including live video of patients during transport to the hospital. 
This community-based, nonrandomized controlled intervention study was designed to evaluate the effectiveness of the CONNECT-AI system in 2 regions of South Korea, each of which operated an intervention period and a control period, each lasting 16 weeks. The impact of the system was assessed based on the proportion of patients experiencing transfer delays. Results: A total of 14,853 patients transported by public ambulance were finally selected for analysis. Overall, the median transport time was 10 (IQR 7-14) minutes in the intervention group and 9 (IQR 6-13) minutes in the control group. When comparing the incidence of transport time outliers (>75th percentile), which was the primary outcome of this study, the rate was higher in the intervention group in region 1 but significantly reduced in region 2, with the overall outlier rate being higher in the intervention group (27.5% vs 29.7%, P=.04). However, for patients with fever or respiratory symptoms, the group using the system showed a statistically significant reduction in outlier cases (from 36.5% to 30.1%, P=.01). For patients who received real-time acceptance signals from the hospital, the reduction in the percentage of 75th percentile outliers was statistically significant compared with those without the system (from 27.5% to 19.6%, P=.02). As a result of emergency department treatment, 1.5% of patients in the control group and 1.1% in the intervention group died (P=.14). In the system-guided optimal hospital transfer group, the mortality rate was significantly lower than in the control group (1.54% vs 0.64%, P=.01). Conclusions: The present digital emergency medical system platform offers a novel approach to enhancing emergency patient transport by leveraging AI, real-time information sharing, and decision support. While the system demonstrated improvements for certain patient groups facing transfer challenges, further research and modifications are necessary to fully realize its benefits in diverse health care contexts. 
Trial Registration: ClinicalTrials.gov NCT04829279; https://clinicaltrials.gov/study/NCT04829279 %M 39847421 %R 10.2196/58177 %U https://www.jmir.org/2025/1/e58177 %U https://doi.org/10.2196/58177 %U http://www.ncbi.nlm.nih.gov/pubmed/39847421 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 8 %N %P e67197 %T Impact of Attached File Formats on the Performance of ChatGPT-4 on the Japanese National Nursing Examination: Evaluation Study %A Taira,Kazuya %A Itaya,Takahiro %A Yada,Shuntaro %A Hiyama,Kirara %A Hanada,Ayame %K nursing examination %K machine learning %K ML %K artificial intelligence %K AI %K large language models %K ChatGPT %K generative AI %D 2025 %7 22.1.2025 %9 %J JMIR Nursing %G English %X Abstract: This research letter discusses the impact of different file formats on ChatGPT-4’s performance on the Japanese National Nursing Examination, highlighting the need for standardized reporting protocols to enhance the integration of artificial intelligence in nursing education and practice. %R 10.2196/67197 %U https://nursing.jmir.org/2025/1/e67197 %U https://doi.org/10.2196/67197 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e68198 %T AI Can Be a Powerful Social Innovation for Public Health if Community Engagement Is at the Core %A Bazzano,Alessandra N %A Mantsios,Andrea %A Mattei,Nicholas %A Kosorok,Michael R %A Culotta,Aron %+ Department of Maternal and Child Health, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, 135 Dauer Drive, CB #7445, Chapel Hill, NC, 27599-7445, United States, 1 919 966 9306, abazzano@tulane.edu %K Artificial Intelligence %K Generative Artificial Intelligence %K Citizen Science %K Community Participation %K Innovation Diffusion %D 2025 %7 22.1.2025 %9 Viewpoint %J J Med Internet Res %G English %X There is a critical need for community engagement in the process of adopting artificial intelligence (AI) technologies in public health. 
Public health practitioners and researchers have historically innovated in areas like vaccination and sanitation but have been slower in adopting emerging technologies such as generative AI. However, with increasingly complex funding, programming, and research requirements, the field now faces a pivotal moment to enhance its agility and responsiveness to evolving health challenges. Participatory methods and community engagement are key components of many current public health programs and research. The field of public health is well positioned to ensure community engagement is part of AI technologies applied to population health issues. Without such engagement, the adoption of these technologies in public health may exclude significant portions of the population, particularly those with the fewest resources, with the potential to exacerbate health inequities. Risks to privacy and perpetuation of bias are more likely to be avoided if AI technologies in public health are designed with knowledge of community engagement, existing health disparities, and strategies for improving equity. This viewpoint proposes a multifaceted approach to ensure safer and more effective integration of AI in public health with the following call to action: (1) include the basics of AI technology in public health training and professional development; (2) use a community engagement approach to co-design AI technologies in public health; and (3) introduce governance and best practice mechanisms that can guide the use of AI in public health to prevent or mitigate potential harms. These actions will support the application of AI to varied public health domains through a framework for more transparent, responsive, and equitable use of this evolving technology, augmenting the work of public health practitioners and researchers to improve health outcomes while minimizing risks and unintended consequences. 
%M 39841529 %R 10.2196/68198 %U https://www.jmir.org/2025/1/e68198 %U https://doi.org/10.2196/68198 %U http://www.ncbi.nlm.nih.gov/pubmed/39841529 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66612 %T Ten Machine Learning Models for Predicting Preoperative and Postoperative Coagulopathy in Patients With Trauma: Multicenter Cohort Study %A Xiong,Xiaojuan %A Fu,Hong %A Xu,Bo %A Wei,Wang %A Zhou,Mi %A Hu,Peng %A Ren,Yunqin %A Mao,Qingxiang %+ Department of Anesthesiology, Daping Hospital, Army Medical University, Yuzhong District, 10 Changjiang Zhilu, Chongqing, China, 86 68729729, qxmao@tmmu.edu.cn %K traumatic coagulopathy %K preoperative %K postoperative %K machine learning models %K random forest %K Medical Information Mart for Intensive Care %D 2025 %7 22.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Recent research has revealed the potential value of machine learning (ML) models in improving prognostic prediction for patients with trauma. ML can enhance predictions and identify which factors contribute the most to posttraumatic mortality. However, no studies have explored the risk factors, complications, and risk prediction of preoperative and postoperative traumatic coagulopathy (PPTIC) in patients with trauma. Objective: This study aims to help clinicians implement timely and appropriate interventions to reduce the incidence of PPTIC and related complications, thereby lowering in-hospital mortality and disability rates for patients with trauma. Methods: We analyzed data from 13,235 patients with trauma from 4 medical centers, including medical histories, laboratory results, and hospitalization complications. We developed 10 ML models in Python (Python Software Foundation) to predict PPTIC based on preoperative indicators. Data from 10,023 Medical Information Mart for Intensive Care patients were divided into training (70%) and test (30%) sets, with 3212 patients from 3 other centers used for external validation. 
Model performance was assessed with 5-fold cross-validation, bootstrapping, Brier score, and Shapley additive explanation values. Results: Univariate logistic regression identified PPTIC risk factors as (1) prolonged activated partial thromboplastin time, prothrombin time, and international normalized ratio; (2) decreased levels of hemoglobin, hematocrit, red blood cells, calcium, and sodium; (3) lower admission diastolic blood pressure; (4) elevated alanine aminotransferase and aspartate aminotransferase levels; (5) admission heart rate; and (6) emergency surgery and perioperative transfusion. Multivariate logistic regression revealed that patients with PPTIC faced significantly higher risks of sepsis (1.75-fold), heart failure (1.5-fold), delirium (3.08-fold), abnormal coagulation (3.57-fold), tracheostomy (2.76-fold), mortality (2.19-fold), and urinary tract infection (1.95-fold), along with longer hospital and intensive care unit stays. Random forest was the most effective ML model for predicting PPTIC, achieving an area under the receiver operating characteristic curve of 0.91, an area under the precision-recall curve of 0.89, accuracy of 0.84, sensitivity of 0.80, specificity of 0.88, precision of 0.88, F1-score of 0.84, and Brier score of 0.13 in external validation. Conclusions: Key PPTIC risk factors include (1) prolonged activated partial thromboplastin time, prothrombin time, and international normalized ratio; (2) low levels of hemoglobin, hematocrit, red blood cells, calcium, and sodium; (3) low diastolic blood pressure; (4) elevated alanine aminotransferase and aspartate aminotransferase levels; (5) admission heart rate; and (6) the need for emergency surgery and transfusion. PPTIC is associated with severe complications and extended hospital stays. Among the ML models, the random forest model was the most effective predictor. 
Trial Registration: Chinese Clinical Trial Registry ChiCTR2300078097; https://www.chictr.org.cn/showproj.html?proj=211051 %M 39841523 %R 10.2196/66612 %U https://www.jmir.org/2025/1/e66612 %U https://doi.org/10.2196/66612 %U http://www.ncbi.nlm.nih.gov/pubmed/39841523 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67143 %T What’s Going On With Me and How Can I Better Manage My Health? The Potential of GPT-4 to Transform Discharge Letters Into Patient-Centered Letters to Enhance Patient Safety: Prospective, Exploratory Study %A Eisinger,Felix %A Holderried,Friederike %A Mahling,Moritz %A Stegemann–Philipps,Christian %A Herrmann–Werner,Anne %A Nazarenus,Eric %A Sonanini,Alessandra %A Guthoff,Martina %A Eickhoff,Carsten %A Holderried,Martin %+ Tübingen Institute for Medical Education, University of Tübingen, Elfriede-Aulhorn-Str. 10, Tübingen, 72076, Germany, 49 1704848650, Friederike.Holderried@med.uni-tuebingen.de %K GPT-4 %K patient letters %K health care communication %K artificial intelligence %K patient safety %K patient education %D 2025 %7 21.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: For hospitalized patients, the discharge letter serves as a crucial source of medical information, outlining important discharge instructions and health management tasks. However, these letters are often written in professional jargon, making them difficult for patients with limited medical knowledge to understand. Large language models, such as GPT, have the potential to transform these discharge summaries into patient-friendly letters, improving accessibility and understanding. Objective: This study aims to use GPT-4 to convert discharge letters into more readable patient-centered letters. We evaluated how effectively and comprehensively GPT-4 identified and transferred patient safety–relevant information from the discharge letters to the transformed patient letters. 
Methods: Three discharge letters were created based on common medical conditions, containing 72 patient safety–relevant pieces of information, referred to as “learning objectives.” GPT-4 was prompted to transform these discharge letters into patient-centered letters. The resulting patient letters were analyzed for medical accuracy, patient centricity, and the ability to identify and translate the learning objectives. Bloom’s taxonomy was applied to analyze and categorize the learning objectives. Results: GPT-4 addressed the majority (56/72, 78%) of the learning objectives from the discharge letters. However, 11 of the 72 (15%) learning objectives were not included in the majority of the patient-centered letters. A qualitative analysis based on Bloom’s taxonomy revealed that learning objectives in the “Understand” category (9/11) were more frequently omitted than those in the “Remember” category (2/11). Most of the missing learning objectives were related to the content field of “prevention of complications.” By contrast, learning objectives regarding “lifestyle” and “organizational” aspects were addressed more frequently. Medical errors were found in a small proportion of sentences (31/787, 3.9%). In terms of patient centricity, the patient-centered letters demonstrated better readability than the discharge letters. Compared with discharge letters, they included fewer medical terms (132/860, 15.3%, vs 165/273, 60.4%), fewer abbreviations (43/860, 5%, vs 49/273, 17.9%), and more explanations of medical terms (121/131, 92.4%, vs 0/165, 0%). Conclusions: Our study demonstrates that GPT-4 has the potential to transform discharge letters into more patient-centered communication. While the readability and patient centricity of the transformed letters are well-established, they do not fully address all patient safety–relevant information, resulting in the omission of key aspects. 
Further optimization of prompt engineering may help address this issue and improve the completeness of the transformation. %M 39836954 %R 10.2196/67143 %U https://www.jmir.org/2025/1/e67143 %U https://doi.org/10.2196/67143 %U http://www.ncbi.nlm.nih.gov/pubmed/39836954 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e69742 %T Advantages and Inconveniences of a Multi-Agent Large Language Model System to Mitigate Cognitive Biases in Diagnostic Challenges %A Bousquet,Cedric %A Beltramin,Divà %+ Laboratory of Medical Informatics and Knowledge Engineering in e-Health, Inserm, Sorbonne University, 15 rue de l'école de Médecine, Paris, F-75006, France, 33 0477127974, cedric.bousquet@chu-st-etienne.fr %K large language model %K multi-agent system %K diagnostic errors %K cognition %K clinical decision-making %K cognitive bias %K generative artificial intelligence %D 2025 %7 20.1.2025 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 39832364 %R 10.2196/69742 %U https://www.jmir.org/2025/1/e69742 %U https://doi.org/10.2196/69742 %U http://www.ncbi.nlm.nih.gov/pubmed/39832364 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e54121 %T Machine Learning for the Early Prediction of Delayed Cerebral Ischemia in Patients With Subarachnoid Hemorrhage: Systematic Review and Meta-Analysis %A Zhang,Haofuzi %A Zou,Peng %A Luo,Peng %A Jiang,Xiaofan %+ Department of Neurosurgery, Xijing Hospital, Fourth Military Medical University, No. 127, Changle West Road, Xincheng District, Xi'an, , China, 86 186 0298 0377, jiangxf@fmmu.edu.cn %K machine learning %K subarachnoid hemorrhage %K delayed cerebral ischemia %K systematic review %D 2025 %7 20.1.2025 %9 Review %J J Med Internet Res %G English %X Background: Delayed cerebral ischemia (DCI) is a primary contributor to death after subarachnoid hemorrhage (SAH), with significant incidence. Therefore, early determination of the risk of DCI is an urgent need. 
Machine learning (ML) has received much attention in clinical practice. Recently, some studies have attempted to apply ML models for early noninvasive prediction of DCI. However, systematic evidence for its predictive accuracy is still lacking. Objective: The aim of this study was to synthesize the prediction accuracy of ML models for DCI to provide evidence for the development or updating of intelligent detection tools. Methods: PubMed, Cochrane, Embase, and Web of Science databases were systematically searched up to May 18, 2023. The risk of bias in the included studies was assessed using PROBAST (Prediction Model Risk of Bias Assessment Tool). During the analysis, we discussed the performance of different models in the training and validation sets. Results: We finally included 48 studies containing 16,294 patients with SAH and 71 ML models with logistic regression as the main model type. In the training set, the pooled concordance index (C index), sensitivity, and specificity of all the models were 0.786 (95% CI 0.737-0.835), 0.77 (95% CI 0.69-0.84), and 0.83 (95% CI 0.75-0.89), respectively, while those of the logistic regression models were 0.770 (95% CI 0.724-0.817), 0.75 (95% CI 0.67-0.82), and 0.71 (95% CI 0.63-0.78), respectively. In the validation set, the pooled C index, sensitivity, and specificity of all the models were 0.767 (95% CI 0.741-0.793), 0.66 (95% CI 0.53-0.77), and 0.78 (95% CI 0.71-0.84), respectively, while those of the logistic regression models were 0.757 (95% CI 0.715-0.800), 0.59 (95% CI 0.57-0.80), and 0.80 (95% CI 0.71-0.87), respectively. Conclusions: ML models appear to have relatively desirable power for early noninvasive prediction of DCI after SAH. However, enhancing the prediction sensitivity of these models is challenging. Therefore, efficient, noninvasive, or minimally invasive low-cost predictors should be further explored in future studies to improve the prediction accuracy of ML models. 
Trial Registration: PROSPERO (CRD42023438399); https://tinyurl.com/yfuuudde %M 39832368 %R 10.2196/54121 %U https://www.jmir.org/2025/1/e54121 %U https://doi.org/10.2196/54121 %U http://www.ncbi.nlm.nih.gov/pubmed/39832368 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e69007 %T Era of Generalist Conversational Artificial Intelligence to Support Public Health Communications %A Sezgin,Emre %A Kocaballi,Ahmet Baki %+ The Abigail Wexner Research Institute at Nationwide Children’s Hospital, 700 Children's Dr, Columbus, OH, 43205, United States, 1 6147223179, emre.sezgin@nationwidechildrens.org %K messaging apps %K public health communication %K language models %K artificial intelligence %K AI %K generative AI %K conversational AI %D 2025 %7 20.1.2025 %9 Viewpoint %J J Med Internet Res %G English %X The integration of artificial intelligence (AI) into health communication systems has introduced a transformative approach to public health management, particularly during public health emergencies, capable of reaching billions through familiar digital channels. This paper explores the utility and implications of generalist conversational artificial intelligence (CAI) advanced AI systems trained on extensive datasets to handle a wide range of conversational tasks across various domains with human-like responsiveness. The specific focus is on the application of generalist CAI within messaging services, emphasizing its potential to enhance public health communication. We highlight the evolution and current applications of AI-driven messaging services, including their ability to provide personalized, scalable, and accessible health interventions. Specifically, we discuss the integration of large language models and generative AI in mainstream messaging platforms, which potentially outperform traditional information retrieval systems in public health contexts. 
We report a critical examination of the advantages of generalist CAI in delivering health information, with a case of its operationalization during the COVID-19 pandemic, and propose the strategic deployment of these technologies in collaboration with public health agencies. In addition, we address significant challenges and ethical considerations, such as AI biases, misinformation, privacy concerns, and the required regulatory oversight. We envision a future that leverages generalist CAI in messaging apps, proposing a multiagent approach to enhance the reliability and specificity of health communications. We hope this commentary initiates the necessary conversations and research toward building evaluation approaches, adaptive strategies, and robust legal and technical frameworks to fully realize the benefits of AI-enhanced communications in public health, aiming to ensure equitable and effective health outcomes across diverse populations. %M 39832358 %R 10.2196/69007 %U https://www.jmir.org/2025/1/e69007 %U https://doi.org/10.2196/69007 %U http://www.ncbi.nlm.nih.gov/pubmed/39832358 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e54990 %T Health Care Professionals and Data Scientists’ Perspectives on a Machine Learning System to Anticipate and Manage the Risk of Decompensation From Patients With Heart Failure: Qualitative Interview Study %A Seringa,Joana %A Hirata,Anna %A Pedro,Ana Rita %A Santana,Rui %A Magalhães,Teresa %+ NOVA National School of Public Health, Public Health Research Centre, Comprehensive Health Research Center, NOVA University Lisbon, Avenida Padre Cruz, Lisbon, 1600-407, Portugal, 351 910417628, jm.seringa@ensp.unl.pt %K heart failure %K machine learning system %K decompensation %K qualitative research %K cardiovascular diseases %K heart failure management %K interview %D 2025 %7 20.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Heart failure (HF) is a significant global health problem, affecting 
approximately 64.34 million people worldwide. The worsening of HF, also known as HF decompensation, is a major factor behind hospitalizations, contributing to substantial health care costs related to this condition. Objective: This study aimed to explore the perspectives of health care professionals and data scientists regarding the relevance, challenges, and potential benefits of using machine learning (ML) models to predict decompensation from patients with HF. Methods: A total of 13 individual, semistructured, qualitative interviews were conducted in Portugal between October 31, 2022, and June 23, 2023. Participants represented different health care specialties and were selected from different contexts and regions of the country to ensure a comprehensive understanding of the topic. Data saturation was determined as the point at which no new themes emerged from participants’ perspectives, ensuring a sufficient sample size for analysis. The interviews were audio recorded, transcribed, and analyzed using MAXQDA (VERBI Software GmbH) through a reflexive thematic analysis. Two researchers (JS and AH) coded the interviews to ensure the consistency of the codes. Ethical approval was granted by the NOVA National School of Public Health ethics committee (CEENSP 14/2022), and informed consent was obtained from all participants. Results: The participants recognized the potential benefits of ML models for early detection, risk stratification, and personalized care of patients with HF. The importance of selecting appropriate variables for model development, such as rapid weight gain and symptoms, was emphasized. The use of wearables for recording vital signs was considered necessary, although challenges related to adoption among older patients were identified. Risk stratification emerged as a crucial aspect, with the model needing to identify patients at high-, medium-, and low-risk levels. 
Participants emphasized the need for a response model involving health care professionals to validate ML-generated alerts and determine appropriate interventions. Conclusions: The study’s findings highlight ML models’ potential benefits and challenges for predicting HF decompensation. The relevance of ML models for improving patient outcomes, reducing health care costs, and promoting patient engagement in disease management is highlighted. Adequate variable selection, risk stratification, and response models were identified as essential components for the effective implementation of ML models in health care. In addition, the study identified technical, regulatory and ethical, and adoption and acceptance challenges that need to be overcome for the successful integration of ML models into clinical workflows. Interpretation of the findings suggests that future research should focus on more extensive and diverse samples, incorporate the patient perspective, and explore the impact of ML models on patient outcomes and personalized care in HF management. Incorporation of this study’s findings into practice is expected to contribute to developing and implementing ML-based predictive models that positively impact HF management. 
%M 39832170 %R 10.2196/54990 %U https://www.jmir.org/2025/1/e54990 %U https://doi.org/10.2196/54990 %U http://www.ncbi.nlm.nih.gov/pubmed/39832170 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e65434 %T Explainable Predictive Model for Suicidal Ideation During COVID-19: Social Media Discourse Study %A Bouktif,Salah %A Khanday,Akib Mohi Ud Din %A Ouni,Ali %+ Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Sheikh Khalifa Bin Zayed, Asharij, Al Ain, Abu Dhabi, 1551, United Arab Emirates, 971 507605406, salahb@uaeu.ac.ae %K COVID-19 %K suicide %K social networking sites %K deep learning %K explainable artificial intelligence %K suicidal ideation %K artificial intelligence %K AI %K social media %K predictive model %K mental health %K pandemic %K natural language processing %K NLP %K suicidal thought %K deep neural network approach %D 2025 %7 17.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Studying the impact of COVID-19 on mental health is both compelling and imperative for the health care system’s preparedness development. Discovering how pandemic conditions and governmental strategies and measures have impacted mental health is a challenging task. Mental health issues, such as depression and suicidal tendency, are traditionally explored through psychological battery tests and clinical procedures. To address the stigma associated with mental illness, social media is used to examine language patterns in posts related to suicide. This strategy enhances the comprehension and interpretation of suicidal ideation. Despite easy expression via social media, suicidal thoughts remain sensitive and complex to comprehend and detect. Suicidal ideation captures the new suicidal statements used during the COVID-19 pandemic, which represent a different context of expression. 
Objective: In this study, our aim was to detect suicidal ideation by mining textual content extracted from social media by leveraging state-of-the-art natural language processing (NLP) techniques. Methods: The work was divided into 2 major phases, one to classify suicidal ideation posts and the other to extract factors that cause suicidal ideation. We proposed a hybrid deep learning–based neural network approach (Bidirectional Encoder Representations from Transformers [BERT]+convolutional neural network [CNN]+long short-term memory [LSTM]) to classify suicidal and nonsuicidal posts. Two state-of-the-art deep learning approaches (CNN and LSTM) were combined based on features (terms) selected from term frequency–inverse document frequency (TF-IDF), Word2vec, and BERT. Explainable artificial intelligence (XAI) was used to extract key factors that contribute to suicidal ideation in order to provide a reliable and sustainable solution. Results: Of 348,110 records, 3154 (0.9%) were selected, resulting in 1338 (42.4%) suicidal and 1816 (57.6%) nonsuicidal instances. The CNN+LSTM+BERT model achieved superior performance, with a precision of 94%, a recall of 95%, an F1-score of 94%, and an accuracy of 93.65%. Conclusions: Considering the dynamic nature of suicidal behavior posts, we proposed a fused architecture that captures both localized and generalized contextual information that is important for understanding language patterns and predicting the evolution of suicidal ideation over time. According to Local Interpretable Model-Agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP) XAI algorithms, there was a drift in the features before and during COVID-19. Due to the COVID-19 pandemic, new features have emerged, which lead to suicidal tendencies. In the future, strategies need to be developed to combat this deadly disease. 
%M 39823631 %R 10.2196/65434 %U https://www.jmir.org/2025/1/e65434 %U https://doi.org/10.2196/65434 %U http://www.ncbi.nlm.nih.gov/pubmed/39823631 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 11 %N %P e56850 %T Performance of ChatGPT-3.5 and ChatGPT-4 in the Taiwan National Pharmacist Licensing Examination: Comparative Evaluation Study %A Wang,Ying-Mei %A Shen,Hung-Wei %A Chen,Tzeng-Ji %A Chiang,Shu-Chiung %A Lin,Ting-Guan %K artificial intelligence %K ChatGPT %K chat generative pre-trained transformer %K GPT-4 %K medical education %K educational measurement %K pharmacy licensure %K Taiwan %K Taiwan national pharmacist licensing examination %K learning model %K AI %K Chatbot %K pharmacist %K evaluation and comparison study %K pharmacy %K statistical analyses %K medical databases %K medical decision-making %K generative AI %K machine learning %D 2025 %7 17.1.2025 %9 %J JMIR Med Educ %G English %X Background: OpenAI released versions ChatGPT-3.5 and GPT-4 between 2022 and 2023. GPT-3.5 has demonstrated proficiency in various examinations, particularly the United States Medical Licensing Examination. However, GPT-4 has more advanced capabilities. Objective: This study aims to examine the efficacy of GPT-3.5 and GPT-4 within the Taiwan National Pharmacist Licensing Examination and to ascertain their utility and potential application in clinical pharmacy and education. Methods: The pharmacist examination in Taiwan consists of 2 stages: basic subjects and clinical subjects. In this study, exam questions were manually fed into the GPT-3.5 and GPT-4 models, and their responses were recorded; graphic-based questions were excluded. This study encompassed three steps: (1) determining the answering accuracy of GPT-3.5 and GPT-4, (2) categorizing question types and observing differences in model performance across these categories, and (3) comparing model performance on calculation and situational questions. 
Microsoft Excel and R software were used for statistical analyses. Results: GPT-4 achieved an accuracy rate of 72.9%, overshadowing GPT-3.5, which achieved 59.1% (P<.001). In the basic subjects category, GPT-4 significantly outperformed GPT-3.5 (73.4% vs 53.2%; P<.001). However, in clinical subjects, only minor differences in accuracy were observed. Specifically, GPT-4 outperformed GPT-3.5 in the calculation and situational questions. Conclusions: This study demonstrates that GPT-4 outperforms GPT-3.5 in the Taiwan National Pharmacist Licensing Examination, particularly in basic subjects. While GPT-4 shows potential for use in clinical practice and pharmacy education, its limitations warrant caution. Future research should focus on refining prompts, improving model stability, integrating medical databases, and designing questions that better assess student competence and minimize guessing. %R 10.2196/56850 %U https://mededu.jmir.org/2025/1/e56850 %U https://doi.org/10.2196/56850 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 11 %N %P e64284 %T Performance Evaluation and Implications of Large Language Models in Radiology Board Exams: Prospective Comparative Analysis %A Wei,Boxiong %K large language models %K LLM %K artificial intelligence %K AI %K GPT-4 %K radiology exams %K medical education %K diagnostics %K medical training %K radiology %K ultrasound %D 2025 %7 16.1.2025 %9 %J JMIR Med Educ %G English %X Background: Artificial intelligence advancements have enabled large language models to significantly impact radiology education and diagnostic accuracy. Objective: This study evaluates the performance of mainstream large language models, including GPT-4, Claude, Bard, Tongyi Qianwen, and Gemini Pro, in radiology board exams. Methods: A comparative analysis of 150 multiple-choice questions from radiology board exams without images was conducted. 
Model accuracy was assessed on text-based questions, which were categorized by cognitive level and medical specialty; performance was compared using χ2 tests and ANOVA. Results: GPT-4 achieved the highest accuracy (83.3%, 125/150), significantly outperforming all other models. Specifically, Claude achieved an accuracy of 62% (93/150; P<.001), Bard 54.7% (82/150; P<.001), Tongyi Qianwen 70.7% (106/150; P=.009), and Gemini Pro 55.3% (83/150; P<.001). The odds ratios compared to GPT-4 were 0.33 (95% CI 0.18‐0.60) for Claude, 0.24 (95% CI 0.13‐0.44) for Bard, and 0.25 (95% CI 0.14‐0.45) for Gemini Pro. Tongyi Qianwen performed relatively well with an accuracy of 70.7% (106/150; P=.02) and had an odds ratio of 0.48 (95% CI 0.27‐0.87) compared to GPT-4. Performance varied across question types and specialties, with GPT-4 excelling in both lower-order and higher-order questions, while Claude and Bard struggled with complex diagnostic questions. Conclusions: GPT-4 and Tongyi Qianwen show promise in medical education and training. The study emphasizes the need for domain-specific training datasets to enhance large language models’ effectiveness in specialized fields like radiology. %R 10.2196/64284 %U https://mededu.jmir.org/2025/1/e64284 %U https://doi.org/10.2196/64284 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e57298 %T Development and Validation of a Machine Learning Method Using Vocal Biomarkers for Identifying Frailty in Community-Dwelling Older Adults: Cross-Sectional Study %A Kim,Taehwan %A Choi,Jung-Yeon %A Ko,Myung Jin %A Kim,Kwang-il %K frailty %K cross-sectional study %K vocal biomarkers %K older adults %K artificial intelligence %K machine learning %K classification model %K self-supervised %D 2025 %7 16.1.2025 %9 %J JMIR Med Inform %G English %X Background: The two most commonly used methods to identify frailty are the frailty phenotype and the frailty index. However, both methods have limitations in clinical application.
In addition, methods for measuring frailty have not yet been standardized. Objective: We aimed to develop and validate a classification model for predicting frailty status using vocal biomarkers in community-dwelling older adults, based on voice recordings obtained from the picture description task (PDT). Methods: We recruited 127 participants aged 50 years and older and collected clinical information through a short form of the Comprehensive Geriatric Assessment scale. Voice recordings were collected with a tablet device during the Korean version of the PDT, and we preprocessed audio data to remove background noise before feature extraction. Three artificial intelligence (AI) models were developed for identifying frailty status: SpeechAI (using speech data only), DemoAI (using demographic data only), and DemoSpeechAI (combining both data types). Results: Our models were trained and evaluated on the basis of 5-fold cross-validation for 127 participants and compared. The SpeechAI model, using deep learning–based acoustic features, performed best, achieving an accuracy of 80.4% (95% CI 76.89%‐83.91%) and an area under the receiver operating characteristic curve (AUC) of 0.89 (95% CI 0.86‐0.92), while the model using only demographics showed an accuracy of 67.96% (95% CI 67.63%‐68.29%) and an AUC of 0.74 (95% CI 0.73‐0.75). The SpeechAI model significantly outperformed the demographics-only model in AUC (t4=8.705 [2-sided]; P<.001). The DemoSpeechAI model, which combined demographics with deep learning–based acoustic features, showed superior performance (accuracy 85.6%, 95% CI 80.03%‐91.17% and AUC 0.93, 95% CI 0.89‐0.97), but there was no significant difference in AUC between the SpeechAI and DemoSpeechAI models (t4=1.057 [2-sided]; P=.35).
The SpeechAI model (AUC 0.89) also demonstrated superior performance over models using traditional acoustic features from the openSMILE toolkit (logistic regression: AUC 0.62; decision tree: AUC 0.57; random forest: AUC 0.66). Conclusions: Our findings demonstrate that vocal biomarkers derived from deep learning–based acoustic features can be effectively used to predict frailty status in community-dwelling older adults. The SpeechAI model showed promising accuracy and AUC, outperforming models based solely on demographic data or traditional acoustic features. Furthermore, while the combined DemoSpeechAI model showed slightly improved performance over the SpeechAI model, the difference was not statistically significant. These results suggest that speech-based AI models offer a noninvasive, scalable method for frailty detection, potentially streamlining assessments in clinical and community settings. %R 10.2196/57298 %U https://medinform.jmir.org/2025/1/e57298 %U https://doi.org/10.2196/57298 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 12 %N %P e67192 %T Natural Language Processing and Social Determinants of Health in Mental Health Research: AI-Assisted Scoping Review %A Scherbakov,Dmitry A %A Hubig,Nina C %A Lenert,Leslie A %A Alekseyenko,Alexander V %A Obeid,Jihad S %K natural language processing %K datasets %K mental health %K automated review %K depression %K suicide %K mental health research %K NLP %K artificial intelligence %K AI %K scoping review %K determinant %K large language model %K LLM %K quantitative %K automation %D 2025 %7 16.1.2025 %9 %J JMIR Ment Health %G English %X Background: The use of natural language processing (NLP) in mental health research is increasing, with a wide range of applications and datasets being investigated.
Objective: This review aims to summarize the use of NLP in mental health research, with a special focus on the types of text datasets and the use of social determinants of health (SDOH) in NLP projects related to mental health. Methods: The search was conducted in September 2024 using a broad search strategy in PubMed, Scopus, and CINAHL Complete. All citations were uploaded to Covidence (Veritas Health Innovation) software. The screening and extraction process took place in Covidence with the help of a custom large language model (LLM) module developed by our team. This LLM module was calibrated and tuned to automate many aspects of the review process. Results: The screening process, assisted by the custom LLM, led to the inclusion of 1768 studies in the final review. Most of the reviewed studies (n=665, 42.8%) used clinical data as their primary text dataset, followed by social media datasets (n=523, 33.7%). The United States contributed the highest number of studies (n=568, 36.6%), with depression (n=438, 28.2%) and suicide (n=240, 15.5%) being the most frequently investigated mental health issues. Traditional demographic variables, such as age (n=877, 56.5%) and gender (n=760, 49%), were commonly extracted, while SDOH factors were less frequently reported, with urban or rural status being the most used (n=19, 1.2%). Over half of the citations (n=826, 53.2%) did not provide clear information on dataset accessibility, although a sizable number of studies (n=304, 19.6%) made their datasets publicly available. Conclusions: This scoping review underscores the significant role of clinical notes and social media in NLP-based mental health research. Despite the clear relevance of SDOH to mental health, their underutilization presents a gap in current research. This review can be a starting point for researchers looking for an overview of mental health projects using text data. Shared datasets could be used to place more emphasis on SDOH in future studies. 
%R 10.2196/67192 %U https://mental.jmir.org/2025/1/e67192 %U https://doi.org/10.2196/67192 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 14 %N %P e63875 %T In Silico Evaluation of Algorithm-Based Clinical Decision Support Systems: Protocol for a Scoping Review %A Dorosan,Michael %A Chen,Ya-Lin %A Zhuang,Qingyuan %A Lam,Shao Wei Sean %+ Health Services Research Centre, Singapore Health Services Pte Ltd, Health Services Research Institute (HSRI) Academia, Ngee Ann Kongsi Discovery Tower Level 6, 20 College Road, Singapore, 169856, Singapore, 65 65767140, gmslasws@nus.edu.sg %K clinical decision support algorithms %K in silico evaluation %K clinical workflow simulation %K health care modeling %K digital twin %K quadruple aims %K clinical decision %K decision-making %K decision support %K workflow %K support system %K protocol %K scoping review %K algorithm-based %K screening %K thematic analysis %K descriptive analysis %K clinical decision-making %D 2025 %7 16.1.2025 %9 Protocol %J JMIR Res Protoc %G English %X Background: Integrating algorithm-based clinical decision support (CDS) systems poses significant challenges in evaluating their actual clinical value. Such CDS systems are traditionally assessed via controlled but resource-intensive clinical trials. Objective: This paper presents a review protocol for preimplementation in silico evaluation methods to enable broadened impact analysis under simulated environments before clinical trials. Methods: We propose a scoping review protocol that follows an enhanced Arksey and O’Malley framework and PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines to investigate the scope and research gaps in the in silico evaluation of algorithm-based CDS models—specifically CDS decision-making end points and objectives, evaluation metrics used, and simulation paradigms used to assess potential impacts. 
The databases searched are PubMed, Embase, CINAHL, PsycINFO, Cochrane, IEEE Xplore, Web of Science, and arXiv. A 2-stage screening process identified pertinent articles. The information extracted from articles was iteratively refined. The review will use thematic, trend, and descriptive analyses to meet scoping aims. Results: We conducted an automated search of the databases above in May 2023, with most title and abstract screenings completed by November 2023 and full-text screening extended from December 2023 to May 2024. Concurrent charting and full-text analysis were carried out, with the final analysis and manuscript preparation set for completion in July 2024. Publication of the review results is targeted from July 2024 to February 2025. As of April 2024, a total of 21 articles have been selected following a 2-stage screening process; these will proceed to data extraction and analysis. Conclusions: We refined our data extraction strategy through a collaborative, multidisciplinary approach, planning to analyze results using thematic analyses to identify approaches to in silico evaluation. Anticipated findings aim to contribute to developing a unified in silico evaluation framework adaptable to various clinical workflows, detailing clinical decision-making characteristics, impact measures, and reusability of methods. The study’s findings will be published and presented in forums combining artificial intelligence and machine learning, clinical decision-making, and health technology impact analysis. Ultimately, we aim to bridge the development-deployment gap through in silico evaluation-based potential impact assessments.
International Registered Report Identifier (IRRID): DERR1-10.2196/63875 %M 39819973 %R 10.2196/63875 %U https://www.researchprotocols.org/2025/1/e63875 %U https://doi.org/10.2196/63875 %U http://www.ncbi.nlm.nih.gov/pubmed/39819973 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67378 %T Application of a 3D Fusion Model to Evaluate the Efficacy of Clear Aligner Therapy in Malocclusion Patients: Prospective Observational Study %A Liu,Chaofeng %A Liu,Yan %A Yi,Chunyan %A Xie,Tao %A Tian,Jingjun %A Deng,Peishen %A Liu,Changyu %A Shan,Yan %A Dong,Hangyu %A Xu,Yanhua %+ Yunnan Key Laboratory of Stomatology, Department of Orthodontics, Kunming Medical University & Affiliated Stomatological Hospital, C Building, Hecheng International, 1088 Haiyuan Middle Road, Kunming, 650500, China, 86 0871 65330099 ext 8038, xuyanhua18@163.com %K clear aligners %K CBCT %K intraoral scanning %K fusion model %K artificial intelligence %K efficacy evaluation %K orthodontic treatment %D 2025 %7 15.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Investigating the safe range of orthodontic tooth movement is essential for maintaining oral and maxillofacial stability posttreatment. Although clear aligners rely on pretreatment digital models, their effect on periodontal hard tissues remains uncertain. By integrating cone beam computed tomography–derived cervical and root data with crown data from digital intraoral scans, a 3D fusion model may enhance precision and safety. Objective: This study aims to construct a 3D fusion model based on artificial intelligence software that matches cone beam computed tomography and intraoral scanning data using the Andrews’ Six Element standard. The model will be used to assess the 3D effects of clear aligners on tooth movement, to provide a reference for the design of pretreatment target positions. 
Methods: Between May 2022 and May 2024, a total of 320 patients who completed clear aligner therapy at our institution were screened; 136 patients (aged 13-35 years, fully erupted permanent dentition and periodontal pocket depth <3 mm) met the criteria. Baseline (“simulation”) and posttreatment (“fusion”) models were compared. Outcomes included upper core discrepancy (UCD), upper incisors anteroposterior discrepancy (UAP), lower Spee curve deep discrepancy (LSD), upper anterior teeth width discrepancy (UAW), upper canine width discrepancy (UCW), upper molar width discrepancy (UMW), and total scores. Subanalyses examined sex, age stage (adolescent vs adult), and treatment method (extraction vs nonextraction). Results: The study was funded in May 2022, with data collection beginning the same month and continuing until May 2024. Of 320 initial participants, 136 met the inclusion criteria. Data analysis is ongoing, and final results are expected by late 2024. Among the 136 participants, 90 (66%) were female, 46 (34%) were male, 64 (47%) were adolescents, 72 (53%) were adults, 38 (28%) underwent extraction, and 98 (72%) did not. Total scores did not differ significantly by sex (mean difference 0.01, 95% CI –0.13 to 0.15; P=.85), age stage (mean difference 0.03, 95% CI –0.10 to 0.17; P=.60), or treatment method (mean difference 0.07, 95% CI –0.22 to 0.07; P=.32). No significant differences were found in UCD (mean difference 0.001, 95% CI –0.02 to 0.01; P=.90) or UAP (mean difference 0.01, 95% CI –0.03 to 0.00; P=.06) by treatment method. However, adolescents exhibited smaller differences in UCD, UAW, UCW, and UMW yet larger differences in UAP and LSD (df=134; P<.001). Extraction cases showed smaller LSD, UAW, and UCW but larger UMW differences compared with nonextraction (df=134; P<.001). Conclusions: The 3D fusion model provides a reliable clinical reference for target position design and treatment outcome evaluation in clear aligner systems. 
The construction and application of a 3D fusion model in clear aligner orthodontics represent a significant leap forward, offering substantial clinical benefits while establishing a new standard for precision, personalization, and evidence-based treatment planning in the field. Trial Registration: Chinese Clinical Trial Registry ChiCTR2400094304, https://www.chictr.org.cn/hvshowproject.html?id=266090&v=1.0 %M 39715692 %R 10.2196/67378 %U https://www.jmir.org/2025/1/e67378 %U https://doi.org/10.2196/67378 %U http://www.ncbi.nlm.nih.gov/pubmed/39715692 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e59111 %T From Theory to Practice: Viewpoint on Economic Indicators for Trust in Digital Health %A Gille,Felix %A Maaß,Laura %A Ho,Benjamin %A Srivastava,Divya %+ University of Zurich, Digital Society Initiative, Rämistrasse 69, Zurich, 8001, Switzerland, 41 44635 7133, felix.gille@uzh.ch %K trust %K economics %K digital health %K digital health innovation %K artificial intelligence %K AI %K economic evaluation %K public trust %K health data %K medical apps %D 2025 %7 15.1.2025 %9 Viewpoint %J J Med Internet Res %G English %X User trust is pivotal for the adoption of digital health systems interventions (DHI). In response, numerous trust-building guidelines have recently emerged targeting DHIs such as artificial intelligence. The common aim of these guidelines aimed at private sector actors and government policy makers is to build trustworthy DHI. While these guidelines provide some indication of what trustworthiness is, the guidelines typically only define trust and trustworthiness in broad terms, they rarely offer guidance about economic considerations that would allow implementers to measure and balance trade-offs between costs and benefits. These considerations are important when deciding how best to allocate scarce resources (eg, financial capital, workforce, or time). 
The missing focus on economics undermines the potential usefulness of such guidelines. We propose the development of actionable trust performance indicators (including but not limited to surveys) to gather evidence on the cost-effectiveness of trust-building principles as a crucial step for successful implementation. Furthermore, we offer guidance on navigating the conceptual complexity surrounding trust and on how to sharpen the trust discourse. Incorporating economic considerations is critical to successfully building user trust in DHI. %M 39813672 %R 10.2196/59111 %U https://www.jmir.org/2025/1/e59111 %U https://doi.org/10.2196/59111 %U http://www.ncbi.nlm.nih.gov/pubmed/39813672 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e55046 %T A Supervised Explainable Machine Learning Model for Perioperative Neurocognitive Disorder in Liver-Transplantation Patients and External Validation on the Medical Information Mart for Intensive Care IV Database: Retrospective Study %A Ding,Zhendong %A Zhang,Linan %A Zhang,Yihan %A Yang,Jing %A Luo,Yuheng %A Ge,Mian %A Yao,Weifeng %A Hei,Ziqing %A Chen,Chaojin %+ Department of Anesthesiology, The Third Affiliated Hospital of Sun Yat-sen University, No. 600 Tianhe Road, Guangzhou, 510630, China, 86 13430322182, chenchj28@mail.sysu.edu.cn %K machine learning %K risk factors %K liver transplantation %K perioperative neurocognitive disorders %K MIMIC-IV database %K external validation %D 2025 %7 15.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Patients undergoing liver transplantation (LT) are at risk of perioperative neurocognitive dysfunction (PND), which significantly affects the patients’ prognosis. Objective: This study used machine learning (ML) algorithms to extract critical predictors and develop an ML model to predict PND among LT recipients.
Methods: In this retrospective study, data from 958 patients who underwent LT between January 2015 and January 2020 were extracted from the Third Affiliated Hospital of Sun Yat-sen University. Six ML algorithms were used to predict post-LT PND, and model performance was evaluated using area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and F1-scores. The best-performing model was additionally validated using a temporal external dataset including 309 LT cases from February 2020 to August 2022, and an independent external dataset extracted from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database including 325 patients. Results: In the development cohort, 201 out of 751 (33.5%) patients were diagnosed with PND. The logistic regression model achieved the highest AUC (0.799) in the internal validation set, with comparable AUC in the temporal external (0.826) and MIMIC-IV validation sets (0.72). The top 3 features contributing to post-LT PND diagnosis were preoperative overt hepatic encephalopathy, platelet level, and postoperative sequential organ failure assessment score, as revealed by the Shapley additive explanations method. Conclusions: A real-time logistic regression model-based online predictor of post-LT PND was developed, providing a highly interoperable tool for use across medical institutions to support early risk stratification and decision making for LT recipients.
%M 39813086 %R 10.2196/55046 %U https://www.jmir.org/2025/1/e55046 %U https://doi.org/10.2196/55046 %U http://www.ncbi.nlm.nih.gov/pubmed/39813086 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e60413 %T Recruiting Young People for Digital Mental Health Research: Lessons From an AI-Driven Adaptive Trial %A Zheng,Wu Yi %A Shvetcov,Artur %A Slade,Aimy %A Jenkins,Zoe %A Hoon,Leonard %A Whitton,Alexis %A Logothetis,Rena %A Ravindra,Smrithi %A Kurniawan,Stefanus %A Gupta,Sunil %A Huckvale,Kit %A Stech,Eileen %A Agarwal,Akash %A Funke Kupper,Joost %A Cameron,Stuart %A Rosenberg,Jodie %A Manoglou,Nicholas %A Senadeera,Manisha %A Venkatesh,Svetha %A Mouzakis,Kon %A Vasa,Rajesh %A Christensen,Helen %A Newby,Jill M %+ Black Dog Institute, University of New South Wales, Hospital Road, Randwick, Sydney, 2031, Australia, 61 0422510718, wuyi.zheng@blackdog.org.au %K recruitment %K Facebook %K retention %K COVID-19 %K artificial intelligence %D 2025 %7 14.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: With increasing adoption of remote clinical trials in digital mental health, identifying cost-effective and time-efficient recruitment methodologies is crucial for the success of such trials. Evidence on whether web-based recruitment methods are more effective than traditional methods such as newspapers, media, or flyers is inconsistent. Here we present insights from our experience recruiting tertiary education students for a digital mental health artificial intelligence–driven adaptive trial—Vibe Up. Objective: We evaluated the effectiveness of recruitment via Facebook and Instagram compared to traditional methods for a treatment trial and compared different recruitment methods’ retention rates. With recruitment coinciding with COVID-19 lockdowns across Australia, we also compared the cost-effectiveness of social media recruitment during and after lockdowns.
Methods: Recruitment was completed for 2 pilot trials and 6 minitrials from June 2021 to May 2022. To recruit participants, paid social media advertising on Facebook and Instagram was used, alongside mailing lists of university networks and student organizations or services, media releases, announcements during classes and events, study posters or flyers on university campuses, and health professional networks. Recruitment data, including engagement metrics collected by Meta (Facebook and Instagram), advertising costs, and Qualtrics data on recruitment methods and survey completion rates, were analyzed using RStudio with R (version 3.6.3; R Foundation for Statistical Computing). Results: In total, 1314 eligible participants (mean age 22.79, SD 4.71 years; 1079, 82.1% female) were recruited to 2 pilot trials and 6 minitrials. The vast majority were recruited via Facebook and Instagram advertising (n=1203; 92%). Pairwise comparisons revealed that the lead institution’s website was more effective in recruiting eligible participants than Facebook (z=3.47; P=.003) and Instagram (z=4.23; P<.001). No differences were found between recruitment methods in retaining participants at baseline, at midpoint, and at study completion. Wilcoxon tests found significant differences between lockdown (pilot 1 and pilot 2) and postlockdown (minitrials 1-6) in costs incurred per link click (lockdown: median Aus $0.35 [US $0.22], IQR Aus $0.27-$0.47 [US $0.17-$0.29]; postlockdown: median Aus $1.00 [US $0.62], IQR Aus $0.70-$1.47 [US $0.44-$0.92]; W=9087; P<.001) and the amount spent per hour to reach the target sample size (lockdown: median Aus $4.75 [US $2.95], IQR Aus $1.94-$6.34 [US $1.22-$3.97]; postlockdown: median Aus $13.29 [US $8.26], IQR Aus $4.70-$25.31 [US $2.95-$15.87]; W=16044; P<.001).
Conclusions: Social media advertising via Facebook and Instagram was the most successful strategy for recruiting distressed tertiary students into this artificial intelligence–driven adaptive trial, providing evidence for the use of this recruitment method for this type of trial in digital mental health research. No recruitment method stood out in terms of participant retention. Social media recruitment was more cost-effective during the COVID-19 lockdown period, perhaps reflecting the added distress experienced by young people at that time. Trial Registration: Australian New Zealand Clinical Trials Registry ACTRN12621001092886; https://tinyurl.com/39f2pdmd; Australian New Zealand Clinical Trials Registry ACTRN12621001223820; https://tinyurl.com/bdhkvucv %M 39808785 %R 10.2196/60413 %U https://www.jmir.org/2025/1/e60413 %U https://doi.org/10.2196/60413 %U http://www.ncbi.nlm.nih.gov/pubmed/39808785 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e65589 %T Therapeutic Potential of Social Chatbots in Alleviating Loneliness and Social Anxiety: Quasi-Experimental Mixed Methods Study %A Kim,Myungsung %A Lee,Seonmi %A Kim,Sieun %A Heo,Jeong-in %A Lee,Sangil %A Shin,Yu-Bin %A Cho,Chul-Hyun %A Jung,Dooyoung %+ Department of Psychiatry, Korea University College of Medicine, 73 Goryeodae-ro, Seongbuk-gu, Seoul, 02841, Republic of Korea, 82 029205505, david0203@gmail.com %K artificial intelligence %K AI %K social chatbot %K loneliness %K social anxiety %K exploratory research %K mixed methods study %D 2025 %7 14.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) social chatbots represent a major advancement in merging technology with mental health, offering benefits through natural and emotional communication. Unlike task-oriented chatbots, social chatbots build relationships and provide social support, which can positively impact mental health outcomes like loneliness and social anxiety.
However, the specific effects and mechanisms through which these chatbots influence mental health remain underexplored. Objective: This study explores the mental health potential of AI social chatbots, focusing on their impact on loneliness and social anxiety among university students. The study seeks to (1) assess the impact of engaging with an AI social chatbot in South Korea, "Luda Lee," on these mental health outcomes over a 4-week period and (2) analyze user experiences to identify perceived strengths and weaknesses, as well as the applicability of social chatbots in therapeutic contexts. Methods: A single-group pre-post study was conducted with university students who interacted with the chatbot for 4 weeks. Measures included loneliness, social anxiety, and mood-related symptoms such as depression, assessed at baseline, week 2, and week 4. Quantitative measures were analyzed using analysis of variance and stepwise linear regression to identify the factors affecting change. Thematic analysis was used to analyze user experiences and assess the perceived benefits and challenges of chatbots. Results: A total of 176 participants (88 males; mean age 22.6, SD 2.92 years) took part in the study. Baseline measures indicated slightly elevated levels of loneliness (UCLA Loneliness Scale: mean 27.97, SD 11.07) and social anxiety (Liebowitz Social Anxiety Scale: mean 25.3, SD 14.19) compared to typical university students. Significant reductions were observed, with loneliness decreasing by week 2 (t175=2.55, P=.02) and social anxiety decreasing by week 4 (t175=2.67, P=.01). Stepwise linear regression identified baseline loneliness (β=0.78, 95% CI 0.67 to 0.89), self-disclosure (β=–0.65, 95% CI –1.07 to –0.23), and resilience (β=0.07, 95% CI 0.01 to 0.13) as significant predictors of week 4 loneliness (R2=0.64). Baseline social anxiety (β=0.92, 95% CI 0.81 to 1.03) significantly predicted week 4 anxiety (R2=0.65).
These findings indicate that higher baseline loneliness, lower self-disclosure to the chatbot, and higher resilience significantly predicted higher loneliness at week 4. Additionally, higher baseline social anxiety significantly predicted higher social anxiety at week 4. Qualitative analysis highlighted the chatbot's empathy and support as features that fostered a sense of reliability, though issues such as inconsistent responses and excessive enthusiasm occasionally disrupted user immersion. Conclusions: Social chatbots show potential to mitigate feelings of loneliness and social anxiety, indicating their possible utility as complementary resources in mental health interventions. User insights emphasize the importance of empathy, accessibility, and structured conversations in achieving therapeutic goals. Trial Registration: Clinical Research Information Service (CRIS) KCT0009288; https://tinyurl.com/hxrznt3t %M 39808786 %R 10.2196/65589 %U https://www.jmir.org/2025/1/e65589 %U https://doi.org/10.2196/65589 %U http://www.ncbi.nlm.nih.gov/pubmed/39808786 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e60520 %T Usefulness of Automatic Speech Recognition Assessment of Children With Speech Sound Disorders: Validation Study %A Kim,Do Hyung %A Jeong,Joo Won %A Kang,Dayoung %A Ahn,Taekyung %A Hong,Yeonjung %A Im,Younggon %A Kim,Jaewon %A Kim,Min Jung %A Jang,Dae-Hyun %+ Department of Rehabilitation Medicine, Incheon St Mary’s Hospital, College of Medicine, The Catholic University of Korea, 22 Banpo-daero, Seocho-gu, Seoul, 06591, Republic of Korea, 82 0322806601, dhjangmd@naver.com %K speech sound disorder %K speech recognition software %K speech articulation tests %K speech-language pathology %K child %D 2025 %7 14.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Speech sound disorders (SSDs) are common communication challenges in children, typically assessed by speech-language pathologists (SLPs) using standardized tools.
However, traditional evaluation methods are time-intensive and prone to variability, raising concerns about reliability. Objective: This study aimed to compare the evaluation outcomes of SLPs and an automatic speech recognition (ASR) model using two standardized SSD assessments in South Korea, evaluating the ASR model’s performance. Methods: A fine-tuned wav2vec 2.0 XLS-R model, pretrained on 436,000 hours of adult voice data spanning 128 languages, was used. The model was further trained on 93.6 minutes of children’s voices with articulation errors to improve error detection. Participants included children referred to the Department of Rehabilitation Medicine at a general hospital in Incheon, South Korea, from August 19, 2022, to June 14, 2023. Two standardized assessments—the Assessment of Phonology and Articulation for Children (APAC) and the Urimal Test of Articulation and Phonology (U-TAP)—were used, with ASR transcriptions compared to SLP transcriptions. Results: This study included 30 children aged 3-7 years who were suspected of having SSDs. The phoneme error rates for the APAC and U-TAP were 8.42% (457/5430) and 8.91% (402/4514), respectively, indicating discrepancies between the ASR model and SLP transcriptions across all phonemes. Consonant error rates were 10.58% (327/3090) and 11.86% (331/2790) for the APAC and U-TAP, respectively. On average, there were 2.60 (SD 1.54) and 3.07 (SD 1.39) discrepancies per child for correctly produced phonemes, and 7.87 (SD 3.66) and 7.57 (SD 4.85) discrepancies per child for incorrectly produced phonemes, based on the APAC and U-TAP, respectively. The correlation between SLPs and the ASR model in terms of the percentage of consonants correct was excellent, with an intraclass correlation coefficient of 0.984 (95% CI 0.953-0.994) and 0.978 (95% CI 0.941-0.990) for the APAC and U-TAP, respectively.
The z scores between SLPs and ASR showed more pronounced differences with the APAC than the U-TAP, with 8 individuals showing discrepancies in the APAC compared to 2 in the U-TAP. Conclusions: The results demonstrate the potential of the ASR model in assessing children with SSDs. However, its performance varied based on phoneme or word characteristics, highlighting areas for refinement. Future research should include more diverse speech samples, clinical settings, and speech data to further refine the model and ensure broader clinical applicability. %M 39576242 %R 10.2196/60520 %U https://www.jmir.org/2025/1/e60520 %U https://doi.org/10.2196/60520 %U http://www.ncbi.nlm.nih.gov/pubmed/39576242 %0 Journal Article %@ 2369-2529 %I JMIR Publications %V 12 %N %P e63641 %T User Acceptance of a Home Robotic Assistant for Individuals With Physical Disabilities: Explorative Qualitative Study %A Sørensen,Linda %A Sagen Johannesen,Dag Tomas %A Melkas,Helinä %A Johnsen,Hege Mari %+ Department of Health and Nursing Science, Faculty of Health and Sport Sciences, University of Agder, Postbox 422, Kristiansand, 4604, Norway, 47 99473420, linda.sorensen@uia.no %K physical artificial intelligence %K physical AI %K health care robotics %K assistive technology %K content analysis %K qualitative %K health care %K robotics %K assistive %K robot interaction %K physical disabilities %K readiness %K amputations %D 2025 %7 13.1.2025 %9 Original Paper %J JMIR Rehabil Assist Technol %G English %X Background: Health care is shifting toward 5 proactive approaches: personalized, participatory, preventive, predictive, and precision-focused services (P5 medicine). This patient-centered care leverages technologies such as artificial intelligence (AI)–powered robots, which can personalize and enhance services for users with disabilities. These advancements are crucial given the World Health Organization’s projection of a global shortage of up to 10 million health care workers by 2030.
Objective: This study aimed to investigate the acceptance of a humanoid assistive robot among users with physical disabilities during (1) AI-powered (using a Wizard of Oz methodology) robotic performance of predefined personalized assistance tasks and (2) operator-controlled robotic performance (simulated distant service). Methods: An explorative qualitative design was used, involving user testing in a simulated home environment and individual interviews. Directed content analysis was based on the Almere model and the model of domestic social robot acceptance. Results: Nine participants with physical disabilities aged 27 to 78 years engaged in robot interactions. They shared their perceptions across 7 acceptance concepts: hedonic attitudes, utilitarian attitudes, personal norms, social norms, control beliefs, facilitating conditions, and intention to use. Participants valued the robot’s usefulness for practical services but not for personal care. They preferred automation but accepted remote control of the robot for some tasks. Privacy concerns were mixed. Conclusions: This study highlights the complex interplay of functional expectations, technological readiness, and personal and societal norms affecting the acceptance of physically assistive robots. Participants were generally positive about robotic assistance as it increases independence and lessens the need for human caregivers, although they acknowledged some current shortcomings. They were open to trying more home testing if future robots could perform most tasks autonomously. AI-powered robots offer new possibilities for creating more adaptable and personalized assistive technologies, potentially enhancing their effectiveness and viability for individuals with disabilities. 
%M 39805579 %R 10.2196/63641 %U https://rehab.jmir.org/2025/1/e63641 %U https://doi.org/10.2196/63641 %U http://www.ncbi.nlm.nih.gov/pubmed/39805579 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e50852 %T AI Interventions to Alleviate Healthcare Shortages and Enhance Work Conditions in Critical Care: Qualitative Analysis %A Bienefeld,Nadine %A Keller,Emanuela %A Grote,Gudela %+ ETH Zurich, Weinbergstrasse 56/58, Zurich, 8093, Switzerland, 41 44 632 70 78, nbienefeld@ethz.ch %K artificial intelligence %K AI %K work design %K sociotechnical system %K work %K job %K occupational health %K sociotechnical %K new work %K future of work %K satisfaction %K health care professionals %K intensive care %K ICU %K stress mitigation %K worker %K employee %K stress %K health care professional %K overburdened %K burden %K burnout %K autonomy %K competence %K flexible %K task %K workplace %K hospital %D 2025 %7 13.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: The escalating global scarcity of skilled health care professionals is a critical concern, further exacerbated by rising stress levels and clinician burnout rates. Artificial intelligence (AI) has surfaced as a potential resource to alleviate these challenges. Nevertheless, it is not taken for granted that AI will inevitably augment human performance, as ill-designed systems may inadvertently impose new burdens on health care workers, and implementation may be challenging. An in-depth understanding of how AI can effectively enhance rather than impair work conditions is therefore needed. Objective: This research investigates the efficacy of AI in alleviating stress and enriching work conditions, using intensive care units (ICUs) as a case study. Through a sociotechnical system lens, we delineate how AI systems, tasks, and responsibilities of ICU nurses and physicians can be co-designed to foster motivating, resilient, and health-promoting work. 
Methods: We use the sociotechnical system framework COMPASS (Complementary Analysis of Sociotechnical Systems) to assess 5 job characteristics: autonomy, skill diversity, flexibility, problem-solving opportunities, and task variety. The qualitative analysis is underpinned by extensive workplace observation in 6 ICUs (approximately 559 nurses and physicians), structured interviews with work unit leaders (n=12), and a comparative analysis of data science experts’ and clinicians’ evaluation of the optimal levels of human-AI teaming. Results: The results indicate that AI holds the potential to positively impact work conditions for ICU nurses and physicians in four key areas. First, autonomy is vital for stress reduction, motivation, and performance improvement. AI systems that ensure transparency, predictability, and human control can reinforce or amplify autonomy. Second, AI can encourage skill diversity and competence development, thus empowering clinicians to broaden their skills, increase the polyvalence of tasks across professional boundaries, and improve interprofessional cooperation. However, careful consideration is required to avoid the deskilling of experienced professionals. Third, AI automation can expand flexibility by relieving clinicians from administrative duties, thereby concentrating their efforts on patient care. Remote monitoring and improved scheduling can help integrate work with other life domains. Fourth, while AI may reduce problem-solving opportunities in certain areas, it can open new pathways, particularly for nurses. Finally, task identity and variety are essential job characteristics for intrinsic motivation and worker engagement but could be compromised depending on how AI tools are designed and implemented. Conclusions: This study demonstrates AI’s capacity to mitigate stress and improve work conditions for ICU nurses and physicians, thereby contributing to resolving health care staffing shortages. 
AI solutions that are thoughtfully designed in line with the principles for good work design can enhance intrinsic motivation, learning, and worker well-being, thus providing strategic value for hospital management, policy makers, and health care professionals alike. %M 39805110 %R 10.2196/50852 %U https://www.jmir.org/2025/1/e50852 %U https://doi.org/10.2196/50852 %U http://www.ncbi.nlm.nih.gov/pubmed/39805110 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e63004 %T Clinical Decision Support Using Speech Signal Analysis: Systematic Scoping Review of Neurological Disorders %A De Silva,Upeka %A Madanian,Samaneh %A Olsen,Sharon %A Templeton,John Michael %A Poellabauer,Christian %A Schneider,Sandra L %A Narayanan,Ajit %A Rubaiat,Rahmina %+ Department of Computer Science and Software Engineering, Auckland University of Technology, 55 Wellesley Street East, Auckland CBD, Auckland 1010, Auckland, 1010, New Zealand, 64 09 9219999 ext 6539, sam.madanian@aut.ac.nz %K digital health %K health informatics %K digital biomarker %K speech analytics %K artificial intelligence %K machine learning %D 2025 %7 13.1.2025 %9 Review %J J Med Internet Res %G English %X Background: Digital biomarkers are increasingly used in clinical decision support for various health conditions. Speech features as digital biomarkers can offer insights into underlying physiological processes due to the complexity of speech production. This process involves respiration, phonation, articulation, and resonance, all of which rely on specific motor systems for the preparation and execution of speech. Deficits in any of these systems can cause changes in speech signal patterns. Increasing efforts are being made to develop speech-based clinical decision support systems. 
Objective: This systematic scoping review investigated the technological revolution and recent digital clinical speech signal analysis trends to understand the key concepts and research processes from clinical and technical perspectives. Methods: A systematic scoping review was undertaken in 6 databases guided by a set of research questions. Articles that focused on speech signal analysis for clinical decision-making were identified, and the included studies were analyzed quantitatively. A narrower scope of studies investigating neurological diseases was analyzed using qualitative content analysis. Results: A total of 389 articles met the initial eligibility criteria, of which 72 (18.5%) that focused on neurological diseases were included in the qualitative analysis. In the included studies, Parkinson disease, Alzheimer disease, and cognitive disorders were the most frequently investigated conditions. The literature explored the potential of speech feature analysis in diagnosing, differentiating between, assessing the severity of, and monitoring the treatment of neurological conditions. The common speech tasks used were sustained phonations, diadochokinetic tasks, reading tasks, activity-based tasks, picture descriptions, and prompted speech tasks. From these tasks, conventional speech features (such as fundamental frequency, jitter, and shimmer), advanced digital signal processing–based speech features (such as wavelet transformation–based features), and spectrograms in the form of audio images were analyzed. Traditional machine learning and deep learning approaches were used to build predictive models, whereas statistical analysis assessed variable relationships and reliability of speech features. Model evaluations primarily focused on analytical validations. A significant research gap was identified: the need for a structured research process to guide studies toward potential technological intervention in clinical settings. 
To address this, a research framework was proposed that adapts a design science research methodology to guide research studies systematically. Conclusions: The findings highlight how data science techniques can enhance speech signal analysis to support clinical decision-making. By combining knowledge from clinical practice, speech science, and data science within a structured research framework, future research may achieve greater clinical relevance. %M 39804693 %R 10.2196/63004 %U https://www.jmir.org/2025/1/e63004 %U https://doi.org/10.2196/63004 %U http://www.ncbi.nlm.nih.gov/pubmed/39804693 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e58421 %T Predicting Age and Visual-Motor Integration Using Origami Photographs: Deep Learning Study %A Huang,Chien-Yu %A Yu,Yen-Ting %A Chen,Kuan-Lin %A Lien,Jenn-Jier %A Lin,Gong-Hong %A Hsieh,Ching-Lin %K artificial intelligence %K origami %K child development screening %K child development %K visual motor integration %K children %K developmental status %K activity performance %K deep learning %D 2025 %7 10.1.2025 %9 %J JMIR Form Res %G English %X Background: Origami is a popular activity among preschool children and can be used by therapists as an evaluation tool to assess children’s development in clinical settings. It is easy to implement, appealing to children, and time-efficient, requiring only simple materials—pieces of paper. Furthermore, the products of origami may reflect children’s ages and their visual-motor integration (VMI) development. However, therapists typically evaluate children’s origami creations based primarily on their personal background knowledge and clinical experience, leading to subjective and descriptive feedback. Consequently, the effectiveness of using origami products to determine children’s age and VMI development lacks empirical support. Objective: This study had two main aims. 
First, we sought to apply artificial intelligence (AI) techniques to origami products to predict children’s ages and VMI development, including VMI level (standardized scores) and VMI developmental status (typical, borderline, or delayed). Second, we explored the performance of the AI models using all combinations of photographs taken from different angles. Methods: A total of 515 children aged 2-6 years were recruited and divided into training and testing groups at a 4:1 ratio. Children created origami dogs, which were photographed from 8 different angles. The Beery–Buktenica Developmental Test of Visual-Motor Integration, 6th Edition, was used to assess the children’s VMI levels and developmental status. Three AI models—ResNet-50, XGBoost, and a multilayer perceptron—were combined sequentially to predict age z scores and VMI z scores using the training group. The trained models were then tested using the testing group, and the accuracy of the predicted VMI developmental status was also calculated. Results: The R2 of the age and the VMI trained models ranged from 0.50 to 0.73 and from 0.50 to 0.66, respectively. The AI models that obtained an R2>0.70 for the age model and an R2>0.60 for the VMI model were selected for model testing. Those models were further examined for the accuracy of the VMI developmental status, the correlations, and the mean absolute error (MAE) of both the age and the VMI models. The accuracy of the VMI developmental status was about 71%-76%. The correlations between the final predicted age z score and the real age z score ranged from 0.84 to 0.85, and the correlations of the final predicted VMI z scores to the real z scores ranged from 0.77 to 0.81. The MAE of the age models ranged from 0.42 to 0.46 and those of the VMI models ranged from 0.43 to 0.48. Conclusion: Our findings indicate that AI techniques have a significant potential for predicting children’s development. 
The insights provided by AI may assist therapists in better interpreting children’s performance in activities. %R 10.2196/58421 %U https://formative.jmir.org/2025/1/e58421 %U https://doi.org/10.2196/58421 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 11 %N %P e62669 %T Awareness and Attitude Toward Artificial Intelligence Among Medical Students and Pathology Trainees: Survey Study %A Rjoop,Anwar %A Al-Qudah,Mohammad %A Alkhasawneh,Raja %A Bataineh,Nesreen %A Abdaljaleel,Maram %A Rjoub,Moayad A %A Alkhateeb,Mustafa %A Abdelraheem,Mohammad %A Al-Omari,Salem %A Bani-Mari,Omar %A Alkabalan,Anas %A Altulaih,Saoud %A Rjoub,Iyad %A Alshimi,Rula %K artificial intelligence %K AI %K deep learning %K medical schools %K pathology %K Jordan %K medical education %K awareness %K attitude %K medical students %K pathology trainees %K national survey study %K medical practice %K training %K web-based survey %K survey %K questionnaire %D 2025 %7 10.1.2025 %9 %J JMIR Med Educ %G English %X Background: Artificial intelligence (AI) is set to shape the future of medical practice. The perspective and understanding of medical students are critical for guiding the development of educational curricula and training. Objective: This study aims to assess and compare medical AI-related attitudes among medical students in general medicine and in one of the visually oriented fields (pathology), along with illuminating their anticipated role of AI in the rapidly evolving landscape of AI-enhanced health care. Methods: This was a cross-sectional study that used a web-based survey composed of a closed-ended questionnaire. The survey addressed medical students at all educational levels across the 5 public medical schools, along with pathology residents in 4 residency programs in Jordan. Results: A total of 394 respondents participated (328 medical students and 66 pathology residents). 
The majority of respondents (272/394, 69%) were already aware of AI and deep learning in medicine, mainly relying on websites for information on AI, while only 14% (56/394) were aware of AI through medical schools. There was a statistically significant difference in awareness among respondents who consider themselves tech experts compared with those who do not (P=.03). More than half of the respondents believed that AI could be used to diagnose diseases automatically (213/394, 54.1% agreement), with medical students agreeing more than pathology residents (P=.04). However, more than one-third expressed fear about recent AI developments (167/394, 42.4% agreed). Two-thirds of respondents disagreed that their medical schools had educated them about AI and its potential use (261/394, 66.2% disagreed), while 46.2% (182/394) expressed interest in learning about AI in medicine. In terms of pathology-specific questions, 75.4% (297/394) agreed that AI could be used to identify pathologies in slide examinations automatically. There was a significant difference between medical students and pathology residents in their agreement (P=.001). Overall, medical students and pathology trainees had similar responses. Conclusions: AI education should be introduced into medical school curricula to improve medical students’ understanding and attitudes. Students agreed that they need to learn about AI’s applications, potential hazards, and legal and ethical implications. This is the first study to analyze medical students’ views and awareness of AI in Jordan, as well as the first to include pathology residents’ perspectives. The findings are consistent with earlier research internationally. In comparison with prior research, these attitudes are similar in low-income and industrialized countries, highlighting the need for a global strategy to introduce AI instruction to medical students everywhere in this era of rapidly expanding technology. 
%R 10.2196/62669 %U https://mededu.jmir.org/2025/1/e62669 %U https://doi.org/10.2196/62669 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e67621 %T Evaluating ChatGPT’s Efficacy in Pediatric Pneumonia Detection From Chest X-Rays: Comparative Analysis of Specialized AI Models %A Chetla,Nitin %A Tandon,Mihir %A Chang,Joseph %A Sukhija,Kunal %A Patel,Romil %A Sanchez,Ramon %+ Department of Orthopaedics, Albany Medical College, 43 New Scotland Ave, Albany, NY, 12208, United States, 1 3322488708, tandonm@amc.edu %K artificial intelligence %K ChatGPT %K pneumonia %K chest x-ray %K pediatric %K radiology %K large language models %K machine learning %K pneumonia detection %K diagnosis %K pediatric pneumonia %D 2025 %7 10.1.2025 %9 Research Letter %J JMIR AI %G English %X %M 39793007 %R 10.2196/67621 %U https://ai.jmir.org/2025/1/e67621 %U https://doi.org/10.2196/67621 %U http://www.ncbi.nlm.nih.gov/pubmed/39793007 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e65725 %T Business Venturing in Regulated Markets—Taxonomy and Archetypes of Digital Health Business Models in the European Union: Mixed Methods Descriptive and Exploratory Study %A Weimar,Sascha Noel %A Martjan,Rahel Sophie %A Terzidis,Orestis %+ Institute for Entrepreneurship, Technology Management and Innovation (EnTechnon), Karlsruhe Institute of Technology (KIT), Fritz-Erler-Str. 1-3, Karlsruhe, 76133, Germany, 49 721 608 47341, sascha.weimar@kit.edu %K digital health %K telemedicine %K mobile health %K business model %K European Union %K classification %K archetypes %K medical device regulations %K mobile phone %K artificial intelligence %K AI %D 2025 %7 9.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Digital health technology (DHT) has the potential to revolutionize the health care industry by reducing costs and improving the quality of care in a sector that faces significant challenges. 
However, the health care industry is complex, involving numerous stakeholders, and subject to extensive regulation. Within the European Union, medical device regulations impose stringent requirements on various ventures. Concurrently, new reimbursement pathways are also being developed for DHTs. In this dynamic context, establishing a sustainable and innovative business model around DHTs is fundamental for their successful commercialization. However, there is a notable lack of structured understanding regarding the overarching business models within the digital health sector. Objective: This study aims to address this gap and identify key elements and configurations of business models for DHTs in the European Union, thereby establishing a structured understanding of the archetypal business models in use. Methods: The study was conducted in 2 phases. First, a business model taxonomy for DHTs was developed based on a systematic literature review, the analysis of 169 European real-world business models, and qualitative evaluation through 13 expert interviews. Subsequently, a 2-step clustering analysis was conducted on the 169 DHT business models to identify distinct business model archetypes. Results: The developed taxonomy of DHT business models revealed 11 central dimensions organized into 4 meta-dimensions. Each dimension comprises 2 to 9 characteristics capturing relevant aspects of DHT business models. In addition, 6 archetypes of DHT business models were identified: administration and communication supporter (A1), insurer-to-consumer digital therapeutics and care (A2), diagnostic and treatment enabler (A3), professional monitoring platforms (A4), clinical research and solution accelerators (A5), and direct-to-consumer wellness and lifestyle (A6). 
Conclusions: The findings highlight the critical elements constituting business models in the DHT domain, emphasizing the substantial impact of medical device regulations and revenue models, which often involve reimbursement from stakeholders such as health insurers. Three drivers contributing to DHT business model innovation were identified: direct targeting of patients and private individuals, use of artificial intelligence as an enabler, and development of DHT-specific reimbursement pathways. The study also uncovered surprising business model patterns, including shifts between regulated medical devices and unregulated research applications, as well as wellness and lifestyle solutions. This research enriches the understanding of business models in digital health, offering valuable insights for researchers and digital health entrepreneurs. %M 39787596 %R 10.2196/65725 %U https://www.jmir.org/2025/1/e65725 %U https://doi.org/10.2196/65725 %U http://www.ncbi.nlm.nih.gov/pubmed/39787596 %0 Journal Article %@ 2561-9128 %I JMIR Publications %V 8 %N %P e59422 %T Development and Validation of a Routine Electronic Health Record-Based Delirium Prediction Model for Surgical Patients Without Dementia: Retrospective Case-Control Study %A Holler,Emma %A Ludema,Christina %A Ben Miled,Zina %A Rosenberg,Molly %A Kalbaugh,Corey %A Boustani,Malaz %A Mohanty,Sanjay %+ Department of Surgery, Indiana University School of Medicine, 545 Barnhill Drive, Indianapolis, IN, 46202, United States, 1 317 944 5376, emorone@iu.edu %K delirium %K machine learning %K prediction %K postoperative %K algorithm %K electronic health records %K surgery %K risk prediction %D 2025 %7 9.1.2025 %9 Original Paper %J JMIR Perioper Med %G English %X Background: Postoperative delirium (POD) is a common complication after major surgery and is associated with poor outcomes in older adults. Early identification of patients at high risk of POD can enable targeted prevention efforts. 
However, existing POD prediction models require inpatient data collected during the hospital stay, which delays predictions and limits scalability. Objective: This study aimed to develop and externally validate a machine learning-based prediction model for POD using routine electronic health record (EHR) data. Methods: We identified all surgical encounters from 2014 to 2021 for patients aged 50 years and older who underwent an operation requiring general anesthesia, with a length of stay of at least 1 day at 3 Indiana hospitals. Patients with preexisting dementia or mild cognitive impairment were excluded. POD was identified using Confusion Assessment Method records and delirium International Classification of Diseases (ICD) codes. Controls without delirium or nurse-documented confusion were matched to cases by age, sex, race, and year of admission. We trained logistic regression, random forest, extreme gradient boosting (XGB), and neural network models to predict POD using 143 features derived from routine EHR data available at the time of hospital admission. Separate models were developed for each hospital using surveillance periods of 3 months, 6 months, and 1 year before admission. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Each model was internally validated using holdout data and externally validated using data from the other 2 hospitals. Calibration was assessed using calibration curves. Results: The study cohort included 7167 delirium cases and 7167 matched controls. XGB outperformed all other classifiers. AUROCs were highest for XGB models trained on 12 months of preadmission data. The best-performing XGB model achieved a mean AUROC of 0.79 (SD 0.01) on the holdout set, which decreased to 0.69-0.74 (SD 0.02) when externally validated on data from other hospitals. 
Conclusions: Our routine EHR-based POD prediction models demonstrated good predictive ability using a limited set of preadmission and surgical variables, though their generalizability was limited. The proposed models could be used as a scalable, automated screening tool to identify patients at high risk of POD at the time of hospital admission. %M 39786865 %R 10.2196/59422 %U https://periop.jmir.org/2025/1/e59422 %U https://doi.org/10.2196/59422 %U http://www.ncbi.nlm.nih.gov/pubmed/39786865 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e63924 %T Performance of Artificial Intelligence Chatbots on Ultrasound Examinations: Cross-Sectional Comparative Analysis %A Zhang,Yong %A Lu,Xiao %A Luo,Yan %A Zhu,Ying %A Ling,Wenwu %K chatbots %K ChatGPT %K ERNIE Bot %K performance %K accuracy rates %K ultrasound %K language %K examination %D 2025 %7 9.1.2025 %9 %J JMIR Med Inform %G English %X Background: Artificial intelligence chatbots are being increasingly used for medical inquiries, particularly in the field of ultrasound medicine. However, their performance varies and is influenced by factors such as language, question type, and topic. Objective: This study aimed to evaluate the performance of ChatGPT and ERNIE Bot in answering ultrasound-related medical examination questions, providing insights for users and developers. Methods: We curated 554 questions from ultrasound medicine examinations, covering various question types and topics. The questions were posed in both English and Chinese. Objective questions were scored based on accuracy rates, whereas subjective questions were rated by 5 experienced doctors using a Likert scale. The data were analyzed in Excel. Results: Of the 554 questions included in this study, single-choice questions comprised the largest share (354/554, 64%), followed by short answers (69/554, 12%) and noun explanations (63/554, 11%). 
The accuracy rates for objective questions ranged from 8.33% to 80%, with true or false questions scoring highest. Subjective questions received acceptability rates ranging from 47.62% to 75.36%. ERNIE Bot was superior to ChatGPT in many aspects (P<.05). Both models showed a performance decline in English, but ERNIE Bot’s decline was less significant. The models performed better in terms of basic knowledge, ultrasound methods, and diseases than in terms of ultrasound signs and diagnosis. Conclusions: Chatbots can provide valuable ultrasound-related answers, but performance differs by model and is influenced by language, question type, and topic. In general, ERNIE Bot outperforms ChatGPT. Users and developers should understand model performance characteristics and select appropriate models for different questions and languages to optimize chatbot use. %R 10.2196/63924 %U https://medinform.jmir.org/2025/1/e63924 %U https://doi.org/10.2196/63924 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e57395 %T Public Health Discussions on Social Media: Evaluating Automated Sentiment Analysis Methods %A Gandy,Lisa M %A Ivanitskaya,Lana V %A Bacon,Leeza L %A Bizri-Baryak,Rodina %+ Department of Computer Science, College of Sciences and Liberal Arts, Kettering University, 1700 University Ave, 2300 AB, Flint, MI, 48504, United States, 1 9898547001, lgandy@kettering.edu %K ChatGPT %K VADER %K valence aware dictionary for sentiment reasoning %K LIWC-22 %K machine learning %K social media %K sentiment analysis %K public health %K population health %K opioids %K drugs %K pharmacotherapy %K pharmaceuticals %K medications %K YouTube %D 2025 %7 8.1.2025 %9 Original Paper %J JMIR Form Res %G English %X Background: Sentiment analysis is one of the most widely used methods for mining and examining text. Social media researchers need guidance on choosing between manual and automated sentiment analysis methods. 
Objective: Popular sentiment analysis tools based on natural language processing (NLP; VADER [Valence Aware Dictionary for Sentiment Reasoning], TEXT2DATA [T2D], and Linguistic Inquiry and Word Count [LIWC-22]) and a large language model (ChatGPT 4.0) were compared with manually coded sentiment scores, as applied to the analysis of YouTube comments on videos discussing the opioid epidemic. Sentiment analysis methods were also examined regarding ease of programming, monetary cost, and other practical considerations. Methods: Evaluation methods included descriptive statistics, receiver operating characteristic (ROC) curve analysis, confusion matrices, Cohen κ, accuracy, specificity, precision, sensitivity (recall), F1-score harmonic mean, and the Matthews correlation coefficient (MCC). An inductive, iterative approach to content analysis of the data was used to obtain manual sentiment codes. Results: A subset of comments were analyzed by a second coder, producing good agreement between the 2 coders’ judgments (κ=0.734). YouTube social media about the opioid crisis had many more negative comments (4286/4871, 88%) than positive comments (79/662, 12%), making it possible to evaluate the performance of sentiment analysis models in an unbalanced dataset. The tone summary measure from LIWC-22 performed better than other tools for estimating the prevalence of negative versus positive sentiment. According to the ROC curve analysis, VADER was best at classifying manually coded negative comments. A comparison of Cohen κ values indicated that NLP tools (VADER, followed by LIWC’s tone and T2D) showed only fair agreement with manual coding. In contrast, ChatGPT 4.0 had poor agreement and failed to generate binary sentiment scores in 2 out of 3 attempts. Variations in accuracy, specificity, precision, sensitivity, F1-score, and MCC did not reveal a single superior model. F1-score harmonic means were 0.34-0.38 (SD 0.02) for NLP tools and very low (0.13) for ChatGPT 4.0. 
None of the MCCs reached a strong correlation level. Conclusions: Researchers studying negative emotions, public worries, or dissatisfaction with social media face unique challenges in selecting models suitable for unbalanced datasets. We recommend VADER, the only cost-free tool we evaluated, due to its excellent discrimination, which can be further improved when the comments are at least 100 characters long. If estimating the prevalence of negative comments in an unbalanced dataset is important, we recommend the tone summary measure from LIWC-22. Researchers using T2D must know that it may only score some data and, compared with other methods, be more time-consuming and cost-prohibitive. A general-purpose large language model, ChatGPT 4.0, has yet to surpass the performance of NLP models, at least for unbalanced datasets with highly prevalent (7:1) negative comments. %R 10.2196/57395 %U https://formative.jmir.org/2025/1/e57395 %U https://doi.org/10.2196/57395 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e60269 %T Bias Mitigation in Primary Health Care Artificial Intelligence Models: Scoping Review %A Sasseville,Maxime %A Ouellet,Steven %A Rhéaume,Caroline %A Sahlia,Malek %A Couture,Vincent %A Després,Philippe %A Paquette,Jean-Sébastien %A Darmon,David %A Bergeron,Frédéric %A Gagnon,Marie-Pierre %+ Faculté des sciences infirmières, Université Laval, 1050 Av. de la Médecine, Québec, QC, G1V 0A6, Canada, 1 418 656 3356, maxime.sasseville@fsi.ulaval.ca %K artificial intelligence %K AI %K algorithms %K expert system %K decision support %K bias %K community health services %K primary health care %K health disparities %K social equity %K scoping review %D 2025 %7 7.1.2025 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) predictive models in primary health care have the potential to enhance population health by rapidly and accurately identifying individuals who should receive care and health services. 
However, these models also carry the risk of perpetuating or amplifying existing biases toward diverse groups. We identified a gap in the current understanding of strategies used to assess and mitigate bias in primary health care algorithms related to individuals’ personal or protected attributes. Objective: This study aimed to describe the attempts, strategies, and methods used to mitigate bias in AI models within primary health care, to identify the diverse groups or protected attributes considered, and to evaluate the results of these approaches on both bias reduction and AI model performance. Methods: We conducted a scoping review following Joanna Briggs Institute (JBI) guidelines, searching Medline (Ovid), CINAHL (EBSCO), PsycINFO (Ovid), and Web of Science databases for studies published between January 1, 2017, and November 15, 2022. Pairs of reviewers independently screened titles and abstracts, applied selection criteria, and performed full-text screening. Discrepancies regarding study inclusion were resolved by consensus. Following reporting standards for AI in health care, we extracted data on study objectives, model features, targeted diverse groups, mitigation strategies used, and results. Using the mixed methods appraisal tool, we appraised the quality of the studies. Results: After removing 585 duplicates, we screened 1018 titles and abstracts. From the remaining 189 full-text articles, we included 17 studies. The most frequently investigated protected attributes were race (or ethnicity), examined in 12 of the 17 studies, and sex (often identified as gender), typically classified as “male versus female” in 10 of the studies. We categorized bias mitigation approaches into four clusters: (1) modifying existing AI models or datasets, (2) sourcing data from electronic health records, (3) developing tools with a “human-in-the-loop” approach, and (4) identifying ethical principles for informed decision-making. 
Algorithmic preprocessing methods, such as relabeling and reweighing data, along with natural language processing techniques that extract data from unstructured notes, showed the greatest potential for bias mitigation. Other methods aimed at enhancing model fairness included group recalibration and the application of the equalized odds metric. However, these approaches sometimes exacerbated prediction errors across groups or led to overall model miscalibrations. Conclusions: The results suggest that biases toward diverse groups are more easily mitigated when data are open-sourced, multiple stakeholders are engaged, and during the algorithm’s preprocessing stage. Further empirical studies that include a broader range of groups, such as Indigenous peoples in Canada, are needed to validate and expand upon these findings. Trial Registration: OSF Registry osf.io/9ngz5/; https://osf.io/9ngz5/ International Registered Report Identifier (IRRID): RR2-10.2196/46684 %M 39773888 %R 10.2196/60269 %U https://www.jmir.org/2025/1/e60269 %U https://doi.org/10.2196/60269 %U http://www.ncbi.nlm.nih.gov/pubmed/39773888 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 8 %N %P e60827 %T The Comparative Sufficiency of ChatGPT, Google Bard, and Bing AI in Answering Diagnosis, Treatment, and Prognosis Questions About Common Dermatological Diagnoses %A Chau,Courtney A %A Feng,Hao %A Cobos,Gabriela %A Park,Joyce %K artificial intelligence %K AI %K ChatGPT %K atopic dermatitis %K acne vulgaris %K cyst %K actinic keratosis %K rosacea %K diagnosis %K treatment %K prognosis %K dermatological %K patient %K chatbot %K dermatologist %D 2025 %7 7.1.2025 %9 %J JMIR Dermatol %G English %X Our team explored the utility of unpaid versions of 3 artificial intelligence chatbots in offering patient-facing responses to questions about 5 common dermatological diagnoses, and highlighted the strengths and limitations of different artificial intelligence chatbots, while demonstrating how chatbots 
showed the greatest potential when used in tandem with dermatologists’ diagnoses. %R 10.2196/60827 %U https://derma.jmir.org/2025/1/e60827 %U https://doi.org/10.2196/60827 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e57624 %T The Trifecta of Industry, Academic, and Health System Partnership to Improve Mental Health Care Through Smartphone-Based Remote Patient Monitoring: Development and Usability Study %A Epperson,C Neill %A Davis,Rachel %A Dempsey,Allison %A Haller,Heinrich C %A Kupfer,David J %A Love,Tiffany %A Villarreal,Pamela M %A Matthews,Mark %A Moore,Susan L %A Muller,Kimberly %A Schneck,Christopher D %A Scott,Jessica L %A Zane,Richard D %A Frank,Ellen %+ Department of Psychiatry, School of Medicine, University of Colorado Anschutz Medical Campus, 1890 N Revere Ct, Suite 4003, Mail Stop F546, Aurora, CO, 80045, United States, 1 303 724 4940, neill.epperson@cuanschutz.edu %K digital health %K mobile intervention %K telepsychiatry %K artificial intelligence %K psychiatry %K mental health %K depression %K mood %K bipolar %K monitor %K diagnostic tool %K diagnosis %K electronic health record %K EHR %K alert %K notification %K prediction %K mHealth %K mobile health %K smartphone %K passive %K self-reported %K patient generated %D 2025 %7 7.1.2025 %9 Original Paper %J JMIR Form Res %G English %X Background: Mental health treatment is hindered by the limited number of mental health care providers and the infrequency of care. Digital mental health technology can help supplement treatment by remotely monitoring patient symptoms and predicting mental health crises in between clinical visits. However, the feasibility of digital mental health technologies has not yet been sufficiently explored. Rhythms, from the company Health Rhythms, is a smartphone platform that uses passively acquired smartphone data with artificial intelligence and predictive analytics to alert patients and providers to an emerging mental health crisis.
Objective: The objective of this study was to test the feasibility and acceptability of Rhythms among patients attending an academic psychiatric outpatient clinic. Methods: Our group embedded Rhythms into the electronic health record of a large health system. Patients with a diagnosis of major depressive disorder, bipolar disorder, or other mood disorder were contacted online and enrolled for a 6-week trial of Rhythms. Participants provided data by completing electronic surveys as well as by active and passive use of Rhythms. Emergent and urgent alerts were monitored and managed according to passively collected data and patient self-ratings. A purposively sampled group of participants also participated in qualitative interviews about their experience with Rhythms at the end of the study. Results: Of the 104 participants, 89 (85.6%) completed 6 weeks of monitoring. The majority of the participants were women (72/104, 69.2%), White (84/104, 80.8%), and non-Hispanic (100/104, 96.2%) and had a diagnosis of major depressive disorder (71/104, 68.3%). Two emergent alerts and 19 urgent alerts were received and managed according to protocol over 16 weeks. More than two-thirds (63/87, 72%) of those participating continued to use Rhythms after study completion. Comments from participants indicated appreciation for greater self-awareness and provider connection, while providers reported that Rhythms provided a more nuanced understanding of patient experience between clinical visits. Conclusions: Rhythms is a user-friendly, electronic health record–adaptable, smartphone-based tool that provides patients and providers with a greater understanding of patient mental health status. Integration of Rhythms into health systems has the potential to facilitate mental health care and improve the experience of both patients and providers. 
%M 39773396 %R 10.2196/57624 %U https://formative.jmir.org/2025/1/e57624 %U https://doi.org/10.2196/57624 %U http://www.ncbi.nlm.nih.gov/pubmed/39773396 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e62768 %T The Willingness of Doctors to Adopt Artificial Intelligence–Driven Clinical Decision Support Systems at Different Hospitals in China: Fuzzy Set Qualitative Comparative Analysis of Survey Data %A Yu,Zhongguang %A Hu,Ning %A Zhao,Qiuyi %A Hu,Xiang %A Jia,Cunbo %A Zhang,Chunyu %A Liu,Bing %A Li,Yanping %+ Economics and Management School, Wuhan University, 299 Bayi Road, Wuchang District, Wuhan, 430072, China, 86 68753084, ypli@whu.edu.cn %K artificial intelligence %K clinical decision support systems %K willingness %K technology adoption %K fuzzy set qualitative comparative analysis %K fsQCA %K pathways %D 2025 %7 7.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence–driven clinical decision support systems (AI-CDSSs) are pivotal tools for doctors to improve diagnostic and treatment processes, as well as improve the efficiency and quality of health care services. However, not all doctors trust artificial intelligence (AI) technology, and many remain skeptical and unwilling to adopt these systems. Objective: This study aimed to explore in depth the factors influencing doctors’ willingness to adopt AI-CDSSs and assess the causal relationships among these factors to gain a better understanding for promoting the clinical application and widespread implementation of these systems. Methods: Based on the unified theory of acceptance and use of technology (UTAUT) and the technology-organization-environment (TOE) framework, we have proposed and designed a framework for doctors’ willingness to adopt AI-CDSSs. 
We conducted a nationwide questionnaire survey in China and performed fuzzy set qualitative comparative analysis to explore the willingness of doctors to adopt AI-CDSSs in different types of medical institutions and assess the factors influencing their willingness. Results: The survey was administered to doctors working in tertiary hospitals and primary/secondary hospitals across China. We received 450 valid responses out of 578 questionnaires distributed, indicating a robust response rate of 77.9%. Our analysis of the influencing factors and adoption pathways revealed that doctors in tertiary hospitals exhibited 6 distinct pathways for AI-CDSS adoption, which were centered on technology-driven pathways, individual-driven pathways, and technology-individual dual-driven pathways. Doctors in primary/secondary hospitals demonstrated 3 adoption pathways, which were centered on technology-individual and organization-individual dual-driven pathways. There were commonalities in the factors influencing adoption across different medical institutions, such as the positive perception of AI technology’s utility and individual readiness to try new technologies. There were also variations in the influence of facilitating conditions among doctors at different medical institutions, especially primary/secondary hospitals. Conclusions: From the perspective of the 6 pathways for doctors at tertiary hospitals and the 3 pathways for doctors at primary/secondary hospitals, performance expectancy and personal innovativeness were 2 indispensable and core conditions in the pathways to achieving favorable willingness to adopt AI-CDSSs. 
%M 39773696 %R 10.2196/62768 %U https://www.jmir.org/2025/1/e62768 %U https://doi.org/10.2196/62768 %U http://www.ncbi.nlm.nih.gov/pubmed/39773696 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e59069 %T Revolutionizing Health Care: The Transformative Impact of Large Language Models in Medicine %A Zhang,Kuo %A Meng,Xiangbin %A Yan,Xiangyu %A Ji,Jiaming %A Liu,Jingqian %A Xu,Hua %A Zhang,Heng %A Liu,Da %A Wang,Jingjia %A Wang,Xuliang %A Gao,Jun %A Wang,Yuan-geng-shuo %A Shao,Chunli %A Wang,Wenyao %A Li,Jiarong %A Zheng,Ming-Qi %A Yang,Yaodong %A Tang,Yi-Da %+ Department of Cardiology and Institute of Vascular Medicine, Key Laboratory of Molecular Cardiovascular Science, Ministry of Education, Peking University Third Hospital, 49 North Garden Road, Beijing, 100191, China, 86 88396171, tangyida@bjmu.edu.cn %K large language models %K LLMs %K digital health %K medical diagnosis %K treatment %K multimodal data integration %K technological fairness %K artificial intelligence %K AI %K natural language processing %K NLP %D 2025 %7 7.1.2025 %9 Viewpoint %J J Med Internet Res %G English %X Large language models (LLMs) are rapidly advancing medical artificial intelligence, offering revolutionary changes in health care. These models excel in natural language processing (NLP), enhancing clinical support, diagnosis, treatment, and medical research. Breakthroughs, like GPT-4 and BERT (Bidirectional Encoder Representations from Transformer), demonstrate LLMs’ evolution through improved computing power and data. However, their high hardware requirements are being addressed through technological advancements. LLMs are unique in processing multimodal data, thereby improving emergency, elder care, and digital medical procedures. Challenges include ensuring their empirical reliability, addressing ethical and societal implications, especially data privacy, and mitigating biases while maintaining privacy and accountability. 
The paper emphasizes the need for human-centric, bias-free LLMs for personalized medicine and advocates for equitable development and access. LLMs hold promise for transformative impacts in health care. %M 39773666 %R 10.2196/59069 %U https://www.jmir.org/2025/1/e59069 %U https://doi.org/10.2196/59069 %U http://www.ncbi.nlm.nih.gov/pubmed/39773666 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e67256 %T Noninvasive Oral Hyperspectral Imaging–Driven Digital Diagnosis of Heart Failure With Preserved Ejection Fraction: Model Development and Validation Study %A Yang,Xiaomeng %A Li,Zeyan %A Lei,Lei %A Shi,Xiaoyu %A Zhang,Dingming %A Zhou,Fei %A Li,Wenjing %A Xu,Tianyou %A Liu,Xinyu %A Wang,Songyun %A Yuan,Quan %A Yang,Jian %A Wang,Xinyu %A Zhong,Yanfei %A Yu,Lilei %+ Cardiovascular Hospital, Renmin Hospital of Wuhan University, No. 238 Jiefang Road, Wuhan, 430060, China, 86 02788041911, lileiyu@whu.edu.cn %K heart failure with preserved ejection fraction %K HFpEF %K hyperspectral imaging %K HSI %K diagnostic model %K digital health %K Shapley Additive Explanations %K SHAP %K machine learning %K artificial intelligence %K AI %K cardiovascular disease %K predictive modeling %K oral health %D 2025 %7 7.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Oral microenvironmental disorders are associated with an increased risk of heart failure with preserved ejection fraction (HFpEF). Hyperspectral imaging (HSI) technology enables the detection of substances that are visually indistinguishable to the human eye, providing a noninvasive approach with extensive applications in medical diagnostics. Objective: The objective of this study is to develop and validate a digital, noninvasive oral diagnostic model for patients with HFpEF using HSI combined with various machine learning algorithms. 
Methods: Between April 2023 and August 2023, a total of 140 patients were recruited from Renmin Hospital of Wuhan University to serve as the training and internal testing groups for this study. Subsequently, from August 2024 to September 2024, an additional 35 patients were enrolled from Three Gorges University and Yichang Central People’s Hospital to constitute the external testing group. After preprocessing to ensure image quality, spectral and textural features were extracted from the images. We extracted 25 spectral bands from each patient image and obtained 8 corresponding texture features to evaluate the performance of 28 machine learning algorithms for their ability to distinguish control participants from participants with HFpEF. The model demonstrating the optimal performance in both internal and external testing groups was selected to construct the HFpEF diagnostic model. Hyperspectral bands significant for identifying participants with HFpEF were identified for further interpretative analysis. The Shapley Additive Explanations (SHAP) model was used to provide analytical insights into feature importance. Results: Participants were divided into a training group (n=105), internal testing group (n=35), and external testing group (n=35), with consistent baseline characteristics across groups. Among the 28 algorithms tested, the random forest algorithm demonstrated superior performance with an area under the receiver operating characteristic curve (AUC) of 0.884 and an accuracy of 82.9% in the internal testing group, as well as an AUC of 0.812 and an accuracy of 85.7% in the external testing group. For model interpretation, we used the top 25 features identified by the random forest algorithm. The SHAP analysis revealed discernible distinctions between control participants and participants with HFpEF, thereby validating the diagnostic model’s capacity to accurately identify participants with HFpEF. 
Conclusions: This noninvasive and efficient model facilitates the identification of individuals with HFpEF, thereby promoting early detection, diagnosis, and treatment. Our research presents a clinically advanced diagnostic framework for HFpEF, validated using independent data sets and demonstrating significant potential to enhance patient care. Trial Registration: China Clinical Trial Registry ChiCTR2300078855; https://www.chictr.org.cn/showproj.html?proj=207133 %M 39773415 %R 10.2196/67256 %U https://www.jmir.org/2025/1/e67256 %U https://doi.org/10.2196/67256 %U http://www.ncbi.nlm.nih.gov/pubmed/39773415 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 8 %N %P e63715 %T The PDC30 Chatbot—Development of a Psychoeducational Resource on Dementia Caregiving Among Family Caregivers: Mixed Methods Acceptability Study %A Cheng,Sheung-Tak %A Ng,Peter H F %+ Department of Health and Physical Education, The Education University of Hong Kong, 10 Lo Ping Road, Tai Po, China (Hong Kong), 852 29486563, takcheng@eduhk.hk %K Alzheimer %K caregiving %K chatbot %K conversational artificial intelligence %K dementia %K digital health %K health care technology %K psychoeducational %K medical innovations %K language models %K mobile phone %D 2025 %7 6.1.2025 %9 Original Paper %J JMIR Aging %G English %X Background: Providing ongoing support to the increasing number of caregivers as their needs change in the long-term course of dementia is a severe challenge to any health care system. Conversational artificial intelligence (AI) operating 24/7 may help to tackle this problem. Objective: This study describes the development of a generative AI chatbot—the PDC30 Chatbot—and evaluates its acceptability in a mixed methods study. 
Methods: The PDC30 Chatbot was developed using the GPT-4o large language model, with a personality agent to constrain its behavior to provide advice on dementia caregiving based on the Positive Dementia Caregiving in 30 Days Guidebook—a laypeople’s resource based on a validated training manual for dementia caregivers. The PDC30 Chatbot’s responses to 21 common questions were compared with those of ChatGPT and another chatbot (called Chatbot-B) as standards of reference. Chatbot-B was constructed using PDC30 Chatbot’s architecture but replaced the latter’s knowledge base with a collection of authoritative sources, including the World Health Organization’s iSupport, By Us For Us Guides, and 185 web pages or manuals by Alzheimer’s Association, National Institute on Aging, and UK Alzheimer’s Society. In the next phase, to assess the acceptability of the PDC30 Chatbot, 21 family caregivers used the PDC30 Chatbot for two weeks and provided ratings and comments on its acceptability. Results: Among the three chatbots, ChatGPT’s responses tended to be repetitive and not specific enough. PDC30 Chatbot and Chatbot-B, by virtue of their design, produced highly context-sensitive advice, with the former performing slightly better when the questions conveyed significant psychological distress on the part of the caregiver. In the acceptability study, caregivers found the PDC30 Chatbot highly user-friendly, and its responses quite helpful and easy to understand. They were rather satisfied with it and would strongly recommend it to other caregivers. During the 2-week trial period, the majority used the chatbot more than once per day. Thematic analysis of their written feedback revealed three major themes: helpfulness, accessibility, and improved attitude toward AI. Conclusions: The PDC30 Chatbot provides quality responses to caregiver questions, which are well-received by caregivers. Conversational AI is a viable approach to improve the support of caregivers. 
%R 10.2196/63715 %U https://aging.jmir.org/2025/1/e63715 %U https://doi.org/10.2196/63715 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e64936 %T Real-Time Analytics and AI for Managing No-Show Appointments in Primary Health Care in the United Arab Emirates: Before-and-After Study %A AlSerkal,Yousif Mohamed %A Ibrahim,Naseem Mohamed %A Alsereidi,Aisha Suhail %A Ibrahim,Mubaraka %A Kurakula,Sudheer %A Naqvi,Sadaf Ahsan %A Khan,Yasir %A Oottumadathil,Neema Preman %K electronic health record %K EHR %K artificial intelligence %K AI %K no-show appointments %K real-time data %K primary health care %K risk prediction %K clinic waiting time %K operational efficiency %D 2025 %7 6.1.2025 %9 %J JMIR Form Res %G English %X Background: Primary health care (PHC) services face operational challenges due to high patient volumes, leading to complex management needs. Patients access services through booked appointments and walk-in visits, with walk-in visits often facing longer waiting times. No-show appointments are significant contributors to inefficiency in PHC operations, which can lead to an estimated 3%-14% revenue loss, disrupt resource allocation, and negatively impact health care quality. Emirates Health Services (EHS) PHC centers handle over 140,000 visits monthly. Baseline data indicate a 21% no-show rate and an average patient wait time exceeding 16 minutes, necessitating an advanced scheduling and resource management system to enhance patient experiences and operational efficiency. Objective: The objective of this study was to evaluate the impact of an artificial intelligence (AI)-driven solution that was integrated with an interactive real-time data dashboard on reducing no-show appointments and improving patient waiting times at the EHS PHCs. Methods: This study introduced an innovative AI-based data application to enhance PHC efficiency. 
Leveraging our electronic health record system, we deployed an AI model with an 86% accuracy rate to predict no-shows by analyzing historical data and categorizing appointments based on no-show risk. The model was integrated with a real-time dashboard to monitor patient journeys and wait times. Clinic coordinators used the dashboard to proactively manage high-risk appointments and optimize resource allocation. The intervention was assessed through a before-and-after comparison of PHC appointment dynamics and wait times, analyzing data from 135,393 appointments (67,429 before implementation and 67,964 after implementation). Results: Implementation of the AI-powered no-show prediction model resulted in a significant 50.7% reduction in no-show rates (P<.001). The odds ratio for no-shows after implementation was 0.43 (95% CI 0.42-0.45; P<.001), indicating a 57% reduction in the likelihood of no-shows. Additionally, patient wait times decreased by an average of 5.7 minutes overall (P<.001), with some PHCs achieving up to a 50% reduction in wait times. Conclusions: This project demonstrates that integrating AI with a data analytics platform and an electronic health record systems can significantly improve operational efficiency and patient satisfaction in PHC settings. The AI model enabled daily assessments of wait times and allowed for real-time adjustments, such as reallocating patients to different clinicians, thus reducing wait times and optimizing resource use. These findings illustrate the transformative potential of AI and real-time data analytics in health care delivery. 
%R 10.2196/64936 %U https://formative.jmir.org/2025/1/e64936 %U https://doi.org/10.2196/64936 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66220 %T Two-Layer Retrieval-Augmented Generation Framework for Low-Resource Medical Question Answering Using Reddit Data: Proof-of-Concept Study %A Das,Sudeshna %A Ge,Yao %A Guo,Yuting %A Rajwal,Swati %A Hairston,JaMor %A Powell,Jeanne %A Walker,Drew %A Peddireddy,Snigdha %A Lakamana,Sahithi %A Bozkurt,Selen %A Reyna,Matthew %A Sameni,Reza %A Xiao,Yunyu %A Kim,Sangmi %A Chandler,Rasheeta %A Hernandez,Natalie %A Mowery,Danielle %A Wightman,Rachel %A Love,Jennifer %A Spadaro,Anthony %A Perrone,Jeanmarie %A Sarker,Abeed %+ Department of Biomedical Informatics, School of Medicine, Emory University, 101 Woodruff Circle, Atlanta, GA, 30322, United States, 1 4047270229, sudeshna.das@emory.edu %K retrieval-augmented generation %K substance use %K social media %K large language models %K natural language processing %K artificial intelligence %K GPT %K psychoactive substance %D 2025 %7 6.1.2025 %9 Short Paper %J J Med Internet Res %G English %X Background: The increasing use of social media to share lived and living experiences of substance use presents a unique opportunity to obtain information on side effects, use patterns, and opinions on novel psychoactive substances. However, due to the large volume of data, obtaining useful insights through natural language processing technologies such as large language models is challenging. Objective: This paper aims to develop a retrieval-augmented generation (RAG) architecture for medical question answering pertaining to clinicians’ queries on emerging issues associated with health-related topics, using user-generated medical information on social media. 
Methods: We proposed a two-layer RAG framework for query-focused answer generation and evaluated a proof of concept for the framework in the context of query-focused summary generation from social media forums, focusing on emerging drug-related information. Our modular framework generates individual summaries followed by an aggregated summary to answer medical queries from large amounts of user-generated social media data in an efficient manner. We compared the performance of a quantized large language model (Nous-Hermes-2-7B-DPO), deployable in low-resource settings, with GPT-4. For this proof-of-concept study, we used user-generated data from Reddit to answer clinicians’ questions on the use of xylazine and ketamine. Results: Our framework achieves comparable median scores in terms of relevance, length, hallucination, coverage, and coherence when evaluated using GPT-4 and Nous-Hermes-2-7B-DPO, evaluated for 20 queries with 76 samples. There was no statistically significant difference between GPT-4 and Nous-Hermes-2-7B-DPO for coverage (Mann-Whitney U=733.0; n1=37; n2=39; P=.89 two-tailed), coherence (U=670.0; n1=37; n2=39; P=.49 two-tailed), relevance (U=662.0; n1=37; n2=39; P=.15 two-tailed), length (U=672.0; n1=37; n2=39; P=.55 two-tailed), and hallucination (U=859.0; n1=37; n2=39; P=.01 two-tailed). A statistically significant difference was noted for the Coleman-Liau Index (U=307.5; n1=20; n2=16; P<.001 two-tailed). Conclusions: Our RAG framework can effectively answer medical questions about targeted topics and can be deployed in resource-constrained settings. 
%M 39761554 %R 10.2196/66220 %U https://www.jmir.org/2025/1/e66220 %U https://doi.org/10.2196/66220 %U http://www.ncbi.nlm.nih.gov/pubmed/39761554 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e63020 %T Autonomous International Classification of Diseases Coding Using Pretrained Language Models and Advanced Prompt Learning Techniques: Evaluation of an Automated Analysis System Using Medical Text %A Zhuang,Yan %A Zhang,Junyan %A Li,Xiuxing %A Liu,Chao %A Yu,Yue %A Dong,Wei %A He,Kunlun %+ Medical Big Data Research Center, Chinese PLA General Hospital, 28 Fuxing Road, Beijing, 100853, China, 86 13911232619, kunlunhe@plagh.org %K BERT %K bidirectional encoder representations from transformers %K pretrained language models %K prompt learning %K ICD %K International Classification of Diseases %K cardiovascular disease %K few-shot learning %K multicenter medical data %D 2025 %7 6.1.2025 %9 Original Paper %J JMIR Med Inform %G English %X Background: Machine learning models can reduce the burden on doctors by converting medical records into International Classification of Diseases (ICD) codes in real time, thereby enhancing the efficiency of diagnosis and treatment. However, it faces challenges such as small datasets, diverse writing styles, unstructured records, and the need for semimanual preprocessing. Existing approaches, such as naive Bayes, Word2Vec, and convolutional neural networks, have limitations in handling missing values and understanding the context of medical texts, leading to a high error rate. We developed a fully automated pipeline based on the Key–bidirectional encoder representations from transformers (BERT) approach and large-scale medical records for continued pretraining, which effectively converts long free text into standard ICD codes. By adjusting parameter settings, such as mixed templates and soft verbalizers, the model can adapt flexibly to different requirements, enabling task-specific prompt learning. 
Objective: This study aims to propose a prompt learning real-time framework based on pretrained language models that can automatically label long free-text data with ICD-10 codes for cardiovascular diseases without the need for semiautomatic preprocessing. Methods: We integrated 4 components into our framework: a medical pretrained BERT, a keyword filtration BERT in a functional order, a fine-tuning phase, and task-specific prompt learning utilizing mixed templates and soft verbalizers. This framework was validated on a multicenter medical dataset for the automated ICD coding of 13 common cardiovascular diseases (584,969 records). Its performance was compared against robustly optimized BERT pretraining approach, extreme language network, and various BERT-based fine-tuning pipelines. Additionally, we evaluated the framework’s performance under different prompt learning and fine-tuning settings. Furthermore, few-shot learning experiments were conducted to assess the feasibility and efficacy of our framework in scenarios involving small- to mid-sized datasets. Results: Compared with traditional pretraining and fine-tuning pipelines, our approach achieved a higher micro–F1-score of 0.838 and a macro–area under the receiver operating characteristic curve (macro-AUC) of 0.958, which is 10% higher than other methods. Among different prompt learning setups, the combination of mixed templates and soft verbalizers yielded the best performance. Few-shot experiments showed that performance stabilized and the AUC peaked at 500 shots. Conclusions: These findings underscore the effectiveness and superior performance of prompt learning and fine-tuning for subtasks within pretrained language models in medical practice. Our real-time ICD coding pipeline efficiently converts detailed medical free text into standardized labels, offering promising applications in clinical decision-making. 
It can assist doctors unfamiliar with the ICD coding system in organizing medical record information, thereby accelerating the medical process and enhancing the efficiency of diagnosis and treatment. %M 39761555 %R 10.2196/63020 %U https://medinform.jmir.org/2025/1/e63020 %U https://doi.org/10.2196/63020 %U http://www.ncbi.nlm.nih.gov/pubmed/39761555 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e57644 %T Machine Learning Approaches in High Myopia: Systematic Review and Meta-Analysis %A Zuo,Huiyi %A Huang,Baoyu %A He,Jian %A Fang,Liying %A Huang,Minli %+ Ophthalmology Department, First Affiliated Hospital of GuangXi Medical University, No 6 Shuangyong Road, Nanning, Guangxi, Nanning, 530000, China, 86 0771 5356507, 420306@sr.gxmu.edu.cn %K high myopia %K pathological myopia %K high myopia-associated glaucoma %K machine learning %K deep learning %D 2025 %7 3.1.2025 %9 Review %J J Med Internet Res %G English %X Background: In recent years, with the rapid development of machine learning (ML), it has gained widespread attention from researchers in clinical practice. ML models appear to demonstrate promising accuracy in the diagnosis of complex diseases, as well as in predicting disease progression and prognosis. Some studies have applied it to ophthalmology, primarily for the diagnosis of pathologic myopia and high myopia-associated glaucoma, as well as for predicting the progression of high myopia. ML-based detection still requires evidence-based validation to prove its accuracy and feasibility. Objective: This study aims to discern the performance of ML methods in detecting high myopia and pathologic myopia in clinical practice, thereby providing evidence-based support for the future development and refinement of intelligent diagnostic or predictive tools. Methods: PubMed, Cochrane, Embase, and Web of Science were thoroughly retrieved up to September 3, 2023. 
The prediction model risk of bias assessment tool was leveraged to appraise the risk of bias in the eligible studies. The meta-analysis was implemented using a bivariate mixed-effects model. In the validation set, subgroup analyses were conducted based on the ML target events (diagnosis and prediction of high myopia and diagnosis of pathological myopia and high myopia-associated glaucoma) and modeling methods. Results: This study ultimately included 45 studies, of which 32 were used for quantitative meta-analysis. The meta-analysis results unveiled that for the diagnosis of pathologic myopia, the summary receiver operating characteristic (SROC), sensitivity, and specificity of ML were 0.97 (95% CI 0.95-0.98), 0.91 (95% CI 0.89-0.92), and 0.95 (95% CI 0.94-0.97), respectively. Specifically, deep learning (DL) showed an SROC of 0.97 (95% CI 0.95-0.98), sensitivity of 0.92 (95% CI 0.90-0.93), and specificity of 0.96 (95% CI 0.95-0.97), while conventional ML (non-DL) showed an SROC of 0.86 (95% CI 0.75-0.92), sensitivity of 0.77 (95% CI 0.69-0.84), and specificity of 0.85 (95% CI 0.75-0.92). For the diagnosis and prediction of high myopia, the SROC, sensitivity, and specificity of ML were 0.98 (95% CI 0.96-0.99), 0.94 (95% CI 0.90-0.96), and 0.94 (95% CI 0.88-0.97), respectively. For the diagnosis of high myopia-associated glaucoma, the SROC, sensitivity, and specificity of ML were 0.96 (95% CI 0.94-0.97), 0.92 (95% CI 0.85-0.96), and 0.88 (95% CI 0.67-0.96), respectively. Conclusions: ML demonstrated highly promising accuracy in diagnosing high myopia and pathologic myopia. Moreover, based on the limited evidence available, we also found that ML appeared to have favorable accuracy in predicting the risk of developing high myopia in the future. 
DL can be used as a potential method for intelligent image processing and intelligent recognition, and intelligent examination tools can be developed in subsequent research to provide help for areas where medical resources are scarce. Trial Registration: PROSPERO CRD42023470820; https://tinyurl.com/2xexp738 %M 39753217 %R 10.2196/57644 %U https://www.jmir.org/2025/1/e57644 %U https://doi.org/10.2196/57644 %U http://www.ncbi.nlm.nih.gov/pubmed/39753217 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 9 %N %P e58426 %T Artificial Intelligence–Powered Training Database for Clinical Thinking: App Development Study %A Wang,Heng %A Zheng,Danni %A Wang,Mengying %A Ji,Hong %A Han,Jiangli %A Wang,Yan %A Shen,Ning %A Qiao,Jie %K artificial intelligence %K clinical thinking ability %K virtual medical records %K distance education %K medical education %K online learning %D 2025 %7 3.1.2025 %9 %J JMIR Form Res %G English %X Background: With the development of artificial intelligence (AI), medicine has entered the era of intelligent medicine, and various aspects, such as medical education and talent cultivation, are also being redefined. The cultivation of clinical thinking abilities poses a formidable challenge even for seasoned clinical educators, as offline training modalities often fall short in bridging the divide between current practice and the desired ideal. Consequently, there arises an imperative need for the expeditious development of a web-based database, tailored to empower physicians in their quest to learn and hone their clinical reasoning skills. Objective: This study aimed to introduce an app named “XueYiKu,” which includes consultations, physical examinations, auxiliary examinations, and diagnosis, incorporating AI and actual complete hospital medical records to build an online-learning platform using human-computer interaction. 
Methods: The “XueYiKu” app was designed as a contactless, self-service, trial-and-error system application based on actual complete hospital medical records and natural language processing technology to comprehensively assess the “clinical competence” of residents at different stages. Case extraction was performed at a hospital’s case data center, and the best-matching cases were differentiated through natural language processing, word segmentation, synonym conversion, and sorting. More than 400 teaching cases covering 65 kinds of diseases were released for students to learn, and the subjects covered internal medicine, surgery, gynecology and obstetrics, and pediatrics. The difficulty of learning cases was divided into four levels in ascending order. Moreover, the learning and teaching effects were evaluated using 6 dimensions covering systematicness, agility, logic, knowledge expansion, multidimensional evaluation indicators, and preciseness. Results: From the app’s first launch on the Android platform in May 2019 to the last version updated in May 2023, the total number of teacher and student users was 6209 and 1180, respectively. The top 3 subjects most frequently learned were respirology (n=606, 24.1%), general surgery (n=506, 20.1%), and urinary surgery (n=390, 15.5%). For diseases, pneumonia was the most frequently learned, followed by cholecystolithiasis (n=216, 14.1%), benign prostate hyperplasia (n=196, 12.8%), and bladder tumor (n=193, 12.6%). Among 479 students, roughly a third (n=168, 35.1%) scored in the 60 to 80 range, and half of them scored over 80 points (n=238, 49.7%). The app enabled medical students’ learning to become more active and self-motivated, with a variety of formats, and provided real-time feedback through assessments on the platform. The learning effect was satisfactory overall and provided important precedence for establishing scientific models and methods for assessing clinical thinking skills in the future. 
Conclusions: The integration of AI and medical education will undoubtedly assist in the restructuring of education processes; promote the evolution of the education ecosystem; and provide new convenient ways for independent learning, interactive communication, and educational resource sharing. %R 10.2196/58426 %U https://formative.jmir.org/2025/1/e58426 %U https://doi.org/10.2196/58426 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e52786 %T Major Adverse Kidney Events in Hospitalized Older Patients With Acute Kidney Injury: Machine Learning–Based Model Development and Validation Study %A Luo,Xiao-Qin %A Zhang,Ning-Ya %A Deng,Ying-Hao %A Wang,Hong-Shen %A Kang,Yi-Xin %A Duan,Shao-Bin %+ Department of Nephrology, Hunan Key Laboratory of Kidney Disease and Blood Purification, The Second Xiangya Hospital of Central South University, 139 Renmin Road, Changsha, 410011, China, 86 73185295100, duansb528@csu.edu.cn %K major adverse kidney events within 30 days %K older %K acute kidney injury %K machine learning %K prediction model %D 2025 %7 3.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Acute kidney injury (AKI) is a common complication in hospitalized older patients, associated with increased morbidity, mortality, and health care costs. Major adverse kidney events within 30 days (MAKE30), a composite of death, new renal replacement therapy, or persistent renal dysfunction, has been recommended as a patient-centered endpoint for clinical trials involving AKI. Objective: This study aimed to develop and validate a machine learning–based model to predict MAKE30 in hospitalized older patients with AKI. Methods: A total of 4266 older patients (aged ≥ 65 years) with AKI admitted to the Second Xiangya Hospital of Central South University from January 1, 2015, to December 31, 2020, were included and randomly divided into a training set and an internal test set in a ratio of 7:3. 
An additional cohort of 11,864 eligible patients from the Medical Information Mart for Intensive Care IV database served as an external test set. The Boruta algorithm was used to select the most important predictor variables from 53 candidate variables. The eXtreme Gradient Boosting algorithm was applied to establish a prediction model for MAKE30. Model discrimination was evaluated by the area under the receiver operating characteristic curve (AUROC). The SHapley Additive exPlanations method was used to interpret model predictions. Results: The overall incidence of MAKE30 in the 2 study cohorts was 28.3% (95% CI 26.9%-29.7%) and 26.7% (95% CI 25.9%-27.5%), respectively. The prediction model for MAKE30 exhibited adequate predictive performance, with an AUROC of 0.868 (95% CI 0.852-0.881) in the training set and 0.823 (95% CI 0.798-0.846) in the internal test set. Its simplified version achieved an AUROC of 0.744 (95% CI 0.735-0.754) in the external test set. The SHapley Additive exPlanations method showed that the use of vasopressors, mechanical ventilation, blood urea nitrogen level, red blood cell distribution width-coefficient of variation, and serum albumin level were closely associated with MAKE30. Conclusions: An interpretable eXtreme Gradient Boosting model was developed and validated to predict MAKE30, which provides opportunities for risk stratification, clinical decision-making, and the conduct of clinical trials involving AKI. 
Trial Registration: Chinese Clinical Trial Registry ChiCTR2200061610; https://tinyurl.com/3smf9nuw %M 39752664 %R 10.2196/52786 %U https://www.jmir.org/2025/1/e52786 %U https://doi.org/10.2196/52786 %U http://www.ncbi.nlm.nih.gov/pubmed/39752664 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e63538 %T Development and Evaluation of a Mental Health Chatbot Using ChatGPT 4.0: Mixed Methods User Experience Study With Korean Users %A Kang,Boyoung %A Hong,Munpyo %+ Sungkyunkwan University, 25-2, Sungkyunkwan-Ro, Jongno-gu, Seoul, 03063, Republic of Korea, 82 027401770, bykang2015@gmail.com %K mental health chatbot %K Dr. CareSam %K HoMemeTown %K ChatGPT 4.0 %K large language model %K LLM %K cross-lingual %K pilot testing %K cultural sensitivity %K localization %K Korean students %D 2025 %7 3.1.2025 %9 Original Paper %J JMIR Med Inform %G English %X Background: Mental health chatbots have emerged as a promising tool for providing accessible and convenient support to individuals in need. Building on our previous research on digital interventions for loneliness and depression among Korean college students, this study addresses the limitations identified and explores more advanced artificial intelligence–driven solutions. Objective: This study aimed to develop and evaluate the performance of HoMemeTown Dr. CareSam, an advanced cross-lingual chatbot using ChatGPT 4.0 (OpenAI) to provide seamless support in both English and Korean contexts. The chatbot was designed to address the need for more personalized and culturally sensitive mental health support identified in our previous work while providing an accessible and user-friendly interface for Korean young adults. Methods: We conducted a mixed methods pilot study with 20 Korean young adults aged 18 to 27 (mean 23.3, SD 1.96) years. The HoMemeTown Dr CareSam chatbot was developed using the GPT application programming interface, incorporating features such as a gratitude journal and risk detection. 
User satisfaction and chatbot performance were evaluated using quantitative surveys and qualitative feedback, with triangulation used to ensure the validity and robustness of findings through cross-verification of data sources. Comparative analyses were conducted with other large language model chatbots and existing digital therapy tools (Woebot [Woebot Health Inc] and Happify [Twill Inc]). Results: Users generally expressed positive views towards the chatbot, with positivity and support receiving the highest score on a 10-point scale (mean 9.0, SD 1.2), followed by empathy (mean 8.7, SD 1.6) and active listening (mean 8.0, SD 1.8). However, areas for improvement were noted in professionalism (mean 7.0, SD 2.0), complexity of content (mean 7.4, SD 2.0), and personalization (mean 7.4, SD 2.4). The chatbot demonstrated statistically significant performance differences compared with other large language model chatbots (F=3.27; P=.047), with more pronounced differences compared with Woebot and Happify (F=12.94; P<.001). Qualitative feedback highlighted the chatbot’s strengths in providing empathetic responses and a user-friendly interface, while areas for improvement included response speed and the naturalness of Korean language responses. Conclusions: The HoMemeTown Dr CareSam chatbot shows potential as a cross-lingual mental health support tool, achieving high user satisfaction and demonstrating comparative advantages over existing digital interventions. However, the study’s limited sample size and short-term nature necessitate further research. Future studies should include larger-scale clinical trials, enhanced risk detection features, and integration with existing health care systems to fully realize its potential in supporting mental well-being across different linguistic and cultural contexts. 
%M 39752663 %R 10.2196/63538 %U https://medinform.jmir.org/2025/1/e63538 %U https://doi.org/10.2196/63538 %U http://www.ncbi.nlm.nih.gov/pubmed/39752663 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e58457 %T The Transformative Potential of Large Language Models in Mining Electronic Health Records Data: Content Analysis %A Wals Zurita,Amadeo Jesus %A Miras del Rio,Hector %A Ugarte Ruiz de Aguirre,Nerea %A Nebrera Navarro,Cristina %A Rubio Jimenez,Maria %A Muñoz Carmona,David %A Miguez Sanchez,Carlos %+ Servicio Oncologia Radioterápica, Hospital Universitario Virgen Macarena, Andalusian Health Service, Avenida Dr. Fedriani s/n, Seville, 41009, Spain, 34 954712932, amadeoj.wals.sspa@juntadeandalucia.es %K electronic health record %K EHR %K oncology %K radiotherapy %K data mining %K ChatGPT %K large language models %K LLMs %D 2025 %7 2.1.2025 %9 Original Paper %J JMIR Med Inform %G English %X Background: In this study, we evaluate the accuracy, efficiency, and cost-effectiveness of large language models in extracting and structuring information from free-text clinical reports, particularly in identifying and classifying patient comorbidities within oncology electronic health records. We specifically compare the performance of gpt-3.5-turbo-1106 and gpt-4-1106-preview models against that of specialized human evaluators. Objective: We specifically compare the performance of gpt-3.5-turbo-1106 and gpt-4-1106-preview models against that of specialized human evaluators. Methods: We implemented a script using the OpenAI application programming interface to extract structured information in JavaScript object notation format from comorbidities reported in 250 personal history reports. These reports were manually reviewed in batches of 50 by 5 specialists in radiation oncology. 
We compared the results using metrics such as sensitivity, specificity, precision, accuracy, F-value, κ index, and the McNemar test, in addition to examining the common causes of errors in both humans and generative pretrained transformer (GPT) models. Results: The GPT-3.5 model exhibited slightly lower performance compared to physicians across all metrics, though the differences were not statistically significant (McNemar test, P=.79). GPT-4 demonstrated clear superiority in several key metrics (McNemar test, P<.001). Notably, it achieved a sensitivity of 96.8%, compared to 88.2% for GPT-3.5 and 88.8% for physicians. However, physicians marginally outperformed GPT-4 in precision (97.7% vs 96.8%). GPT-4 showed greater consistency, replicating the exact same results in 76% of the reports across 10 repeated analyses, compared to 59% for GPT-3.5, indicating more stable and reliable performance. Physicians were more likely to miss explicit comorbidities, while the GPT models more frequently inferred nonexplicit comorbidities, sometimes correctly, though this also resulted in more false positives. Conclusions: This study demonstrates that, with well-designed prompts, the large language models examined can match or even surpass medical specialists in extracting information from complex clinical reports. Their superior efficiency in time and costs, along with easy integration with databases, makes them a valuable tool for large-scale data mining and real-world evidence generation. 
%M 39746191 %R 10.2196/58457 %U https://medinform.jmir.org/2025/1/e58457 %U https://doi.org/10.2196/58457 %U http://www.ncbi.nlm.nih.gov/pubmed/39746191 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 4 %N %P e52270 %T Enhancing Interpretable, Transparent, and Unobtrusive Detection of Acute Marijuana Intoxication in Natural Environments: Harnessing Smart Devices and Explainable AI to Empower Just-In-Time Adaptive Interventions: Longitudinal Observational Study %A Bae,Sang Won %A Chung,Tammy %A Zhang,Tongze %A Dey,Anind K %A Islam,Rahul %+ Human-Computer Interaction and Human-Centered AI Systems Lab, AI for Healthcare Lab, Charles V. Schaefer, Jr. School of Engineering and Science, Stevens Institute of Technology, 1 Castle Point Terrace, Hoboken, NJ, 07030-5906, United States, 1 4122658616, sbae4@stevens.edu %K digital phenotyping %K smart devices %K intoxication %K smartphone-based sensors %K wearables %K mHealth %K marijuana %K cannabis %K data collection %K passive sensing %K Fitbit %K machine learning %K eXtreme Gradient Boosting Machine classifier %K XGBoost %K algorithmic decision-making process %K explainable artificial intelligence %K XAI %K artificial intelligence %K JITAI %K decision support %K just-in-time adaptive interventions %K experience sampling %D 2025 %7 2.1.2025 %9 Original Paper %J JMIR AI %G English %X Background: Acute marijuana intoxication can impair motor skills and cognitive functions such as attention and information processing. However, traditional tests, like blood, urine, and saliva, fail to accurately detect acute marijuana intoxication in real time. Objective: This study aims to explore whether integrating smartphone-based sensors with readily accessible wearable activity trackers, like Fitbit, can enhance the detection of acute marijuana intoxication in naturalistic settings. 
No previous research has investigated the effectiveness of passive sensing technologies for enhancing algorithm accuracy or enhancing the interpretability of digital phenotyping through explainable artificial intelligence in real-life scenarios. This approach aims to provide insights into how individuals interact with digital devices during algorithmic decision-making, particularly for detecting moderate to intensive marijuana intoxication in real-world contexts. Methods: Sensor data from smartphones and Fitbits, along with self-reported marijuana use, were collected from 33 young adults over a 30-day period using the experience sampling method. Participants rated their level of intoxication on a scale from 1 to 10 within 15 minutes of consuming marijuana and during 3 daily semirandom prompts. The ratings were categorized as not intoxicated (0), low (1-3), and moderate to intense intoxication (4-10). The study analyzed the performance of models using mobile phone data only, Fitbit data only, and a combination of both (MobiFit) in detecting acute marijuana intoxication. Results: The eXtreme Gradient Boosting Machine classifier showed that the MobiFit model, which combines mobile phone and wearable device data, achieved 99% accuracy (area under the curve=0.99; F1-score=0.85) in detecting acute marijuana intoxication in natural environments. The F1-score indicated significant improvements in sensitivity and specificity for the combined MobiFit model compared to using mobile or Fitbit data alone. Explainable artificial intelligence revealed that moderate to intense self-reported marijuana intoxication was associated with specific smartphone and Fitbit metrics, including elevated minimum heart rate, reduced macromovement, and increased noise energy around participants. 
Conclusions: This study demonstrates the potential of using smartphone sensors and wearable devices for interpretable, transparent, and unobtrusive monitoring of acute marijuana intoxication in daily life. Advanced algorithmic decision-making provides valuable insight into behavioral, physiological, and environmental factors that could support timely interventions to reduce marijuana-related harm. Future real-world applications of these algorithms should be evaluated in collaboration with clinical experts to enhance their practicality and effectiveness. %M 39746202 %R 10.2196/52270 %U https://ai.jmir.org/2025/1/e52270 %U https://doi.org/10.2196/52270 %U http://www.ncbi.nlm.nih.gov/pubmed/39746202 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56382 %T Leveraging Machine Learning to Identify Subgroups of Misclassified Patients in the Emergency Department: Multicenter Proof-of-Concept Study %A Wyatt,Sage %A Lunde Markussen,Dagfinn %A Haizoune,Mounir %A Vestbø,Anders Strand %A Sima,Yeneabeba Tilahun %A Sandboe,Maria Ilene %A Landschulze,Marcus %A Bartsch,Hauke %A Sauer,Christopher Martin %+ Institute for Artificial Intelligence in Medicine, University Hospital Essen, Girardetstraße 2, Essen, 45131, Germany, 49 201 723 0, sauerc@mit.edu %K emergency department %K triage %K machine learning %K real world evidence %K random forest %K classification %K subgroup %K misclassification %K patient %K multi-center %K proof-of-concept %K hospital %K clinical feature %K Norway %K retrospective %K cohort study %K electronic health system %K electronic health record %D 2024 %7 31.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Hospitals use triage systems to prioritize the needs of patients within available resources. Misclassification of a patient can lead to either adverse outcomes in a patient who did not receive appropriate care in the case of undertriage or a waste of hospital resources in the case of overtriage. 
Recent advances in machine learning algorithms allow for the quantification of variables important to under- and overtriage. Objective: This study aimed to identify clinical features most strongly associated with triage misclassification using a machine learning classification model to capture nonlinear relationships. Methods: Multicenter retrospective cohort data from 2 large regional hospitals in Norway were extracted. The South African Triage System is used at Bergen University Hospital, and the Rapid Emergency Triage and Treatment System is used at Trondheim University Hospital. Variables retrieved included triage score, age, sex, arrival time, subject area affiliation, reason for emergency department contact, discharge location, level of care, and time of death. Random forest classification models were used to identify features with the strongest association with overtriage and undertriage in clinical practice in Bergen and Trondheim. We reported variable importance as SHAP (SHapley Additive exPlanations) values. Results: We collected data on 205,488 patient records from Bergen University Hospital and 304,997 patient records from Trondheim University Hospital. Overall, overtriage was very uncommon at both hospitals (all <0.1%), with undertriage differing between both locations, with 0.8% at Bergen and 0.2% at Trondheim University Hospital. Demographics were similar for both hospitals. However, the percentage given a high-priority triage score (red or orange) was higher in Bergen (24%) compared with 9% in Trondheim. The clinical referral department was found to be the variable with the strongest association with undertriage (mean SHAP +0.62 and +0.37 for Bergen and Trondheim, respectively). Conclusions: We identified subgroups of patients consistently undertriaged using 2 common triage systems. 
While the importance of clinical patient characteristics to triage misclassification varies by triage system and location, we found consistent evidence between the two locations that the clinical referral department is the most important variable associated with triage misclassification. Replication of this approach at other centers could help to further improve triage scoring systems and improve patient care worldwide. %M 39451101 %R 10.2196/56382 %U https://www.jmir.org/2024/1/e56382 %U https://doi.org/10.2196/56382 %U http://www.ncbi.nlm.nih.gov/pubmed/39451101 %0 Journal Article %@ 2291-9279 %I JMIR Publications %V 12 %N %P e56663 %T Use of 4 Open-Ended Text Responses to Help Identify People at Risk of Gaming Disorder: Preregistered Development and Usability Study Using Natural Language Processing %A Strojny,Paweł %A Kapela,Ksawery %A Lipp,Natalia %A Sikström,Sverker %+ Faculty of Management and Social Communication, Jagiellonian University, Łojasiewicza 4, Krakow, 30-348, Poland, 48 12 664 5582, p.strojny@uj.edu.pl %K gaming disorder %K natural language processing %K machine learning %K mental health %K NLP %K text %K open-ended %K response %K risk %K psychological %K Question-based Computational Language Assessment %K QCLA %K transformers-based %K language model analysis %K Polish %K Pearson %K correlation %K Python %D 2024 %7 31.12.2024 %9 Original Paper %J JMIR Serious Games %G English %X Background: Words are a natural way to describe mental states in humans, while numerical values are a convenient and effective way to carry out quantitative psychological research. With the growing interest of researchers in gaming disorder, the number of screening tools is growing. However, they all require self-quantification of mental states. 
The rapid development of natural language processing creates an opportunity to supplement traditional rating scales with a question-based computational language assessment approach that gives a deeper understanding of the studied phenomenon without losing the rigor of quantitative data analysis. Objective: The aim of the study was to investigate whether transformer-based language model analysis of text responses from active gamers is a potential supplement to traditional rating scales. We compared a tool consisting of 4 open-ended questions formulated based on the clinician’s intuition (not directly related to any existing rating scales for measuring gaming disorders) with the results of one of the commonly used rating scales. Methods: Participants recruited using an online panel were asked to answer the Word-Based Gaming Disorder Test, consisting of 4 open-ended questions about gaming. Subsequently, they completed a closed-ended Gaming Disorders Test based on a numerical scale. Of the initial 522 responses collected, we removed a total of 105 due to 1 of 3 criteria (suspiciously low survey completion time, providing nonrelevant or incomplete responses). Final analyses were conducted on the responses of 417 participants. The responses to the open-ended questions were vectorized using HerBERT, a large language model based on Google’s Bidirectional Encoder Representations from Transformers (BERT). Last, a machine learning model, specifically ridge regression, was used to predict the scores of the Gaming Disorder Test based on the features of the vectorized open-ended responses. Results: The Pearson correlation between the observable scores from the Gaming Disorder Test and the predictions made by the model was 0.476 when using the answers to all 4 questions as features. When using only 1 of the 4 text responses, the correlation ranged from 0.274 to 0.406. 
Conclusions: Short open responses analyzed using natural language processing can contribute to a deeper understanding of gaming disorder at no additional cost in time. The obtained results confirmed 2 of 3 preregistered hypotheses. The written statements analyzed using the results of the model correlated with the rating scale. Furthermore, the inclusion in the model of data from more responses that take into account different perspectives on gaming improved the performance of the model. However, there is room for improvement, especially in terms of supplementing the questions with content that corresponds more directly to the definition of gaming disorder. Trial Registration: OSF Registries osf.io/957nz; https://osf.io/957nz %M 39739378 %R 10.2196/56663 %U https://games.jmir.org/2024/1/e56663 %U https://doi.org/10.2196/56663 %U http://www.ncbi.nlm.nih.gov/pubmed/39739378 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54047 %T Slit Lamp Report Generation and Question Answering: Development and Validation of a Multimodal Transformer Model with Large Language Model Integration %A Zhao,Ziwei %A Zhang,Weiyi %A Chen,Xiaolan %A Song,Fan %A Gunasegaram,James %A Huang,Wenyong %A Shi,Danli %A He,Mingguang %A Liu,Na %+ Guangzhou Cadre and Talent Health Management Center, No. 109 Changling Road, Huangpu District, Guangzhou, 510700, China, 86 18701985445, 1256695904@qq.com %K large language model %K slit lamp %K medical report generation %K question answering %D 2024 %7 30.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Large language models have shown remarkable efficacy in various medical research and clinical applications. However, their skills in medical image recognition and subsequent report generation or question answering (QA) remain limited. Objective: We aim to finetune a multimodal, transformer-based model for generating medical reports from slit lamp images and develop a QA system using Llama2. 
We term this entire process slit lamp–GPT. Methods: Our research used a dataset of 25,051 slit lamp images from 3409 participants, paired with their corresponding physician-created medical reports. We used these data, split into training, validation, and test sets, to finetune the Bootstrapping Language-Image Pre-training framework toward report generation. The generated text reports and human-posed questions were then input into Llama2 for subsequent QA. We evaluated performance using quantitative metrics (including BLEU [bilingual evaluation understudy], CIDEr [consensus-based image description evaluation], ROUGE-L [Recall-Oriented Understudy for Gisting Evaluation—Longest Common Subsequence], SPICE [Semantic Propositional Image Caption Evaluation], accuracy, sensitivity, specificity, precision, and F1-score) and the subjective assessments of 2 experienced ophthalmologists on a 1-3 scale (1 referring to high quality). Results: We identified 50 conditions related to diseases or postoperative complications through keyword matching in initial reports. The refined slit lamp–GPT model demonstrated BLEU scores (1-4) of 0.67, 0.66, 0.65, and 0.65, respectively, with a CIDEr score of 3.24, a ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score of 0.61, and a Semantic Propositional Image Caption Evaluation score of 0.37. The most frequently identified conditions were cataracts (22.95%), age-related cataracts (22.03%), and conjunctival concretion (13.13%). Disease classification metrics demonstrated an overall accuracy of 0.82 and an F1-score of 0.64, with high accuracies (≥0.9) observed for intraocular lens, conjunctivitis, and chronic conjunctivitis, and high F1-scores (≥0.9) observed for cataract and age-related cataract. For both report generation and QA components, the 2 evaluating ophthalmologists reached substantial agreement, with κ scores between 0.71 and 0.84. 
In assessing 100 generated reports, they awarded scores of 1.36 for both completeness and correctness; 64% (64/100) were considered “entirely good,” and 93% (93/100) were “acceptable.” In the evaluation of 300 generated answers to questions, the scores were 1.33 for completeness, 1.14 for correctness, and 1.15 for possible harm, with 66.3% (199/300) rated as “entirely good” and 91.3% (274/300) as “acceptable.” Conclusions: This study introduces the slit lamp–GPT model for report generation and subsequent QA, highlighting the potential of large language models to assist ophthalmologists and patients. %M 39753218 %R 10.2196/54047 %U https://www.jmir.org/2024/1/e54047 %U https://doi.org/10.2196/54047 %U http://www.ncbi.nlm.nih.gov/pubmed/39753218 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e57824 %T Applying AI to Structured Real-World Data for Pharmacovigilance Purposes: Scoping Review %A Dimitsaki,Stella %A Natsiavas,Pantelis %A Jaulent,Marie-Christine %+ Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances en e-Santé - LIMICS, Inserm, Université Sorbonne Paris-Nord, Sorbonne Université, 15 Rue de l'École de Médecine, Paris, 75006, France, 33 767968072, Stella.Dimitsaki@etu.sorbonne-universite.fr %K pharmacovigilance %K drug safety %K artificial intelligence %K machine learning %K real-world data %K scoping review %D 2024 %7 30.12.2024 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) applied to real-world data (RWD; eg, electronic health care records) has been identified as a potentially promising technical paradigm for the pharmacovigilance field. There are several instances of AI approaches applied to RWD; however, most studies focus on unstructured RWD (conducting natural language processing on various data sources, eg, clinical notes, social media, and blogs). 
Hence, it is essential to investigate how AI is currently applied to structured RWD in pharmacovigilance and how new approaches could enrich the existing methodology. Objective: This scoping review depicts the emerging use of AI on structured RWD for pharmacovigilance purposes to identify relevant trends and potential research gaps. Methods: The review follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology. We queried the MEDLINE database through the PubMed search engine. Relevant scientific manuscripts published from January 2010 to January 2024 were retrieved. The included studies were “mapped” against a set of evaluation criteria, including applied AI approaches, code availability, description of the data preprocessing pipeline, clinical validation of AI models, and implementation of trustworthy AI criteria following the guidelines of the FUTURE (Fairness, Universality, Traceability, Usability, Robustness, and Explainability)-AI initiative. Results: The scoping review ultimately yielded 36 studies. There has been a significant increase in relevant studies after 2019. Most of the articles focused on adverse drug reaction detection procedures (23/36, 64%) for specific adverse effects. Furthermore, a substantial number of studies (34/36, 94%) used nonsymbolic AI approaches, emphasizing classification tasks. Random forest was the most popular machine learning approach identified in this review (17/36, 47%). The most common RWD sources used were electronic health care records (28/36, 78%). Typically, these data were not available in a widely acknowledged data model to facilitate interoperability, and they came from proprietary databases, limiting their availability for reproducing results.
On the basis of the evaluation criteria classification, 11% (4/36) of the studies published their code in public registries, 17% (6/36) tested their AI models in clinical environments, and 36% (13/36) provided information about the data preprocessing pipeline. In addition, in terms of trustworthy AI, 89% (32/36) of the studies followed at least half of the trustworthy AI initiative guidelines. Finally, selection and confounding biases were the most common biases in the included studies. Conclusions: AI, along with structured RWD, constitutes a promising line of work for drug safety and pharmacovigilance. However, in terms of AI, some approaches have not been examined extensively in this field (such as explainable AI and causal AI). Moreover, it would be helpful to have a data preprocessing protocol for RWD to support pharmacovigilance processes. Finally, because of personal data sensitivity, evaluation procedures have to be investigated further. %M 39753222 %R 10.2196/57824 %U https://www.jmir.org/2024/1/e57824 %U https://doi.org/10.2196/57824 %U http://www.ncbi.nlm.nih.gov/pubmed/39753222 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e57204 %T Allied Health Professionals’ Perceptions of Artificial Intelligence in the Clinical Setting: Cross-Sectional Survey %A Hoffman,Jane %A Hattingh,Laetitia %A Shinners,Lucy %A Angus,Rebecca L %A Richards,Brent %A Hughes,Ian %A Wenke,Rachel %+ Pharmacy Department, Gold Coast Hospital and Health Service, Gold Coast University Hospital, 1 Hospital Boulevard, Southport, 4215, Australia, 61 756870620, jane.hoffman@health.qld.gov.au %K allied health %K artificial intelligence %K hospital %K digital health %K impact %K AI %K mHealth %K cross sectional %K survey %K health professional %K medical professional %K perception %K clinical setting %K opportunity %K challenge %K healthcare %K delivery %K Australia %K clinician %K confirmatory factor analysis %K linear regression %D 2024 %7 30.12.2024 %9 Original Paper %J
JMIR Form Res %G English %X Background: Artificial intelligence (AI) has the potential to address growing logistical and economic pressures on the health care system by reducing risk, increasing productivity, and improving patient safety; however, implementing digital health technologies can be disruptive. Workforce perception is a powerful indicator of technology use and acceptance; however, there is little research available on the perceptions of allied health professionals (AHPs) toward AI in health care. Objective: This study aimed to explore AHP perceptions of AI and the opportunities and challenges for its use in health care delivery. Methods: A cross-sectional survey was conducted at a health service in Queensland, Australia, using the Shinners Artificial Intelligence Perception tool. Results: A total of 231 (22.1%) participants from 11 allied health professions responded to the survey. Participants were mostly younger than 40 years (157/231, 67.9%), female (189/231, 81.8%), and working in a clinical role (196/231, 84.8%), with a median of 10 years’ experience in their profession. Most participants had not used AI (185/231, 80.1%), had little to no knowledge about AI (201/231, 87%), and reported workforce knowledge and skill as the greatest challenges to incorporating AI in health care (178/231, 77.1%). Age (P=.01), profession (P=.009), and AI knowledge (P=.02) were strong predictors of the perceived professional impact of AI. AHPs generally felt unprepared for the implementation of AI in health care, with concerns about a lack of workforce knowledge on AI and losing valued tasks to AI. Prior use of AI (P=.02) and years of experience as a health care professional (P=.02) were significant predictors of perceived preparedness for AI. Most participants had not received education on AI (190/231, 82.3%) and desired training (170/231, 73.6%) and believed AI would improve health care.
Ideas and opportunities suggested for the use of AI within the allied health setting were predominantly nonclinical, administrative, and to support patient assessment tasks, with a view to improving efficiencies and increasing clinical time for direct patient care. Conclusions: Education and experience with AI are needed in health care to support its implementation across allied health, the second largest workforce in health. Industry and academic partnerships with clinicians should not be limited to AHPs with high AI literacy as clinicians across all knowledge levels can identify many opportunities for AI in health care. %M 39753215 %R 10.2196/57204 %U https://formative.jmir.org/2024/1/e57204 %U https://doi.org/10.2196/57204 %U http://www.ncbi.nlm.nih.gov/pubmed/39753215 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e64081 %T Effects of Large Language Model–Based Offerings on the Well-Being of Students: Qualitative Study %A Selim,Rania %A Basu,Arunima %A Anto,Ailin %A Foscht,Thomas %A Eisingerich,Andreas Benedikt %+ Faculty of Medicine, Imperial College London, Exhibition Rd, South Kensington, London, SW7 2AZ, United Kingdom, 44 020 7589 5111, rania.selim18@imperial.ac.uk %K large language models %K ChatGPT %K functional support %K escapism %K fantasy fulfillment %K angst %K despair %K anxiety %K deskilling %K pessimism about the future %D 2024 %7 27.12.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: In recent years, the adoption of large language model (LLM) applications, such as ChatGPT, has seen a significant surge, particularly among students. These artificial intelligence–driven tools offer unprecedented access to information and conversational assistance, which is reshaping the way students engage with academic content and manage the learning process. 
Despite the growing prevalence of LLMs and reliance on these technologies, there remains a notable gap in qualitative in-depth research examining the emotional and psychological effects of LLMs on users’ mental well-being. Objective: In order to address these emerging and critical issues, this study explores the role of LLM-based offerings, such as ChatGPT, in students’ lives, namely, how postgraduate students use such offerings and how they make students feel, and examines the impact on students’ well-being. Methods: To address the aims of this study, we employed an exploratory approach, using in-depth, semistructured, qualitative, face-to-face interviews with 23 users (13 female and 10 male users; mean age 23 years, SD 1.55 years) of ChatGPT-4o, who were also university students at the time (inclusion criteria). Interviewees were invited to reflect upon how they use ChatGPT, how it makes them feel, and how it may influence their lives. Results: The current findings from the exploratory qualitative interviews showed that users appreciate the functional support (8/23, 35%), escapism (8/23, 35%), and fantasy fulfillment (7/23, 30%) they receive from LLM-based offerings, such as ChatGPT, but at the same time, such usage is seen as a “double-edged sword,” with respondents indicating anxiety (8/23, 35%), dependence (11/23, 48%), concerns about deskilling (12/23, 52%), and angst or pessimism about the future (11/23, 48%). Conclusions: This study employed exploratory in-depth interviews to examine how the usage of LLM-based offerings, such as ChatGPT, makes users feel and assess the effects of using LLM-based offerings on mental well-being. The findings of this study show that students used ChatGPT to make their lives easier and felt a sense of cognitive escapism and even fantasy fulfillment, but this came at the cost of feeling anxious and pessimistic about the future. 
%M 39729617 %R 10.2196/64081 %U https://formative.jmir.org/2024/1/e64081 %U https://doi.org/10.2196/64081 %U http://www.ncbi.nlm.nih.gov/pubmed/39729617 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52914 %T Artificial Intelligence–Aided Diagnosis System for the Detection and Classification of Private-Part Skin Diseases: Decision Analytical Modeling Study %A Wang,Wei %A Chen,Xiang %A Xu,Licong %A Huang,Kai %A Zhao,Shuang %A Wang,Yong %+ School of Automation, Central South University, 932 South Lushan Road, Changsha, 410083, China, 86 18507313729, ywang@csu.edu.cn %K artificial intelligence-aided diagnosis %K private parts %K skin disease %K knowledge graph %K dermatology %K classification %K artificial intelligence %K AI %K diagnosis %D 2024 %7 27.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Private-part skin diseases (PPSDs) can cause a patient’s stigma, which may hinder the early diagnosis of these diseases. Artificial intelligence (AI) is an effective tool to improve the early diagnosis of PPSDs, especially in preventing the deterioration of skin tumors in private parts such as Paget disease. However, to our knowledge, there is currently no research on using AI to identify PPSDs due to the complex backgrounds of the lesion areas and the challenges in data collection. Objective: This study aimed to develop and evaluate an AI-aided diagnosis system for the detection and classification of PPSDs: aiding patients in self-screening and supporting dermatologists’ diagnostic enhancement. Methods: In this decision analytical modeling study, a 2-stage AI-aided diagnosis system was developed to classify PPSDs. In the first stage, a multitask detection network was trained to automatically detect and classify skin lesions (type, color, and shape). 
In the second stage, we proposed a knowledge graph based on dermatology expertise and constructed a decision network to classify seven PPSDs (condyloma acuminatum, Paget disease, eczema, pearly penile papules, genital herpes, syphilis, and Bowen disease). A reader study with 13 dermatologists of different experience levels was conducted. Dermatologists were asked to classify the testing cohort under reading room conditions, first without and then with system support. This AI-aided diagnostic study used the data of 635 patients from two institutes between July 2019 and April 2022. The data of Institute 1 contained 2701 skin lesion samples from 520 patients, which were used for the training of the multitask detection network in the first stage. In addition, the data of Institute 2 consisted of 115 clinical images and the corresponding medical records, which were used for the test of the whole 2-stage AI-aided diagnosis system. Results: On the test data of Institute 2, the proposed system achieved the average precision, recall, and F1-score of 0.81, 0.86, and 0.83, respectively, better than existing advanced algorithms. For the reader performance test, our system improved the average F1-score of the junior, intermediate, and senior dermatologists by 16%, 7%, and 4%, respectively. Conclusions: In this study, we constructed the first skin-lesion–based dataset and developed the first AI-aided diagnosis system for PPSDs. This system provides the final diagnosis result by simulating the diagnostic process of dermatologists. Compared with existing advanced algorithms, this system is more accurate in identifying PPSDs. Overall, our system can not only help patients achieve self-screening and alleviate their stigma but also assist dermatologists in diagnosing PPSDs. 
%M 39729353 %R 10.2196/52914 %U https://www.jmir.org/2024/1/e52914 %U https://doi.org/10.2196/52914 %U http://www.ncbi.nlm.nih.gov/pubmed/39729353 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e60024 %T Impact of Artificial Intelligence–Generated Content Labels on Perceived Accuracy, Message Credibility, and Sharing Intentions for Misinformation: Web-Based, Randomized, Controlled Experiment %A Li,Fan %A Yang,Ya %+ School of Journalism and Communication, Beijing Normal University, NO.19, Xinjiekouwai Street, Haidian District, Beijing, 100875, China, 86 18810305219, yangya@bnu.edu.cn %K generative AI %K artificial intelligence %K ChatGPT %K AIGC label %K misinformation %K perceived accuracy %K message credibility %K sharing intention %K social media %K health information %D 2024 %7 24.12.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The proliferation of generative artificial intelligence (AI), such as ChatGPT, has added complexity and richness to the virtual environment by increasing the presence of AI-generated content (AIGC). Although social media platforms such as TikTok have begun labeling AIGC to help users distinguish it from human-generated content, little research has been performed to examine the effect of these AIGC labels. Objective: This study investigated the impact of AIGC labels on perceived accuracy, message credibility, and sharing intention for misinformation through a web-based experimental design, aiming to refine the strategic application of AIGC labels. Methods: The study conducted a 2×2×2 mixed experimental design, using the AIGC labels (presence vs absence) as the between-subjects factor and information type (accurate vs inaccurate) and content category (for-profit vs not-for-profit) as within-subjects factors. Participants, recruited via the Credamo platform, were randomly assigned to either an experimental group (with labels) or a control group (without labels).
Each participant evaluated 4 sets of content, providing feedback on perceived accuracy, message credibility, and sharing intention for misinformation. Statistical analyses were performed using SPSS version 29 and included repeated-measures ANOVA and simple effects analysis, with significance set at P<.05. Results: As of April 2024, this study recruited a total of 957 participants, and after screening, 400 participants each were allocated to the experimental and control groups. The main effects of AIGC labels were not significant for perceived accuracy, message credibility, or sharing intention. However, the main effects of information type were significant for all 3 dependent variables (P<.001), as were the effects of content category (P<.001). There were significant differences in interaction effects among the 3 variables. For perceived accuracy, the interaction between information type and content category was significant (P=.005). For message credibility, the interaction between information type and content category was significant (P<.001). Regarding sharing intention, both the interaction between information type and content category (P<.001) and the interaction between information type and AIGC labels (P=.008) were significant. Conclusions: This study found that AIGC labels minimally affect perceived accuracy, message credibility, or sharing intention but help distinguish AIGC from human-generated content. The labels do not negatively impact users’ perceptions of platform content, indicating their potential for fact-checking and governance. However, AIGC labeling applications should vary by information type; they can slightly enhance sharing intention and perceived accuracy for misinformation. This highlights the need for more nuanced strategies for AIGC labels, necessitating further research. 
%R 10.2196/60024 %U https://formative.jmir.org/2024/1/e60024 %U https://doi.org/10.2196/60024 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e55916 %T Consensus Between Radiologists, Specialists in Internal Medicine, and AI Software on Chest X-Rays in a Hospital-at-Home Service: Prospective Observational Study %A Grossbard,Eitan %A Marziano,Yehonatan %A Sharabi,Adam %A Abutbul,Eliyahu %A Berman,Aya %A Kassif-Lerner,Reut %A Barkai,Galia %A Hakim,Hila %A Segal,Gad %K chest x-ray %K hospital-at-home %K telemedicine %K artificial intelligence %K kappa %K x-ray %K home hospitalization %K clinical data %K chest %K implementation %K comparative analysis %K radiologist %K AI %D 2024 %7 24.12.2024 %9 %J JMIR Form Res %G English %X Background: Home hospitalization is a care modality growing in popularity worldwide. Telemedicine-driven hospital-at-home (HAH) services could replace traditional hospital departments for selected patients. Chest x-rays typically serve as a key diagnostic tool in such cases. Objective: The implementation, analysis, and clinical assimilation of chest x-rays into an HAH service have not been described yet. Our objective is to introduce this essential information to the realm of HAH services for the first time worldwide. Methods: The study involved a prospective follow-up, description, and analysis of the HAH patient population who underwent chest x-rays at home. A comparative analysis was performed to evaluate the level of agreement among three interpretation modalities: a radiologist, a specialist in internal medicine, and a designated artificial intelligence (AI) algorithm. Results: Between February 2021 and May 2023, 300 chest radiographs were performed at the homes of 260 patients, with a median age of 78 (IQR 65‐87) years. The most frequent underlying morbidity was cardiovascular disease (n=185, 71.2%).
Of the x-rays, 286 (95.3%) were interpreted by a specialist in internal medicine, 29 (9.7%) by a specialized radiologist, and 95 (31.7%) by the AI software. The overall raw agreement level among these three modalities exceeded 90%. The consensus level evaluated using the Cohen κ coefficient showed substantial agreement (κ=0.65) and moderate agreement (κ=0.49) between the specialist in internal medicine and the radiologist, and between the specialist in internal medicine and the AI software, respectively. Conclusions: Chest x-rays play a crucial role in the HAH setting. Rapid and reliable interpretation of these x-rays is essential for determining whether a patient requires transfer back to in-hospital surveillance. Our comparative results showed that interpretation by an experienced specialist in internal medicine demonstrates a significant level of consensus with that of the radiologists. However, AI algorithm-based interpretation needs to be further developed and revalidated prior to clinical applications. %R 10.2196/55916 %U https://formative.jmir.org/2024/1/e55916 %U https://doi.org/10.2196/55916 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53863 %T Exploring the Applications of Explainability in Wearable Data Analytics: Systematic Literature Review %A Abdelaal,Yasmin %A Aupetit,Michaël %A Baggag,Abdelkader %A Al-Thani,Dena %+ College of Science and Engineering, Hamad Bin Khalifa University, Penrose Building, Doha, Qatar, 974 74024682, y.abdelaal98@gmail.com %K explainable artificial intelligence %K XAI %K wearable %K machine learning %K deep learning %K health informatics %K wearable sensors %K user experience %K wearable data %K analytics %K interpretation %D 2024 %7 24.12.2024 %9 Review %J J Med Internet Res %G English %X Background: Wearable technologies have become increasingly prominent in health care. 
However, intricate machine learning and deep learning algorithms often lead to the development of “black box” models, which lack transparency and comprehensibility for medical professionals and end users. In this context, the integration of explainable artificial intelligence (XAI) has emerged as a crucial solution. By providing insights into the inner workings of complex algorithms, XAI aims to foster trust and empower stakeholders to use wearable technologies responsibly. Objective: This paper aims to review the recent literature and explore the application of explainability in wearables. By examining how XAI can enhance the interpretability of generated data and models, this review sought to shed light on the possibilities that arise at the intersection of wearable technologies and XAI. Methods: We collected publications from ACM Digital Library, IEEE Xplore, PubMed, SpringerLink, JMIR, Nature, and Scopus. The eligible studies included technology-based research involving wearable devices, sensors, or mobile phones focused on explainability, machine learning, or deep learning and that used quantified self data in medical contexts. Only peer-reviewed articles, proceedings, or book chapters published in English between 2018 and 2022 were considered. We excluded duplicates, reviews, books, workshops, courses, tutorials, and talks. We analyzed 25 research papers to gain insights into the current state of explainability in wearables in the health care context. Results: Our findings revealed that wrist-worn wearables such as Fitbit and Empatica E4 are prevalent in health care applications. However, more emphasis must be placed on making the data generated by these devices explainable. Among various explainability methods, post hoc approaches stand out, with Shapley Additive Explanations as a prominent choice due to its adaptability. The outputs of explainability methods are commonly presented visually, often in the form of graphs or user-friendly reports. 
Nevertheless, our review highlights a limitation in user evaluation and underscores the importance of involving users in the development process. Conclusions: The integration of XAI into wearable health care technologies is crucial to address the issue of black box models. While wrist-worn wearables are widespread, there is a notable gap in making the data they generate explainable. Post hoc methods such as Shapley Additive Explanations have gained traction for their adaptability in explaining complex algorithms visually. However, user evaluation remains an area in which improvement is needed, and involving users in the development process can contribute to more transparent and reliable artificial intelligence models in health care applications. Further research in this area is essential to enhance the transparency and trustworthiness of artificial intelligence models used in wearable health care technology. %M 39718820 %R 10.2196/53863 %U https://www.jmir.org/2024/1/e53863 %U https://doi.org/10.2196/53863 %U http://www.ncbi.nlm.nih.gov/pubmed/39718820 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e65123 %T Authors’ Reply: Reassessing AI in Medicine: Exploring the Capabilities of AI in Academic Abstract Synthesis %A Hsu,Tien-Wei %A Liang,Chih-Sung %+ Department of Psychiatry, Tri-service Hospital, Beitou Branch, No. 
60, Xinmin Road, Beitou District, Taipei, 112, Taiwan, 886 2 28959808, lcsyfw@gmail.com %K ChatGPT %K AI-generated scientific content %K plagiarism %K AI %K artificial intelligence %K NLP %K natural language processing %K LLM %K language model %K text %K textual %K generation %K generative %K extract %K extraction %K scientific research %K academic research %K publication %K abstract %K comparative analysis %K reviewer bias %D 2024 %7 23.12.2024 %9 Letter to the Editor %J J Med Internet Res %G English %X N/A %R 10.2196/65123 %U https://www.jmir.org/2024/1/e65123 %U https://doi.org/10.2196/65123 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55920 %T Reassessing AI in Medicine: Exploring the Capabilities of AI in Academic Abstract Synthesis %A Wang,Zijian %A Zhou,Chunyang %+ Department of Radiation Oncology, Qilu Hospital (Qingdao), Cheeloo College of Medicine, Shandong University, 758 Hefei Road, Qingdao, Qingdao, 266000, China, 86 18561813085, chunyangzhou29@163.com %K ChatGPT %K AI-generated scientific content %K plagiarism %K AI %K artificial intelligence %K NLP %K natural language processing %K LLM %K language model %K text %K textual %K generation %K generative %K extract %K extraction %K scientific research %K academic research %K publication %K abstract %K comparative analysis %K reviewer bias %D 2024 %7 23.12.2024 %9 Letter to the Editor %J J Med Internet Res %G English %X %R 10.2196/55920 %U https://www.jmir.org/2024/1/e55920 %U https://doi.org/10.2196/55920 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e63866 %T Building a Human Digital Twin (HDTwin) Using Large Language Models for Cognitive Diagnosis: Algorithm Development and Validation %A Sprint,Gina %A Schmitter-Edgecombe,Maureen %A Cook,Diane %+ School of Electrical Engineering and Computer Science, Washington State University, Box 642752, Pullman, WA, 99164-2752, United States, 1 509 335 4985, djcook@wsu.edu %K human digital twin %K cognitive health %K cognitive 
diagnosis %K large language models %K artificial intelligence %K machine learning %K digital behavior marker %K interview marker %K health information %K chatbot %K digital twin %K smartwatch %D 2024 %7 23.12.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Human digital twins have the potential to change the practice of personalizing cognitive health diagnosis because these systems can integrate multiple sources of health information and influence into a unified model. Cognitive health is multifaceted, yet researchers and clinical professionals struggle to align diverse sources of information into a single model. Objective: This study aims to introduce a method called HDTwin, for unifying heterogeneous data using large language models. HDTwin is designed to predict cognitive diagnoses and offer explanations for its inferences. Methods: HDTwin integrates cognitive health data from multiple sources, including demographic, behavioral, ecological momentary assessment, n-back test, speech, and baseline experimenter testing session markers. Data are converted into text prompts for a large language model. The system then combines these inputs with relevant external knowledge from scientific literature to construct a predictive model. The model’s performance is validated using data from 3 studies involving 124 participants, comparing its diagnostic accuracy with baseline machine learning classifiers. Results: HDTwin achieves a peak accuracy of 0.81 based on the automated selection of markers, significantly outperforming baseline classifiers. On average, HDTwin yielded accuracy=0.77, precision=0.88, recall=0.63, and Matthews correlation coefficient=0.57. In comparison, the baseline classifiers yielded average accuracy=0.65, precision=0.86, recall=0.35, and Matthews correlation coefficient=0.36. The experiments also reveal that HDTwin yields superior predictive accuracy when information sources are fused compared to single sources. 
HDTwin’s chatbot interface provides interactive dialogues, aiding in diagnosis interpretation and allowing further exploration of patient data. Conclusions: HDTwin integrates diverse cognitive health data, enhancing the accuracy and explainability of cognitive diagnoses. This approach outperforms traditional models and provides an interface for navigating patient information. The approach shows promise for improving early detection and intervention strategies in cognitive health. %M 39715540 %R 10.2196/63866 %U https://formative.jmir.org/2024/1/e63866 %U https://doi.org/10.2196/63866 %U http://www.ncbi.nlm.nih.gov/pubmed/39715540 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54676 %T Machine Learning and Deep Learning for Diagnosis of Lumbar Spinal Stenosis: Systematic Review and Meta-Analysis %A Wang,Tianyi %A Chen,Ruiyuan %A Fan,Ning %A Zang,Lei %A Yuan,Shuo %A Du,Peng %A Wu,Qichao %A Wang,Aobo %A Li,Jian %A Kong,Xiaochuan %A Zhu,Wenyi %+ Beijing Chaoyang Hospital, Capital Medical University, 5 JingYuan Road, Shijingshan District, Beijing, 100043, China, 86 51718268, zanglei@ccmu.edu.cn %K lumbar spinal stenosis %K LSS %K machine learning %K ML %K deep learning %K artificial intelligence %K AI %K diagnosis %K spine stenosis %K lumbar %K predictive model %K early detection %K diagnostic %K older adult %D 2024 %7 23.12.2024 %9 Review %J J Med Internet Res %G English %X Background: Lumbar spinal stenosis (LSS) is a major cause of pain and disability in older individuals worldwide. Although an increasing number of studies have applied traditional machine learning (TML) and deep learning (DL) to the diagnosis of LSS and achieved prominent results, the performance of these models has not been analyzed systematically.
Objective: This systematic review and meta-analysis aimed to pool the results and evaluate the heterogeneity of the current studies in using TML or DL models to diagnose LSS, thereby providing more comprehensive information for further clinical application. Methods: This review was performed under the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines using articles extracted from the PubMed, Embase, and Cochrane Library databases. Studies that evaluated the diagnostic value of DL or TML algorithms for LSS were included, while those with duplicated or unavailable data were excluded. Quality Assessment of Diagnostic Accuracy Studies 2 was used to estimate the risk of bias in each study. The MIDAS module and the METAPROP module of Stata (StataCorp) were used for data synthesis and statistical analyses. Results: A total of 12 studies with 15,044 patients reported the assessment value of TML or DL models for diagnosing LSS. The risk of bias assessment yielded 4 studies with high risk of bias, 3 with unclear risk of bias, and 5 with completely low risk of bias. The pooled sensitivity and specificity were 0.84 (95% CI 0.82-0.86; I2=99.06%) and 0.87 (95% CI 0.84-0.90; I2=98.7%), respectively. The diagnostic odds ratio was 36 (95% CI 26-49), the positive likelihood ratio (LR+) was 6.6 (95% CI 5.1-8.4), and the negative likelihood ratio (LR–) was 0.18 (95% CI 0.16-0.21). The summary receiver operating characteristic curve showed an area under the curve of 0.92 (95% CI 0.89-0.94) for TML or DL models in diagnosing LSS, indicating a high diagnostic value. Conclusions: This systematic review and meta-analysis emphasize that despite the generally satisfactory diagnostic performance of artificial intelligence systems in the experimental stage for the diagnosis of LSS, none of them is reliable and practical enough to apply in real clinical practice.
Further efforts, including optimization of model balance, widely accepted objective reference standards, multimodal strategy, large datasets for training and testing, external validation, and sufficient and scientific reporting, should be made to bridge the gap between current TML or DL models and real-life clinical applications in future studies. Trial Registration: PROSPERO CRD42024566535; https://tinyurl.com/msx59x8k %M 39715552 %R 10.2196/54676 %U https://www.jmir.org/2024/1/e54676 %U https://doi.org/10.2196/54676 %U http://www.ncbi.nlm.nih.gov/pubmed/39715552 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e60684 %T AI in Dental Radiology—Improving the Efficiency of Reporting With ChatGPT: Comparative Study %A Stephan,Daniel %A Bertsch,Annika %A Burwinkel,Matthias %A Vinayahalingam,Shankeeth %A Al-Nawas,Bilal %A Kämmerer,Peer W %A Thiem,Daniel GE %+ Department of Oral and Maxillofacial Surgery, Facial Plastic Surgery, University Medical Centre of the Johannes Gutenberg-University Mainz, Augustusplatz 2, Mainz, 55131, Germany, 49 6131177038, stephand@uni-mainz.de %K artificial intelligence %K ChatGPT %K radiology report %K dental radiology %K dental orthopantomogram %K panoramic radiograph %K dental %K radiology %K chatbot %K medical documentation %K medical application %K imaging %K disease detection %K clinical decision support %K natural language processing %K medical licensing %K dentistry %K patient care %D 2024 %7 23.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Structured and standardized documentation is critical for accurately recording diagnostic findings, treatment plans, and patient progress in health care. Manual documentation can be labor-intensive and error-prone, especially under time constraints, prompting interest in the potential of artificial intelligence (AI) to automate and optimize these processes, particularly in medical documentation.
Objective: This study aimed to assess the effectiveness of ChatGPT (OpenAI) in generating radiology reports from dental panoramic radiographs, comparing the performance of AI-generated reports with those manually created by dental students. Methods: A total of 100 dental students were tasked with analyzing panoramic radiographs and generating radiology reports manually or assisted by ChatGPT using a standardized prompt derived from a diagnostic checklist. Results: Reports generated by ChatGPT showed a high degree of textual similarity to reference reports; however, they often lacked critical diagnostic information typically included in reports authored by students. Despite this, the AI-generated reports were consistent in being error-free and matched the readability of student-generated reports. Conclusions: The findings from this study suggest that ChatGPT has considerable potential for generating radiology reports, although it currently faces challenges in accuracy and reliability. This underscores the need for further refinement in the AI’s prompt design and the development of robust validation mechanisms to enhance its use in clinical settings. 
%M 39714078 %R 10.2196/60684 %U https://www.jmir.org/2024/1/e60684 %U https://doi.org/10.2196/60684 %U http://www.ncbi.nlm.nih.gov/pubmed/39714078 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 7 %N %P e59370 %T Machine Learning Driven by Magnetic Resonance Imaging for the Classification of Alzheimer Disease Progression: Systematic Review and Meta-Analysis %A Battineni,Gopi %A Chintalapudi,Nalini %A Amenta,Francesco %+ Clinical Research, Telemedicine and Telepharmacy Centre, School of Medicinal and Health Products Sciences, University Camerino, Via Madonna Delle Carceri 9, Camerino, 62032, Italy, 39 3331728206, gopi.battineni@unicam.it %K Alzheimer disease %K ML-based diagnosis %K machine learning %K prevalence %K cognitive impairment %K classification %K biomarkers %K imaging modalities %K MRI %K magnetic resonance imaging %K systematic review %K meta-analysis %D 2024 %7 23.12.2024 %9 Review %J JMIR Aging %G English %X Background: To diagnose Alzheimer disease (AD), individuals are classified according to the severity of their cognitive impairment. There are currently no specific causes or conditions for this disease. Objective: The purpose of this systematic review and meta-analysis was to assess AD prevalence across different stages using machine learning (ML) approaches comprehensively. Methods: The selection of papers was conducted in 3 phases, as per PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) 2020 guidelines: identification, screening, and final inclusion. The final analysis included 24 papers that met the criteria. The selection of ML approaches for AD diagnosis was rigorously based on their relevance to the investigation. The prevalence of patients with AD at 2, 3, 4, and 6 stages was illustrated through the use of forest plots. Results: The prevalence rate for both cognitively normal (CN) and AD across 6 studies was 49.28% (95% CI 46.12%-52.45%; P=.32). 
The prevalence estimate for the 3 stages of cognitive impairment (CN, mild cognitive impairment, and AD) is 29.75% (95% CI 25.11%-34.84%, P<.001). Among 5 studies with 14,839 participants, the analysis of 4 stages (nondemented, moderately demented, mildly demented, and AD) found an overall prevalence of 13.13% (95% CI 3.75%-36.66%; P<.001). In addition, 4 studies involving 3819 participants estimated the prevalence of 6 stages (CN, significant memory concern, early mild cognitive impairment, mild cognitive impairment, late mild cognitive impairment, and AD), yielding a prevalence of 23.75% (95% CI 12.22%-41.12%; P<.001). Conclusions: The significant heterogeneity observed across studies reveals that demographic and setting characteristics are responsible for the impact on AD prevalence estimates. This study shows how ML approaches can be used to describe AD prevalence across different stages, which provides valuable insights for future research. %M 39714089 %R 10.2196/59370 %U https://aging.jmir.org/2024/1/e59370 %U https://doi.org/10.2196/59370 %U http://www.ncbi.nlm.nih.gov/pubmed/39714089 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e67056 %T Automated Pathologic TN Classification Prediction and Rationale Generation From Lung Cancer Surgical Pathology Reports Using a Large Language Model Fine-Tuned With Chain-of-Thought: Algorithm Development and Validation Study %A Kim,Sanghwan %A Jang,Sowon %A Kim,Borham %A Sunwoo,Leonard %A Kim,Seok %A Chung,Jin-Haeng %A Nam,Sejin %A Cho,Hyeongmin %A Lee,Donghyoung %A Lee,Keehyuck %A Yoo,Sooyoung %+ Office of eHealth Research and Business, Seoul National University Bundang Hospital, Healthcare Innovation Park, Seongnam, 13605, Republic of Korea, 82 317878980, yoosoo0@snubh.org %K AJCC Cancer Staging Manual 8th edition %K American Joint Committee on Cancer %K large language model %K chain-of-thought %K rationale %K lung cancer %K report analysis %K AI %K surgery %K pathology reports %K tertiary hospital %K 
generative language models %K efficiency %K accuracy %K automated %D 2024 %7 20.12.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Traditional rule-based natural language processing approaches in electronic health record systems are effective but are often time-consuming and prone to errors when handling unstructured data. This is primarily due to the substantial manual effort required to parse and extract information from diverse types of documentation. Recent advancements in large language model (LLM) technology have made it possible to automatically interpret medical context and support pathologic staging. However, existing LLMs encounter challenges in rapidly adapting to specialized guideline updates. In this study, we fine-tuned an LLM specifically for lung cancer pathologic staging, enabling it to incorporate the latest guidelines for pathologic TN classification. Objective: This study aims to evaluate the performance of fine-tuned generative language models in automatically inferring pathologic TN classifications and extracting their rationale from lung cancer surgical pathology reports. By addressing the inefficiencies and extensive parsing efforts associated with rule-based methods, this approach seeks to enable rapid and accurate reclassification aligned with the latest cancer staging guidelines. Methods: We conducted a comparative performance evaluation of 6 open-source LLMs for automated TN classification and rationale generation, using 3216 deidentified lung cancer surgical pathology reports based on the American Joint Committee on Cancer (AJCC) Cancer Staging Manual, 8th edition, collected from a tertiary hospital. The dataset was preprocessed by segmenting each report according to lesion location and morphological diagnosis. 
Performance was assessed using exact match ratio (EMR) and semantic match ratio (SMR) as evaluation metrics, which measure classification accuracy and the contextual alignment of the generated rationales, respectively. Results: Among the 6 models, the Orca2_13b model achieved the highest performance with an EMR of 0.934 and an SMR of 0.864. The Orca2_7b model also demonstrated strong performance, recording an EMR of 0.914 and an SMR of 0.854. In contrast, the Llama2_7b model achieved an EMR of 0.864 and an SMR of 0.771, while the Llama2_13b model showed an EMR of 0.762 and an SMR of 0.690. The Mistral_7b and Llama3_8b models, on the other hand, showed lower performance, with EMRs of 0.572 and 0.489, and SMRs of 0.377 and 0.456, respectively. Overall, the Orca2 models consistently outperformed the others in both TN stage classification and rationale generation. Conclusions: The generative language model approach presented in this study has the potential to enhance and automate TN classification in complex cancer staging, supporting both clinical practice and oncology data curation. With additional fine-tuning based on cancer-specific guidelines, this approach can be effectively adapted to other cancer types. 
%M 39705675 %R 10.2196/67056 %U https://medinform.jmir.org/2024/1/e67056 %U https://doi.org/10.2196/67056 %U http://www.ncbi.nlm.nih.gov/pubmed/39705675 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e66648 %T Large Language Models in Gastroenterology: Systematic Review %A Gong,Eun Jeong %A Bang,Chang Seok %A Lee,Jae Jun %A Park,Jonghyung %A Kim,Eunsil %A Kim,Subeen %A Kimm,Minjae %A Choi,Seoung-Ho %+ Department of Internal Medicine, Hallym University College of Medicine, sakjuro, 77, Chuncheon, Republic of Korea, 82 332405000, csbang@hallym.ac.kr %K large language model %K LLM %K deep learning %K artificial intelligence %K AI %K endoscopy %K gastroenterology %K clinical practice %K systematic review %K diagnostic %K accuracy %K patient engagement %K emotional support %K data privacy %K diagnosis %K clinical reasoning %D 2024 %7 20.12.2024 %9 Review %J J Med Internet Res %G English %X Background: As health care continues to evolve with technological advancements, the integration of artificial intelligence into clinical practices has shown promising potential to enhance patient care and operational efficiency. Among the forefront of these innovations are large language models (LLMs), a subset of artificial intelligence designed to understand, generate, and interact with human language at an unprecedented scale. Objective: This systematic review describes the role of LLMs in improving diagnostic accuracy, automating documentation, and advancing specialist education and patient engagement within the field of gastroenterology and gastrointestinal endoscopy. Methods: Core databases including MEDLINE through PubMed, Embase, and Cochrane Central registry were searched using keywords related to LLMs (from inception to April 2024). 
Studies were included if they satisfied the following criteria: (1) any type of studies that investigated the potential role of LLMs in the field of gastrointestinal endoscopy or gastroenterology, (2) studies published in English, and (3) studies in full-text format. The exclusion criteria were as follows: (1) studies that did not report the potential role of LLMs in the field of gastrointestinal endoscopy or gastroenterology, (2) case reports and review papers, (3) ineligible research objects (eg, animals or basic research), and (4) insufficient data regarding the potential role of LLMs. Risk of Bias in Non-Randomized Studies—of Interventions was used to evaluate the quality of the identified studies. Results: Overall, 21 studies on the potential role of LLMs in gastrointestinal disorders were included in the systematic review, and narrative synthesis was done because of heterogeneity in the specified aims and methodology in each included study. The overall risk of bias was low in 5 studies and moderate in 16 studies. The ability of LLMs to spread general medical information, offer advice for consultations, generate procedure reports automatically, or draw conclusions about the presumptive diagnosis of complex medical illnesses was demonstrated by the systematic review. Despite promising benefits, such as increased efficiency and improved patient outcomes, challenges related to data privacy, accuracy, and interdisciplinary collaboration remain. Conclusions: We highlight the importance of navigating these challenges to fully leverage LLMs in transforming gastrointestinal endoscopy practices. 
Trial Registration: PROSPERO 581772; https://www.crd.york.ac.uk/prospero/ %M 39705703 %R 10.2196/66648 %U https://www.jmir.org/2024/1/e66648 %U https://doi.org/10.2196/66648 %U http://www.ncbi.nlm.nih.gov/pubmed/39705703 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 13 %N %P e57271 %T Unveiling the Influence of AI on Advancements in Respiratory Care: Narrative Review %A Alqahtani,Mohammed M %A Alanazi,Abdullah M M %A Algarni,Saleh S %A Aljohani,Hassan %A Alenezi,Faraj K %A F Alotaibi,Tareq %A Alotaibi,Mansour %A K Alqahtani,Mobarak %A Alahmari,Mushabbab %A S Alwadeai,Khalid %A M Alghamdi,Saeed %A Almeshari,Mohammed A %A Alshammari,Turki Faleh %A Mumenah,Noora %A Al Harbi,Ebtihal %A Al Nufaiei,Ziyad F %A Alhuthail,Eyas %A Alzahrani,Esam %A Alahmadi,Husam %A Alarifi,Abdulaziz %A Zaidan,Amal %A T Ismaeil,Taha %+ Department of Respiratory Therapy, College of Applied Medical Sciences, King Saud bin Abdulaziz University for Health Sciences, MC-3129, PO Box 3660, Riyadh, 11481, Saudi Arabia, 966 501407856, Qahtanimoh@ksau-hs.edu.sa %K artificial intelligence %K AI %K respiratory care %K machine learning %K digital health %K narrative review %D 2024 %7 20.12.2024 %9 Review %J Interact J Med Res %G English %X Background: Artificial intelligence is experiencing rapid growth, with continual innovation and advancements in the health care field. Objective: This study aims to evaluate the application of artificial intelligence technologies across various domains of respiratory care. Methods: We conducted a narrative review to examine the latest advancements in the use of artificial intelligence in the field of respiratory care. The search was independently conducted by respiratory care experts, each focusing on their respective scope of practice and area of interest. Results: This review illuminates the diverse applications of artificial intelligence, highlighting its use in areas associated with respiratory care. 
Artificial intelligence is harnessed across various areas in this field, including pulmonary diagnostics, respiratory care research, critical care or mechanical ventilation, pulmonary rehabilitation, telehealth, public health or health promotion, sleep clinics, home care, smoking or vaping behavior, and neonates and pediatrics. With its multifaceted utility, artificial intelligence can enhance the field of respiratory care, potentially leading to superior health outcomes for individuals under this extensive umbrella. Conclusions: As artificial intelligence advances, elevating academic standards in the respiratory care profession becomes imperative, allowing practitioners to contribute to research and understand artificial intelligence’s impact on respiratory care. The permanent integration of artificial intelligence into respiratory care creates the need for respiratory therapists to positively influence its progression. By participating in artificial intelligence development, respiratory therapists can augment their clinical capabilities, knowledge, and patient outcomes. 
%M 39705080 %R 10.2196/57271 %U https://www.i-jmr.org/2024/1/e57271 %U https://doi.org/10.2196/57271 %U http://www.ncbi.nlm.nih.gov/pubmed/39705080 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e42774 %T Identification of Gender Differences in Acute Myocardial Infarction Presentation and Management at Aga Khan University Hospital-Pakistan: Natural Language Processing Application in a Dataset of Patients With Cardiovascular Disease %A Ngaruiya,Christine %A Samad,Zainab %A Tajuddin,Salma %A Nasim,Zarmeen %A Leff,Rebecca %A Farhad,Awais %A Pires,Kyle %A Khan,Muhammad Alamgir %A Hartz,Lauren %A Safdar,Basmah %+ Department of Emergency Medicine, Yale School of Medicine, 464 Congress Avenue, Suite #260, New Haven, CT, 06519, United States, 1 2037852353, christine.ngaruiya@yale.edu %K natural language processing %K gender-based differences %K acute coronary syndrome %K global health %K Pakistan %K gender %K data %K dataset %K clinical %K research %K management %K patient %K medication %K women %K tool %D 2024 %7 20.12.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Ischemic heart disease is a leading cause of death globally with a disproportionate burden in low- and middle-income countries (LMICs). Natural language processing (NLP) allows for data enrichment in large datasets to facilitate key clinical research. We used NLP to assess gender differences in symptoms and management of patients hospitalized with acute myocardial infarction (AMI) at Aga Khan University Hospital-Pakistan. Objective: The primary objective of this study was to use NLP to assess gender differences in the symptoms and management of patients hospitalized with AMI at a tertiary care hospital in Pakistan. Methods: We developed an NLP-based methodology to extract AMI symptoms and medications from 5358 discharge summaries spanning the years 1988 to 2018. 
This dataset included patients admitted and discharged between January 1, 1988, and December 31, 2018, who were older than 18 years with a primary discharge diagnosis of AMI (using ICD-9 [International Classification of Diseases, Ninth Revision] diagnostic codes). The methodology used a fuzzy keyword-matching algorithm to extract AMI symptoms from the discharge summaries automatically. It first preprocesses the free text within the discharge summaries to extract passages indicating the presenting symptoms. Then, it applies fuzzy matching techniques to identify relevant keywords or phrases indicative of AMI symptoms, incorporating negation handling to minimize false positives. After manually reviewing the quality of extracted symptoms in a subset of discharge summaries through preliminary experiments, a similarity threshold of 80% was determined. Results: Among 1769 women and 3589 men with AMI, women had higher odds of presenting with shortness of breath (odds ratio [OR] 1.46, 95% CI 1.26-1.70) and lower odds of presenting with chest pain (OR 0.65, 95% CI 0.55-0.75), even after adjustment for diabetes and age. Presentation with abdominal pain, nausea, or vomiting was much less frequent but consistently more common in women (P<.001). “Ghabrahat,” a culturally distinct term for a feeling of impending doom, was used by 5.09% of women and 3.69% of men as a presenting symptom for AMI (P=.06). First-line medication prescription (statins and β-blockers) was lower in women: women had nearly 30% lower odds (OR 0.71, 95% CI 0.57-0.90) of being prescribed statins, and they had 40% lower odds (OR 0.67, 95% CI 0.57-0.78) of being prescribed β-blockers. Conclusions: Gender-based differences in clinical presentation and medication management were demonstrated in patients with AMI at a tertiary care hospital in Pakistan. 
The use of NLP for the identification of culturally nuanced clinical characteristics and management is feasible in LMICs and could be used as a tool to understand gender disparities and address key clinical priorities in LMICs. %M 39705071 %R 10.2196/42774 %U https://formative.jmir.org/2024/1/e42774 %U https://doi.org/10.2196/42774 %U http://www.ncbi.nlm.nih.gov/pubmed/39705071 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51255 %T A Machine Learning–Based Prediction Model for Acute Kidney Injury in Patients With Community-Acquired Pneumonia: Multicenter Validation Study %A Ma,Mengqing %A Chen,Caimei %A Chen,Dawei %A Zhang,Hao %A Du,Xia %A Sun,Qing %A Fan,Li %A Kong,Huiping %A Chen,Xueting %A Cao,Changchun %A Wan,Xin %+ Department of Nephrology, Nanjing First Hospital, Nanjing Medical University, Changle Road 68, Nanjing, Nanjing, 210006, China, 86 18951670991, wanxin@njmu.edu.cn %K acute kidney injury %K community-acquired %K pneumonia %K machine learning %K prediction model %D 2024 %7 19.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Acute kidney injury (AKI) is common in patients with community-acquired pneumonia (CAP) and is associated with increased morbidity and mortality. Objective: This study aimed to establish and validate predictive models for AKI in hospitalized patients with CAP based on machine learning algorithms. Methods: We trained and externally validated 5 machine learning algorithms, including logistic regression, support vector machine, random forest, extreme gradient boosting, and deep forest (DF). Feature selection was conducted using the sliding window forward feature selection technique. Shapley additive explanations and local interpretable model-agnostic explanation techniques were applied to the optimal model for visual interpretation. Results: A total of 6371 patients with CAP met the inclusion criteria. The development of CAP-associated AKI (CAP-AKI) was recognized in 1006 (15.8%) patients. 
The 11 selected indicators were sex, temperature, breathing rate, diastolic blood pressure, C-reactive protein, albumin, white blood cell, hemoglobin, platelet, blood urea nitrogen, and neutrophil count. The DF model achieved the best area under the receiver operating characteristic curve (AUC) and accuracy in the internal (AUC=0.89, accuracy=0.90) and external validation sets (AUC=0.87, accuracy=0.83). Furthermore, the DF model had the best calibration among all models. In addition, a web-based prediction platform was developed to predict CAP-AKI. Conclusions: The model described in this study is the first multicenter-validated AKI prediction model that accurately predicts CAP-AKI during hospitalization. The web-based prediction platform embedded with the DF model serves as a user-friendly tool for early identification of high-risk patients. %M 39699941 %R 10.2196/51255 %U https://www.jmir.org/2024/1/e51255 %U https://doi.org/10.2196/51255 %U http://www.ncbi.nlm.nih.gov/pubmed/39699941 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 7 %N %P e59839 %T The Depth Estimation and Visualization of Dermatological Lesions: Development and Usability Study %A Parekh,Pranav %A Oyeleke,Richard %A Vishwanath,Tejas %+ Stevens Institute of Technology, 1 Castle Point Terrace, Hoboken, NJ, 07030, United States, 1 4697744761, pparekh5@stevens.edu %K machine learning %K ML %K computer vision %K neural networks %K explainable AI %K XAI %K computer graphics %K red spot analysis %K mixed reality %K MR %K artificial intelligence %K visualization %D 2024 %7 18.12.2024 %9 Original Paper %J JMIR Dermatol %G English %X Background: Thus far, considerable research has been focused on classifying a lesion as benign or malignant. However, there is a requirement for quick depth estimation of a lesion for the accurate clinical staging of the lesion. The lesion could be malignant and quickly grow beneath the skin. 
While biopsy slides provide clear information on lesion depth, finding quick and noninvasive methods to estimate depth, particularly from 2D images, is an emerging domain. Objective: This study proposes a novel methodology for the depth estimation and visualization of skin lesions. Current diagnostic methods are approximate in determining how much a lesion may have proliferated within the skin. Using color gradients and depth maps, this method will give us a definite estimate and visualization procedure for lesions and other skin issues. We aim to generate 3D holograms of the lesion depth such that dermatologists can better diagnose melanoma. Methods: We started by performing classification using a convolutional neural network (CNN), followed by using explainable artificial intelligence to localize the image features responsible for the CNN output. We used the gradient class activation map approach to perform localization of the lesion from the rest of the image. We applied computer graphics for depth estimation and for developing the 3D structure of the lesion. We used the depth from defocus method for depth estimation from single images and Gabor filters for volumetric representation of the depth map. Our novel method, called red spot analysis, measures the degree of infection based on how a conical hologram is constructed. We collaborated with a dermatologist to analyze the 3D hologram output and received feedback on how this method can be introduced into clinical practice. Results: The neural model plus the explainable artificial intelligence algorithm achieved an accuracy of 86% in classifying the lesions correctly as benign or malignant. For the entire pipeline, we mapped the benign and malignant cases to their conical representations. We received exceedingly positive feedback while pitching this idea at the King Edward Memorial Institute in India. Dermatologists considered this a potentially useful tool in the depth estimation of lesions. 
We received a number of ideas for evaluating the technique before it can be introduced to the clinical scene. Conclusions: When we map the CNN outputs (benign or malignant) to the corresponding hologram, we observe that a malignant lesion has a higher concentration of red spots (infection) in the upper and deeper portions of the skin, and that the malignant cases have deeper conical sections when compared with the benign cases. This proves that the qualitative results map with the initial classification performed by the neural model. The positive feedback provided by the dermatologist suggests that the qualitative conclusion of the method is sufficient. %M 39693616 %R 10.2196/59839 %U https://derma.jmir.org/2024/1/e59839 %U https://doi.org/10.2196/59839 %U http://www.ncbi.nlm.nih.gov/pubmed/39693616 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e60665 %T An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontology-Enhanced Large Language Models: Development Study %A Cao,Lang %A Sun,Jimeng %A Cross,Adam %K rare disease %K clinical informatics %K LLM %K natural language processing %K machine learning %K artificial intelligence %K large language models %K data extraction %K ontologies %K knowledge graphs %K text mining %D 2024 %7 18.12.2024 %9 %J JMIR Med Inform %G English %X Background: Rare diseases affect millions worldwide but sometimes face limited research focus individually due to low prevalence. Many rare diseases do not have specific International Classification of Diseases, Ninth Edition (ICD-9) and Tenth Edition (ICD-10), codes and therefore cannot be reliably extracted from granular fields like “Diagnosis” and “Problem List” entries, which complicates tasks that require identification of patients with these conditions, including clinical trial recruitment and research efforts. 
Recent advancements in large language models (LLMs) have shown promise in automating the extraction of medical information, offering the potential to improve medical research, diagnosis, and management. However, most LLMs lack professional medical knowledge, especially concerning specific rare diseases, and cannot effectively manage rare disease data in its various ontological forms, making them unsuitable for these tasks. Objective: Our aim is to create an end-to-end system called automated rare disease mining (AutoRD), which automates the extraction of rare disease–related information from medical text, focusing on entities and their relations to other medical concepts, such as signs and symptoms. AutoRD integrates up-to-date ontologies with other structured knowledge and demonstrates superior performance in rare disease extraction tasks. We conducted various experiments to evaluate AutoRD’s performance, aiming to surpass common LLMs and traditional methods. Methods: AutoRD is a pipeline system that involves data preprocessing, entity extraction, relation extraction, entity calibration, and knowledge graph construction. We implemented this system using GPT-4 and medical knowledge graphs developed from the open-source Human Phenotype and Orphanet ontologies, using techniques such as chain-of-thought reasoning and prompt engineering. We quantitatively evaluated our system’s performance in entity extraction, relation extraction, and knowledge graph construction. The experiment used the well-curated dataset RareDis2023, which contains medical literature focused on rare disease entities and their relations, making it an ideal dataset for training and testing our methodology. Results: On the RareDis2023 dataset, AutoRD achieved an overall entity extraction F1-score of 56.1% and a relation extraction F1-score of 38.6%, marking a 14.4% improvement over the baseline LLM. 
Notably, the F1-score for rare disease entity extraction reached 83.5%, indicating high precision and recall in identifying rare disease mentions. These results demonstrate the effectiveness of integrating LLMs with medical ontologies in extracting complex rare disease information. Conclusions: AutoRD is an automated end-to-end system for extracting rare disease information from text to build knowledge graphs, addressing critical limitations of existing LLMs by improving identification of these diseases and connecting them to related clinical features. This work underscores the significant potential of LLMs in transforming health care, particularly in the rare disease domain. By leveraging ontology-enhanced LLMs, AutoRD constructs a robust medical knowledge base that incorporates up-to-date rare disease information, facilitating improved identification of patients and resulting in more inclusive research and trial candidacy efforts. %R 10.2196/60665 %U https://medinform.jmir.org/2024/1/e60665 %U https://doi.org/10.2196/60665 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e57592 %T Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study %A Roos,Jonas %A Martin,Ron %A Kaczmarczyk,Robert %K medical education %K visual question answering %K image analysis %K large language model %K LLM %K student %K performance %K comparative %K case study %K artificial intelligence %K AI %K ChatGPT %K effectiveness %K diagnostic %K training %K accuracy %K utility %K image-based %K question %K image %K AMBOSS %K English %K German %K question and answer %K Python %K AI in health care %K health care %D 2024 %7 17.12.2024 %9 %J JMIR Form Res %G English %X Background: The rapid development of large language models (LLMs) such as OpenAI’s ChatGPT has significantly impacted medical research and education. 
These models have shown potential in fields ranging from radiological imaging interpretation to medical licensing examination assistance. Recently, LLMs have been enhanced with image recognition capabilities. Objective: This study aims to critically examine the effectiveness of these LLMs in medical diagnostics and training by assessing their accuracy and utility in answering image-based questions from medical licensing examinations. Methods: This study analyzed 1070 image-based multiple-choice questions from the AMBOSS learning platform, divided into 605 in English and 465 in German. Customized prompts in both languages directed the models to interpret medical images and provide the most likely diagnosis. Student performance data were obtained from AMBOSS, including metrics such as the “student passed mean” and “majority vote.” Statistical analysis was conducted using Python (Python Software Foundation), with key libraries for data manipulation and visualization. Results: GPT-4 1106 Vision Preview (OpenAI) outperformed Bard Gemini Pro (Google), correctly answering 56.9% (609/1070) of questions compared to Bard’s 44.6% (477/1070), a statistically significant difference (χ2₁=32.1, P<.001). However, GPT-4 1106 left 16.1% (172/1070) of questions unanswered, significantly higher than Bard’s 4.1% (44/1070; χ2₁=83.1, P<.001). When considering only answered questions, GPT-4 1106’s accuracy increased to 67.8% (609/898), surpassing both Bard (477/1026, 46.5%; χ2₁=87.7, P<.001) and the student passed mean of 63% (674/1070, SE 1.48%; χ2₁=4.8, P=.03). Language-specific analysis revealed both models performed better in German than English, with GPT-4 1106 showing greater accuracy in German (282/465, 60.65% vs 327/605, 54.1%; χ2₁=4.4, P=.04) and Bard Gemini Pro exhibiting a similar trend (255/465, 54.8% vs 222/605, 36.7%; χ2₁=34.3, P<.001). 
The student majority vote achieved an overall accuracy of 94.5% (1011/1070), significantly outperforming both artificial intelligence models (GPT-4 1106: χ2₁=408.5, P<.001; Bard Gemini Pro: χ2₁=626.6, P<.001). Conclusions: Our study shows that GPT-4 1106 Vision Preview and Bard Gemini Pro have potential in medical visual question-answering tasks and to serve as a support for students. However, their performance varies depending on the language used, with a preference for German. They also have limitations in responding to non-English content. The accuracy rates, particularly when compared to student responses, highlight the potential of these models in medical education, yet the need for further optimization and understanding of their limitations in diverse linguistic contexts remains critical. %R 10.2196/57592 %U https://formative.jmir.org/2024/1/e57592 %U https://doi.org/10.2196/57592 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 7 %N %P e57899 %T Expectations and Requirements of Surgical Staff for an AI-Supported Clinical Decision Support System for Older Patients: Qualitative Study %A Uihlein,Adriane %A Beissel,Lisa %A Ajlani,Anna Hanane %A Orzechowski,Marcin %A Leinert,Christoph %A Kocar,Thomas Derya %A Pankratz,Carlos %A Schuetze,Konrad %A Gebhard,Florian %A Steger,Florian %A Fotteler,Marina Liselotte %A Denkinger,Michael %K traumatology %K orthogeriatrics %K older adult %K elderly %K older people %K aging %K interviews %K mHealth %K mobile health %K mobile application %K digital health %K digital technology %K digital intervention %K CDSS %K clinical decision support system %K artificial intelligence %K AI %K algorithm %K predictive model %K predictive analytics %K predictive system %K practical model %K decision support %K decision support tool %D 2024 %7 17.12.2024 %9 %J JMIR Aging %G English %X Background: Geriatric comanagement has been shown to improve outcomes of older surgical inpatients. 
Furthermore, the choice of discharge location, that is, continuity of care, can have a fundamental impact on convalescence. These challenges and demands have led to the SURGE-Ahead project that aims to develop a clinical decision support system (CDSS) for geriatric comanagement in surgical clinics including a decision support for the best continuity of care option, supported by artificial intelligence (AI) algorithms. Objective: This qualitative study aims to explore the current challenges and demands in surgical geriatric patient care. Based on these challenges, the study explores the attitude of interviewees toward the introduction of an AI-supported CDSS (AI-CDSS) in geriatric patient care in surgery, focusing on technical and general wishes about an AI-CDSS, as well as ethical considerations. Methods: In this study, 15 personal interviews with physicians, nurses, physiotherapists, and social workers, employed in surgical departments at a university hospital in Southern Germany, were conducted in April 2022. Interviews were conducted in person, transcribed, and coded by 2 researchers (AU, LB) using content and thematic analysis. During the analysis, quotes were sorted into the main categories of geriatric patient care, use of an AI-CDSS, and ethical considerations by 2 authors (AU, LB). The main themes of the interviews were subsequently described in a narrative synthesis, citing key quotes. Results: In total, 399 quotes were extracted and categorized from the interviews. Most quotes could be assigned to the primary code challenges in geriatric patient care (111 quotes), with the most frequent subcode being medical challenges (45 quotes). More quotes were assigned to the primary code chances of an AI-CDSS (37 quotes), with its most frequent subcode being holistic patient overview (16 quotes), than to the primary code limits of an AI-CDSS (26 quotes). 
Regarding the primary code technical wishes (37 quotes), most quotes could be assigned to the subcode intuitive usability (15 quotes), followed by mobile availability and easy access (11 quotes). Regarding the main category ethical aspects of an AI-CDSS, most quotes could be assigned to the subcode critical position toward trust in an AI-CDSS (9 quotes), followed by the subcodes respecting the patient’s will and individual situation (8 quotes) and responsibility remaining in the hands of humans (7 quotes). Conclusions: Support regarding medical geriatric challenges and responsible handling of AI-based recommendations, as well as necessity for a holistic approach focused on usability, were the most important topics of health care professionals in surgery regarding development of an AI-CDSS for geriatric care. These findings, together with the wish to preserve the patient-caregiver relationship, will help set the focus for the ongoing development of AI-supported CDSS. %R 10.2196/57899 %U https://aging.jmir.org/2024/1/e57899 %U https://doi.org/10.2196/57899 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e64362 %T Geospatial Modeling of Deep Neural Visual Features for Predicting Obesity Prevalence in Missouri: Quantitative Study %A Dahu,Butros M %A Khan,Solaiman %A Toubal,Imad Eddine %A Alshehri,Mariam %A Martinez-Villar,Carlos I %A Ogundele,Olabode B %A Sheets,Lincoln R %A Scott,Grant J %+ University of Missouri, Institute for Data Science and Informatics, Columbia, MO, United States, 1 8325124825, peterdahu@gmail.com %K geospatial modeling %K deep convolutional neural network %K DCNN %K Residual Network-50 %K ResNet-50 %K satellite imagery %K Moran I %K local indicators of spatial association %K LISA %K spatial lag model %K obesity rate %K artificial intelligence %K AI %D 2024 %7 17.12.2024 %9 Original Paper %J JMIR AI %G English %X Background: The global obesity epidemic demands innovative approaches to understand its complex environmental and 
social determinants. Spatial technologies, such as geographic information systems, remote sensing, and spatial machine learning, offer new insights into this health issue. This study uses deep learning and spatial modeling to predict obesity rates for census tracts in Missouri. Objective: This study aims to develop a scalable method for predicting obesity prevalence using deep convolutional neural networks applied to satellite imagery and geospatial analysis, focusing on 1052 census tracts in Missouri. Methods: Our analysis followed 3 steps. First, Sentinel-2 satellite images were processed using the Residual Network-50 model to extract environmental features from 63,592 image chips (224×224 pixels). Second, these features were merged with obesity rate data from the Centers for Disease Control and Prevention for Missouri census tracts. Third, a spatial lag model was used to predict obesity rates and analyze the association between deep neural visual features and obesity prevalence. Spatial autocorrelation was used to identify clusters of obesity rates. Results: Substantial spatial clustering of obesity rates was found across Missouri, with a Moran I value of 0.68, indicating similar obesity rates among neighboring census tracts. The spatial lag model demonstrated strong predictive performance, with an R2 of 0.93 and a spatial pseudo R2 of 0.92, explaining 93% of the variation in obesity rates. Local indicators from a spatial association analysis revealed regions with distinct high and low clusters of obesity, which were visualized through choropleth maps. Conclusions: This study highlights the effectiveness of integrating deep convolutional neural networks and spatial modeling to predict obesity prevalence based on environmental features from satellite imagery. The model’s high accuracy and ability to capture spatial patterns offer valuable insights for public health interventions. 
Future work should expand the geographical scope and include socioeconomic data to further refine the model for broader applications in obesity research. %M 39688897 %R 10.2196/64362 %U https://ai.jmir.org/2024/1/e64362 %U https://doi.org/10.2196/64362 %U http://www.ncbi.nlm.nih.gov/pubmed/39688897 %0 Journal Article %@ 2373-6658 %I JMIR Publications %V 8 %N %P e67928 %T Emotional Touch Nursing Competencies Model of the Fourth Industrial Revolution: Instrument Validation Study %A Jung,Sun-Young %A Lee,Ji-Hyeon %+ College of Nursing, Daegu University, Seongdang-ro 50-gil 33, Nam-gu, Daegu, 42601, Republic of Korea, 82 536508016, jihyeonnlee@naver.com %K nurse %K therapeutic touch %K clinical competence %K factor analysis %K statistical %K reliability %K scale %K tool %K nursing %K industrial revolution %K competencies %K health care %K emotional %K interview %K collaborative practice %K learning agility %K professional commitment %K positive self-worth %K compliance %K ethics %K practice ability %K relationship ability %K nursing sensitivity %D 2024 %7 16.12.2024 %9 Original Paper %J Asian Pac Isl Nurs J %G English %X Background: The Fourth Industrial Revolution is transforming the health care sector through advanced technologies such as artificial intelligence, the Internet of Things, and big data, leading to new expectations for rapid and accurate treatment. While the integration of technology in nursing tasks is on the rise, there remains a critical need to balance technological efficiency with empathy and emotional connection. This study aims to develop and validate a competency model for emotional touch nursing that responds to the evolving demands of the changing health care environment. Objective: The aims of our study are to develop an emotional touch nursing competencies model and to verify its reliability and validity. 
Methods: A conceptual framework and construct factors were developed based on an extensive literature review and in-depth interviews with nurses. The potential competencies were confirmed by 20 experts, and preliminary questions were prepared. The final version of the scale was verified through exploratory factor analysis (n=255) and confirmatory factor analysis (n=256) to assess its validity and reliability. Results: From the exploratory analysis, 8 factors and 38 items (client-centered collaborative practice, learning agility for nursing, nursing professional commitment, positive self-worth, compliance with ethics and roles, nursing practice competence, nurse-client relationship, and nursing sensitivity) were extracted. These items were verified through convergent and discriminant validity testing. The internal consistency reliability was acceptable (Cronbach α=0.95). Conclusions: The findings from this study confirmed that this scale has sufficient validity and reliability to measure emotional touch nursing competencies. It is expected to be used to build a knowledge and educational system for emotional touch nursing. 
%M 39680900 %R 10.2196/67928 %U https://apinj.jmir.org/2024/1/e67928 %U https://doi.org/10.2196/67928 %U http://www.ncbi.nlm.nih.gov/pubmed/39680900 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e60794 %T Investigating Older Adults’ Perceptions of AI Tools for Medication Decisions: Vignette-Based Experimental Survey %A Vordenberg,Sarah E %A Nichols,Julianna %A Marshall,Vincent D %A Weir,Kristie Rebecca %A Dorsch,Michael P %+ College of Pharmacy, University of Michigan, 428 Church St, Ann Arbor, MI, 48109, United States, 1 734 763 6691, skelling@med.umich.edu %K older adults %K survey %K decisions %K artificial intelligence %K vignette %K drug %K pharmacology %K pharmaceutic %K medication %K decision-making %K geriatric %K aging %K surveys %K attitude %K perception %K perspective %K recommendation %K electronic heath record %D 2024 %7 16.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Given the public release of large language models, research is needed to explore whether older adults would be receptive to personalized medication advice given by artificial intelligence (AI) tools. Objective: This study aims to identify predictors of the likelihood of older adults stopping a medication and the influence of the source of the information. Methods: We conducted a web-based experimental survey in which US participants aged ≥65 years were asked to report their likelihood of stopping a medication based on the source of information using a 6-point Likert scale (scale anchors: 1=not at all likely; 6=extremely likely). In total, 3 medications were presented in a randomized order: aspirin (risk of bleeding), ranitidine (cancer-causing chemical), or simvastatin (lack of benefit with age). 
In total, 5 sources of information were presented: primary care provider (PCP), pharmacist, AI that connects with the electronic health record (EHR) and provides advice to the PCP (“EHR-PCP”), AI with EHR access that directly provides advice (“EHR-Direct”), and AI that directly asks questions to provide advice (“Questions-Direct”). We calculated descriptive statistics to identify participants who were extremely likely (score 6) to stop the medication and used logistic regression to identify demographic predictors of being likely (scores 4-6) as opposed to unlikely (scores 1-3) to stop a medication. Results: Older adults (n=1245) reported being extremely likely to stop a medication based on a PCP’s recommendation (n=748, 60.1% [aspirin] to n=858, 68.9% [ranitidine]) compared to a pharmacist (n=227, 18.2% [simvastatin] to n=361, 29% [ranitidine]). They were infrequently extremely likely to stop a medication when recommended by AI (EHR-PCP: n=182, 14.6% [aspirin] to n=289, 23.2% [ranitidine]; EHR-Direct: n=118, 9.5% [simvastatin] to n=212, 17% [ranitidine]; Questions-Direct: n=121, 9.7% [aspirin] to n=204, 16.4% [ranitidine]). In adjusted analyses, characteristics that increased the likelihood of following an AI recommendation included being Black or African American as compared to White (Questions-Direct: odds ratio [OR] 1.28, 95% CI 1.06-1.54 to EHR-PCP: OR 1.42, 95% CI 1.17-1.73), having higher self-reported health (EHR-PCP: OR 1.09, 95% CI 1.01-1.18 to EHR-Direct: OR 1.13, 95% CI 1.05-1.23), having higher confidence in using an EHR (Questions-Direct: OR 1.36, 95% CI 1.16-1.58 to EHR-PCP: OR 1.55, 95% CI 1.33-1.80), and having higher confidence using apps (EHR-Direct: OR 1.38, 95% CI 1.18-1.62 to EHR-PCP: OR 1.49, 95% CI 1.27-1.74). Older adults with higher health literacy were less likely to stop a medication when recommended by AI (EHR-PCP: OR 0.81, 95% CI 0.75-0.88 to EHR-Direct: OR 0.85, 95% CI 0.78-0.92). 
Conclusions: Older adults have reservations about following an AI recommendation to stop a medication. However, individuals who are Black or African American, have higher self-reported health, or have higher confidence in using an EHR or apps may be receptive to AI-based medication recommendations. %R 10.2196/60794 %U https://www.jmir.org/2024/1/e60794 %U https://doi.org/10.2196/60794 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51409 %T Longitudinal Model Shifts of Machine Learning–Based Clinical Risk Prediction Models: Evaluation Study of Multiple Use Cases Across Different Hospitals %A Cabanillas Silva,Patricia %A Sun,Hong %A Rezk,Mohamed %A Roccaro-Waldmeyer,Diana M %A Fliegenschmidt,Janis %A Hulde,Nikolai %A von Dossow,Vera %A Meesseman,Laurent %A Depraetere,Kristof %A Stieg,Joerg %A Szymanowsky,Ralph %A Dahlweid,Fried-Michael %+ Dedalus HealthCare, Roderveldlaan 2, Antwerp, 2600, Belgium, 32 0784244010, mohamed.rezk@dedalus.com %K model shift %K model monitoring %K prediction models %K acute kidney injury %K AKI %K sepsis %K delirium %K decision curve analysis %K DCA %D 2024 %7 13.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: In recent years, machine learning (ML)–based models have been widely used in clinical domains to predict clinical risk events. However, in production, the performances of such models heavily rely on changes in the system and data. The dynamic nature of the system environment, characterized by continuous changes, has significant implications for prediction models, leading to performance degradation and reduced clinical efficacy. Thus, monitoring model shifts and evaluating their impact on prediction models are of utmost importance. 
Objective: This study aimed to assess the impact of a model shift on ML-based prediction models by evaluating 3 different use cases—delirium, sepsis, and acute kidney injury (AKI)—from 2 hospitals (M and H) with different patient populations and investigate potential model deterioration during the COVID-19 pandemic period. Methods: We trained prediction models using retrospective data from earlier years and examined the presence of a model shift using data from more recent years. We used the area under the receiver operating characteristic curve (AUROC) to evaluate model performance and analyzed the calibration curves over time. We also assessed the influence on clinical decisions by evaluating the alert rate, the rates of over- and underdiagnosis, and the decision curve. Results: The 2 data sets used in this study contained 189,775 and 180,976 medical cases for hospitals M and H, respectively. Statistical analyses (Z test) revealed no significant difference (P>.05) between the AUROCs from the different years for all use cases and hospitals. For example, in hospital M, AKI did not show a significant difference between 2020 (AUROC=0.898) and 2021 (AUROC=0.907, Z=–1.171, P=.242). Similar results were observed in both hospitals and for all use cases (sepsis and delirium) when comparing all the different years. However, when evaluating the calibration curves at the 2 hospitals, model shifts were observed for the delirium and sepsis use cases but not for AKI. Additionally, to investigate the clinical utility of our models, we performed decision curve analysis (DCA) and compared the results across the different years. A pairwise nonparametric statistical comparison showed no differences in the net benefit at the probability thresholds of interest (P>.05). The comprehensive evaluations performed in this study ensured robust model performance of all the investigated models across the years. 
Moreover, neither performance deteriorations nor alert surges were observed during the COVID-19 pandemic period. Conclusions: Clinical risk prediction models were affected by the dynamic and continuous evolution of clinical practices and workflows. The performance of the models evaluated in this study appeared stable when assessed using AUROCs, showing no significant variations over the years. Additional model shift investigations suggested that a calibration shift was present for certain use cases (delirium and sepsis). However, these changes did not have any impact on the clinical utility of the models based on DCA. Consequently, it is crucial to closely monitor data changes and detect possible model shifts, along with their potential influence on clinical decision-making. %M 39671571 %R 10.2196/51409 %U https://www.jmir.org/2024/1/e51409 %U https://doi.org/10.2196/51409 %U http://www.ncbi.nlm.nih.gov/pubmed/39671571 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e55856 %T Screening for Depression and Anxiety Using a Nonverbal Working Memory Task in a Sample of Older Brazilians: Observational Study of Preliminary Artificial Intelligence Model Transferability %A Georgescu,Alexandra Livia %A Cummins,Nicholas %A Molimpakis,Emilia %A Giacomazzi,Eduardo %A Rodrigues Marczyk,Joana %A Goria,Stefano %K depression %K anxiety %K Brazil %K machine learning %K n-back %K working memory %K artificial intelligence %K gerontology %K older adults %K mental health %K AI %K transferability %K detection %K screening %K questionnaire %K longitudinal study %D 2024 %7 12.12.2024 %9 %J JMIR Form Res %G English %X Background: Anxiety and depression represent prevalent yet frequently undetected mental health concerns within the older population. The challenge of identifying these conditions presents an opportunity for artificial intelligence (AI)–driven, remotely available tools capable of screening and monitoring mental health. 
A critical criterion for such tools is their cultural adaptability to ensure effectiveness across diverse populations. Objective: This study aims to illustrate the preliminary transferability of two established AI models designed to detect high depression and anxiety symptom scores. The models were initially trained on data from a nonverbal working memory game (1- and 2-back tasks) in a dataset by thymia, a company that develops AI solutions for mental health and well-being assessments, encompassing over 6000 participants from the United Kingdom, United States, Mexico, Spain, and Indonesia. We seek to validate the models’ performance by applying them to a new dataset comprising older Brazilian adults, thereby exploring their transferability and generalizability across different demographics and cultures. Methods: A total of 69 Brazilian participants aged 51-92 years old were recruited with the help of Laços Saúde, a company specializing in nurse-led, holistic home care. Participants received a link to the thymia dashboard every Monday and Thursday for 6 months. The dashboard had a set of activities assigned to them that would take 10-15 minutes to complete, which included a 5-minute game with two levels of the n-back tasks. Two Random Forest models trained on thymia data to classify depression and anxiety based on thresholds defined by scores of the Patient Health Questionnaire (8 items) (PHQ-8) ≥10 and those of the Generalized Anxiety Disorder Assessment (7 items) (GAD-7) ≥10, respectively, were subsequently tested on the Laços Saúde patient cohort. Results: The depression classification model exhibited robust performance, achieving an area under the receiver operating characteristic curve (AUC) of 0.78, a specificity of 0.69, and a sensitivity of 0.72. The anxiety classification model showed an initial AUC of 0.63, with a specificity of 0.58 and a sensitivity of 0.64. 
This performance surpassed a benchmark model using only age and gender, which had AUCs of 0.47 for PHQ-8 and 0.53 for GAD-7. After recomputing the AUC scores on a cross-sectional subset of the data (the first n-back game session), we found AUCs of 0.79 for PHQ-8 and 0.76 for GAD-7. Conclusions: This study successfully demonstrates the preliminary transferability of two AI models trained on a nonverbal working memory task, one for depression and the other for anxiety classification, to a novel sample of older Brazilian adults. Future research could seek to replicate these findings in larger samples and other cultural contexts. Trial Registration: ISRCTN Registry ISRCTN90727704; https://www.isrctn.com/ISRCTN90727704 %R 10.2196/55856 %U https://formative.jmir.org/2024/1/e55856 %U https://doi.org/10.2196/55856 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56863 %T Development and Validation of a Literature Screening Tool: Few-Shot Learning Approach in Systematic Reviews %A Wiwatthanasetthakarn,Phongphat %A Ponthongmak,Wanchana %A Looareesuwan,Panu %A Tansawet,Amarit %A Numthavaj,Pawin %A McKay,Gareth J %A Attia,John %A Thakkinstian,Ammarin %+ Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, 4th Floor, Sukho Place Building, Sukhothai Road, Dusit, Bangkok, 10300, Thailand, 66 022010833, wanchana.pon@mahidol.edu %K few shots learning %K deep learning %K natural language processing %K S-BERT %K systematic review %K study selection %K sentence-bidirectional encoder representations from transformers %D 2024 %7 11.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Systematic reviews (SRs) are considered the highest level of evidence, but their rigorous literature screening process can be time-consuming and resource-intensive. This is particularly challenging given the rapid pace of medical advancements, which can quickly make SRs outdated. 
Few-shot learning (FSL), a machine learning approach that learns effectively from limited data, offers a potential solution to streamline this process. Sentence-bidirectional encoder representations from transformers (S-BERT) are particularly promising for identifying relevant studies with fewer examples. Objective: This study aimed to develop a model framework using FSL to efficiently screen and select relevant studies for inclusion in SRs, aiming to reduce workload while maintaining high recall rates. Methods: We developed and validated the FSL model framework using 9 previously published SR projects (2016-2018). The framework used S-BERT with titles and abstracts as input data. Key evaluation metrics, including workload reduction, cosine similarity score, and the number needed to screen at 100% recall, were estimated to determine the optimal number of eligible studies for model training. A prospective evaluation phase involving 4 ongoing SRs was then conducted. Study selection by FSL and a secondary reviewer were compared with the principal reviewer (considered the gold standard) to estimate the false negative rate. Results: Model development suggested an optimal range of 4-12 eligible studies for FSL training. Using 4-6 eligible studies during model development resulted in similarity thresholds for 100% recall, ranging from 0.432 to 0.636, corresponding to a workload reduction of 51.11% (95% CI 46.36-55.86) to 97.67% (95% CI 96.76-98.58). The prospective evaluation of 4 SRs aimed for a 50% workload reduction, yielding numbers needed to screen of 497 to 1035 out of 995 to 2070 studies. The false negative rate ranged from 1.87% to 12.20% for the FSL model and from 5% to 56.48% for the second reviewer compared with the principal reviewer. Conclusions: Our FSL framework demonstrates the potential for reducing workload in SR screening by over 50%. However, the model did not achieve 100% recall at this threshold, highlighting the potential for omitting eligible studies. 
Future work should focus on developing a web application to implement the FSL framework, making it accessible to researchers. %M 39662894 %R 10.2196/56863 %U https://www.jmir.org/2024/1/e56863 %U https://doi.org/10.2196/56863 %U http://www.ncbi.nlm.nih.gov/pubmed/39662894 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52597 %T Large Language Models and Empathy: Systematic Review %A Sorin,Vera %A Brin,Dana %A Barash,Yiftach %A Konen,Eli %A Charney,Alexander %A Nadkarni,Girish %A Klang,Eyal %+ Department of Radiology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, United States, 1 5072842511, verasrn@gmail.com %K empathy %K LLMs %K AI %K ChatGPT %K review methods %K review methodology %K systematic review %K scoping %K synthesis %K foundation models %K text-based %K human interaction %K emotional intelligence %K objective metrics %K human assessment %K emotions %K healthcare %K cognitive %K PRISMA %D 2024 %7 11.12.2024 %9 Review %J J Med Internet Res %G English %X Background: Empathy, a fundamental aspect of human interaction, is characterized as the ability to experience another being’s emotions within oneself. In health care, empathy is fundamental to the interaction between health care professionals and patients. It is a quality unique to humans that large language models (LLMs) are believed to lack. Objective: We aimed to review the literature on the capacity of LLMs in demonstrating empathy. Methods: We conducted a literature search on MEDLINE, Google Scholar, PsyArXiv, medRxiv, and arXiv between December 2022 and February 2024. We included English-language full-length publications that evaluated empathy in LLMs’ outputs. We excluded papers evaluating other topics related to emotional intelligence that were not specifically empathy. The included studies’ results, including the LLMs used, performance in empathy tasks, and limitations of the models, along with studies’ metadata were summarized. 
Results: A total of 12 studies published in 2023 met the inclusion criteria. ChatGPT-3.5 (OpenAI) was evaluated in all studies, with 6 studies comparing it with other LLMs such as GPT-4, LLaMA (Meta), and fine-tuned chatbots. Seven studies focused on empathy within a medical context. The studies reported LLMs to exhibit elements of empathy, including emotion recognition and emotional support in diverse contexts. Evaluation metrics included automatic metrics such as Recall-Oriented Understudy for Gisting Evaluation and Bilingual Evaluation Understudy, and human subjective evaluation. Some studies compared performance on empathy with humans, while others compared between different models. In some cases, LLMs were observed to outperform humans in empathy-related tasks. For example, ChatGPT-3.5 was evaluated for its responses to patients’ questions from social media, where ChatGPT’s responses were preferred over those of humans in 78.6% of cases. Other studies used subjective readers’ assigned scores. One study reported a mean empathy score of 1.84-1.9 (scale 0-2) for their fine-tuned LLM, while a different study evaluating ChatGPT-based chatbots reported a mean human rating of 3.43 out of 4 for empathetic responses. Other evaluations were based on the level of the emotional awareness scale, which was reported to be higher for ChatGPT-3.5 than for humans. Another study evaluated ChatGPT and GPT-4 on soft-skills questions in the United States Medical Licensing Examination, where GPT-4 answered 90% of questions correctly. Limitations were noted, including repetitive use of empathic phrases, difficulty following initial instructions, overly lengthy responses, sensitivity to prompts, and overall subjective evaluation metrics influenced by the evaluator’s background. Conclusions: LLMs exhibit elements of cognitive empathy, recognizing emotions and providing emotionally supportive responses in various contexts. 
Since social skills are an integral part of intelligence, these advancements bring LLMs closer to human-like interactions and expand their potential use in applications requiring emotional intelligence. However, there remains room for improvement in both the performance of these models and the evaluation strategies used for assessing soft skills. %M 39661968 %R 10.2196/52597 %U https://www.jmir.org/2024/1/e52597 %U https://doi.org/10.2196/52597 %U http://www.ncbi.nlm.nih.gov/pubmed/39661968 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e63892 %T Leveraging Large Language Models for Improved Understanding of Communications With Patients With Cancer in a Call Center Setting: Proof-of-Concept Study %A Cho,Seungbeom %A Lee,Mangyeong %A Yu,Jaewook %A Yoon,Junghee %A Choi,Jae-Boong %A Jung,Kyu-Hwan %A Cho,Juhee %+ Department of Medical Device Management and Research, Samsung Advanced Institute for Health Sciences and Technology, Sungkyunkwan University, 81 Irwon‐ro, Gangnam, Seoul, 06355, Republic of Korea, 82 02 3410 3632, kyuhwanjung@gmail.com %K large language model %K cancer %K supportive care %K LLMs %K patient communication %K natural language processing %K NLP %K self-management %K teleconsultation %K triage services %K telephone consultations %D 2024 %7 11.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Hospital call centers play a critical role in providing support and information to patients with cancer, making it crucial to effectively identify and understand patient intent during consultations. However, operational efficiency and standardization of telephone consultations, particularly when categorizing diverse patient inquiries, remain significant challenges. 
While traditional deep learning models like long short-term memory (LSTM) and bidirectional encoder representations from transformers (BERT) have been used to address these issues, they heavily depend on annotated datasets, which are labor-intensive and time-consuming to generate. Large language models (LLMs) like GPT-4, with their in-context learning capabilities, offer a promising alternative for classifying patient intent without requiring extensive retraining. Objective: This study evaluates the performance of GPT-4 in classifying the purpose of telephone consultations of patients with cancer. In addition, it compares the performance of GPT-4 to that of discriminative models, such as LSTM and BERT, with a particular focus on their ability to manage ambiguous and complex queries. Methods: We used a dataset of 430,355 sentences from telephone consultations with patients with cancer between 2016 and 2020. LSTM and BERT models were trained on 300,000 sentences using supervised learning, while GPT-4 was applied using zero-shot and few-shot approaches without explicit retraining. The accuracy of each model was compared using 1,000 randomly selected sentences from 2020 onward, with special attention paid to how each model handled ambiguous or uncertain queries. Results: GPT-4, which uses only a few examples (few-shot prompting), attained a remarkable accuracy of 85.2%, considerably outperforming the LSTM and BERT models, which achieved accuracies of 73.7% and 71.3%, respectively. Notably, categories such as “Treatment,” “Rescheduling,” and “Symptoms” involve multiple contexts and exhibit significant complexity. GPT-4 demonstrated more than 15% superior performance in handling ambiguous queries in these categories. In addition, GPT-4 excelled in categories like “Records” and “Routine,” where contextual clues were clear, outperforming the discriminative models. 
These findings emphasize the potential of LLMs, particularly GPT-4, for interpreting complicated patient interactions during cancer-related telephone consultations. Conclusions: This study shows the potential of GPT-4 to significantly improve the classification of patient intent in cancer-related telephone oncological consultations. GPT-4’s ability to handle complex and ambiguous queries without extensive retraining provides a substantial advantage over discriminative models like LSTM and BERT. While GPT-4 demonstrates strong performance in various areas, further refinement of prompt design and category definitions is necessary to fully leverage its capabilities in practical health care applications. Future research will explore the integration of LLMs like GPT-4 into hybrid systems that combine human oversight with artificial intelligence–driven technologies. %M 39661975 %R 10.2196/63892 %U https://www.jmir.org/2024/1/e63892 %U https://doi.org/10.2196/63892 %U http://www.ncbi.nlm.nih.gov/pubmed/39661975 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55986 %T Accuracy of Machine Learning in Detecting Pediatric Epileptic Seizures: Systematic Review and Meta-Analysis %A Zou,Zhuan %A Chen,Bin %A Xiao,Dongqiong %A Tang,Fajuan %A Li,Xihong %+ Department of Emergency, West China Second University Hospital, Sichuan University, No 20, Section 3, Renmin South Road, Wuhou District, Chengdu, 610000, China, 86 13551089846, lixihonghxey@163.com %K epileptic seizures %K machine learning %K deep learning %K electroencephalogram %K EEG %K children %K pediatrics %K epilepsy %K detection %D 2024 %7 11.12.2024 %9 Review %J J Med Internet Res %G English %X Background: Real-time monitoring of pediatric epileptic seizures poses a significant challenge in clinical practice. 
In recent years, machine learning (ML) has attracted substantial attention from researchers for diagnosing and treating neurological diseases, leading to its application for detecting pediatric epileptic seizures. However, systematic evidence substantiating its feasibility remains limited. Objective: This systematic review aimed to consolidate the existing evidence regarding the effectiveness of ML in monitoring pediatric epileptic seizures with an effort to provide an evidence-based foundation for the development and enhancement of intelligent tools in the future. Methods: We conducted a systematic search of the PubMed, Cochrane, Embase, and Web of Science databases for original studies focused on the detection of pediatric epileptic seizures using ML, with a cutoff date of August 27, 2023. The risk of bias in eligible studies was assessed using the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies–2). Meta-analyses were performed to evaluate the C-index and the diagnostic 4-grid table, using a bivariate mixed-effects model for the latter. We also examined publication bias for the C-index by using funnel plots and the Egger test. Results: This systematic review included 28 original studies, with 15 studies on ML and 13 on deep learning (DL). All these models were based on electroencephalography data of children. The pooled C-index, sensitivity, specificity, and accuracy of ML in the training set were 0.76 (95% CI 0.69-0.82), 0.77 (95% CI 0.73-0.80), 0.74 (95% CI 0.70-0.77), and 0.75 (95% CI 0.72-0.77), respectively. In the validation set, the pooled C-index, sensitivity, specificity, and accuracy of ML were 0.73 (95% CI 0.67-0.79), 0.88 (95% CI 0.83-0.91), 0.83 (95% CI 0.71-0.90), and 0.78 (95% CI 0.73-0.82), respectively. Meanwhile, the pooled C-index of DL in the validation set was 0.91 (95% CI 0.88-0.94), with sensitivity, specificity, and accuracy being 0.89 (95% CI 0.85-0.91), 0.91 (95% CI 0.88-0.93), and 0.89 (95% CI 0.86-0.92), respectively. 
Conclusions: Our systematic review demonstrates promising accuracy of artificial intelligence methods in epilepsy detection. DL appears to offer higher detection accuracy than ML. These findings support the development of DL-based early-warning tools in future research. Trial Registration: PROSPERO CRD42023467260; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42023467260 %M 39661965 %R 10.2196/55986 %U https://www.jmir.org/2024/1/e55986 %U https://doi.org/10.2196/55986 %U http://www.ncbi.nlm.nih.gov/pubmed/39661965 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e58623 %T Integrating GPT-Based AI into Virtual Patients to Facilitate Communication Training Among Medical First Responders: Usability Study of Mixed Reality Simulation %A Gutiérrez Maquilón,Rodrigo %A Uhl,Jakob %A Schrom-Feiertag,Helmut %A Tscheligi,Manfred %+ Center for Technology Experience, AIT - Austrian Institute of Technology, Giefinggasse 4, Vienna, 1210, Austria, 43 66478588121, rodrigo.gutierrez@ait.ac.at %K medical first responders %K verbal communication skills %K training %K virtual patient %K generative artificial intelligence %K GPT %K large language models %K prompt engineering %K mixed reality %D 2024 %7 11.12.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Training in social-verbal interactions is crucial for medical first responders (MFRs) to assess a patient’s condition and perform urgent treatment during emergency medical service administration. Integrating conversational agents (CAs) in virtual patients (VPs), that is, digital simulations, is a cost-effective alternative to resource-intensive human role-playing. There is moderate evidence that CAs improve communication skills more effectively when used with instructional interventions. However, more recent GPT-based artificial intelligence (AI) produces richer, more diverse, and more natural responses than previous CAs and has control of prosodic voice qualities like pitch and duration. 
These functionalities have the potential to better match the interaction expectations of MFRs regarding habitability. Objective: We aimed to study how the integration of GPT-based AI in a mixed reality (MR)–VP could support communication training of MFRs. Methods: We developed an MR simulation of a traffic accident with a VP. ChatGPT (OpenAI) was integrated into the VP and prompted with verified characteristics of accident victims. MFRs (N=24) were instructed on how to interact with the MR scenario. After assessing and treating the VP, the MFRs were administered the Mean Opinion Scale-Expanded, version 2, and the Subjective Assessment of Speech System Interfaces questionnaires to study their perception of the voice quality and the usability of the voice interactions, respectively. Open-ended questions were asked after completing the questionnaires. The observed and logged interactions with the VP, descriptive statistics of the questionnaires, and the output of the open-ended questions are reported. Results: The usability assessment of the VP resulted in moderate positive ratings, especially in habitability (median 4.25, IQR 4-4.81) and likeability (median 4.50, IQR 3.97-5.91). Interactions were negatively affected by the approximately 3-second latency of the responses. MFRs acknowledged the naturalness of determining the physiological states of the VP through verbal communication, for example, with questions such as “Where does it hurt?” However, the question-answer dynamic in the verbal exchange with the VP and the lack of the VP’s ability to start the verbal exchange were noticed. Noteworthy insights highlighted the potential of domain-knowledge prompt engineering to steer the actions of MFRs for effective training. Conclusions: Generative AI in VPs facilitates MFRs’ training but continues to rely on instructions for effective verbal interactions. Therefore, the capabilities of the GPT-VP and a training protocol need to be communicated to trainees. 
Future interactions should implement triggers based on keyword recognition, the VP pointing to the hurting area, conversational turn-taking techniques, and the ability for the VP to initiate a verbal exchange. Furthermore, a local AI server, chunk processing, and lowering the audio resolution of the VP’s voice could ameliorate the delay in response and allay privacy concerns. Prompting could be used in future studies to create a virtual MFR capable of assisting trainees. %M 39661979 %R 10.2196/58623 %U https://formative.jmir.org/2024/1/e58623 %U https://doi.org/10.2196/58623 %U http://www.ncbi.nlm.nih.gov/pubmed/39661979 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e60063 %T EyeGPT for Patient Inquiries and Medical Education: Development and Validation of an Ophthalmology Large Language Model %A Chen,Xiaolan %A Zhao,Ziwei %A Zhang,Weiyi %A Xu,Pusheng %A Wu,Yue %A Xu,Mingpu %A Gao,Le %A Li,Yinwen %A Shang,Xianwen %A Shi,Danli %A He,Mingguang %+ School of Optometry, The Hong Kong Polytechnic University, 11 Yuk Choi Road, Hung Hom, KLN, Hong Kong, 999077, China, 852 27664825, danli.shi@polyu.edu.hk %K large language model %K generative pretrained transformer %K generative artificial intelligence %K ophthalmology %K retrieval-augmented generation %K medical assistant %K EyeGPT %K generative AI %D 2024 %7 11.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Large language models (LLMs) have the potential to enhance clinical flow and improve medical education, but they encounter challenges related to specialized knowledge in ophthalmology. Objective: This study aims to enhance ophthalmic knowledge by refining a general LLM into an ophthalmology-specialized assistant for patient inquiries and medical education. 
Methods: We transformed Llama2 into an ophthalmology-specialized LLM, termed EyeGPT, through the following 3 strategies: prompt engineering for role-playing, fine-tuning with publicly available data sets filtered for eye-specific terminology (83,919 samples), and retrieval-augmented generation leveraging a medical database and 14 ophthalmology textbooks. The efficacy of various EyeGPT variants was evaluated by 4 board-certified ophthalmologists through comprehensive use of 120 diverse category questions in both simple and complex question-answering scenarios. The performance of the best EyeGPT model was then compared with that of the unassisted human physician group and the EyeGPT+human group. We proposed 4 metrics for assessment: accuracy, understandability, trustworthiness, and empathy. The proportion of hallucinations was also reported. Results: The best fine-tuned model significantly outperformed the original Llama2 model at providing informed advice (mean 9.30, SD 4.42 vs mean 13.79, SD 5.70; P<.001) and mitigating hallucinations (97/120, 80.8% vs 53/120, 44.2%, P<.001). Incorporating information retrieval from reliable sources, particularly ophthalmology textbooks, further improved the model's response compared with solely the best fine-tuned model (mean 13.08, SD 5.43 vs mean 15.14, SD 4.64; P=.001) and reduced hallucinations (71/120, 59.2% vs 57/120, 47.4%, P=.02). Subgroup analysis revealed that EyeGPT showed robustness across common diseases, with consistent performance across different users and domains. Among the variants, the model integrating fine-tuning and book retrieval ranked highest, closely followed by the combination of fine-tuning and the manual database, standalone fine-tuning, and pure role-playing methods. EyeGPT demonstrated competitive capabilities in understandability and empathy when compared with human ophthalmologists. With the assistance of EyeGPT, the performance of the ophthalmologist was notably enhanced. 
Conclusions: We pioneered and introduced EyeGPT by refining a general domain LLM and conducted a comprehensive comparison and evaluation of different strategies to develop an ophthalmology-specific assistant. Our results highlight EyeGPT’s potential to assist ophthalmologists and patients in medical settings. %M 39661433 %R 10.2196/60063 %U https://www.jmir.org/2024/1/e60063 %U https://doi.org/10.2196/60063 %U http://www.ncbi.nlm.nih.gov/pubmed/39661433 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52107 %T Early Detection of Dementia in Populations With Type 2 Diabetes: Predictive Analytics Using Machine Learning Approach %A Thanh Phuc,Phan %A Nguyen,Phung-Anh %A Nguyen,Nam Nhat %A Hsu,Min-Huei %A Le,Nguyen Quoc Khanh %A Tran,Quoc-Viet %A Huang,Chih-Wei %A Yang,Hsuan-Chia %A Chen,Cheng-Yu %A Le,Thi Anh Hoa %A Le,Minh Khoi %A Nguyen,Hoang Bac %A Lu,Christine Y %A Hsu,Jason C %+ College of Management, Taipei Medical University, 11F. Biomedical Technology Building, No. 301, Yuantong Rd., Zhonghe Dist, New Taipei, 235, Taiwan, 886 2 66202589 ext 16119, jasonhsu@tmu.edu.tw %K diabetes %K dementia %K machine learning %K prediction model %K TMUCRD %K Taipei Medical University Clinical Research Database %D 2024 %7 11.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The possible association between diabetes mellitus and dementia has raised concerns, given the observed coincidental occurrences. Objective: This study aimed to develop a personalized predictive model, using artificial intelligence, to assess the 5-year and 10-year dementia risk among patients with type 2 diabetes mellitus (T2DM) who are prescribed antidiabetic medications. Methods: This retrospective multicenter study used data from the Taipei Medical University Clinical Research Database, which comprises electronic medical records from 3 hospitals in Taiwan. 
This study applied 8 machine learning algorithms to develop prediction models, including logistic regression, linear discriminant analysis, gradient boosting machine, light gradient boosting machine, AdaBoost, random forest, extreme gradient boosting, and artificial neural network (ANN). These models incorporated a range of variables, encompassing patient characteristics, comorbidities, medication usage, laboratory results, and examination data. Results: This study involved a cohort of 43,068 patients diagnosed with type 2 diabetes mellitus, which accounted for a total of 1,937,692 visits. For model development and validation, 1,300,829 visits were used, while an additional 636,863 visits were reserved for external testing. The area under the curve of the prediction models ranged from 0.67 for the logistic regression to 0.98 for the ANN. Based on the external test results, the model built using the ANN algorithm had the best area under the curve (0.97 for the 5-year follow-up period and 0.98 for the 10-year follow-up period). Based on the best model (ANN), age, gender, triglyceride, hemoglobin A1c, antidiabetic agents, stroke history, and other long-term medications were the most important predictors. Conclusions: We have successfully developed a novel, computer-aided, dementia risk prediction model that can facilitate the clinical diagnosis and management of patients prescribed antidiabetic medications. However, further investigation is required to assess the model’s feasibility and external validity. 
%M 39434474 %R 10.2196/52107 %U https://www.jmir.org/2024/1/e52107 %U https://doi.org/10.2196/52107 %U http://www.ncbi.nlm.nih.gov/pubmed/39434474 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e60650 %T Testing 3 Modalities (Voice Assistant, Chatbot, and Mobile App) to Assist Older African American and Black Adults in Seeking Information on Alzheimer Disease and Related Dementias: Wizard of Oz Usability Study %A Bosco,Cristina %A Shojaei,Fereshtehossadat %A Theisz,Alec Andrew %A Osorio Torres,John %A Cureton,Bianca %A Himes,Anna K %A Jessup,Nenette M %A Barnes,Priscilla A %A Lu,Yvonne %A Hendrie,Hugh C %A Hill,Carl V %A Shih,Patrick C %+ Luddy School of Informatics, Computing, and Engineering, Indiana University, 700 N Woodlawn Ave, Bloomington, IN, 47408, United States, 1 (812) 856 5754, cribosco@iu.edu %K older African American and Black adults %K Alzheimer disease and related dementias %K health literacy %K Wizard of Oz %K voice assistant %K chatbot %K mobile app %K dementia %K geriatric %K aging %K Alzheimer disease %K artificial intelligence %K AI %K mHealth %K digital tools %D 2024 %7 9.12.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Older African American and Black adults are twice as likely to develop Alzheimer disease and related dementias (ADRD) and have the lowest level of ADRD health literacy compared to any other ethnic group in the United States. Low health literacy concerning ADRD negatively impacts African American and Black people in accessing adequate health care. Objective: This study explored how 3 technological modalities—voice assistants, chatbots, and mobile apps—can assist older African American and Black adults in accessing ADRD information to improve ADRD health literacy. By testing each modality independently, the focus could be kept on understanding the unique needs and challenges of this population concerning the use of each modality when accessing ADRD-related information. 
Methods: Using the Wizard of Oz usability testing method, we assessed the 3 modalities with a sample of 15 older African American and Black adults aged >55 years. The 15 participants were asked to interact with the 3 modalities to search for information on local events happening in their geographical area and search for ADRD-related health information. Results: Our findings revealed that, across the 3 modalities, the content should avoid convoluted and complex language and give the possibility to save, store, and share it to be fully accessible by this population. In addition, content should come from credible sources, including information tailored to the participants’ cultural values, as it has to be culturally relevant for African American and Black communities. Finally, the interaction with the tool must be time efficient, and it should be adapted to the user’s needs to foster a sense of control and representation. Conclusions: We conclude that, when designing ADRD-related interventions for African American and Black older adults, it proves to be crucial to tailor the content provided by the technology to the community’s values and construct an interaction with the technology that is built on African American and Black communities’ needs and demands. 
%R 10.2196/60650 %U https://formative.jmir.org/2024/1/e60650 %U https://doi.org/10.2196/60650 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e55833 %T Current State of Community-Driven Radiological AI Deployment in Medical Imaging %A Gupta,Vikash %A Erdal,Barbaros %A Ramirez,Carolina %A Floca,Ralf %A Genereaux,Bradley %A Bryson,Sidney %A Bridge,Christopher %A Kleesiek,Jens %A Nensa,Felix %A Braren,Rickmer %A Younis,Khaled %A Penzkofer,Tobias %A Bucher,Andreas Michael %A Qin,Ming Melvin %A Bae,Gigon %A Lee,Hyeonhoon %A Cardoso,M Jorge %A Ourselin,Sebastien %A Kerfoot,Eric %A Choudhury,Rahul %A White,Richard D %A Cook,Tessa %A Bericat,David %A Lungren,Matthew %A Haukioja,Risto %A Shuaib,Haris %+ Mayo Clinic, 4500 San Pablo Rd S, Jacksonville, FL, 32224, United States, 1 904 953 2480, erdal.barbaros@mayo.edu %K radiology %K open-source %K radiology in practice %K deep learning %K artificial intelligence %K imaging informatics %K clinical deployment %K imaging %K medical informatics %K workflow %K operation %K implementation %K adoption %K taxonomy %K use case %K model %K integration %K machine learning %K mobile phone %D 2024 %7 9.12.2024 %9 Viewpoint %J JMIR AI %G English %X Artificial intelligence (AI) has become commonplace in solving routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. AI has been shown to improve efficiency in medical image generation, processing, and interpretation, and various such AI models have been developed across research laboratories worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. The goal of this paper is to give an overview of the intersection of AI and medical imaging landscapes. 
We also want to inform the readers about the importance of using standards in their radiology workflow and the challenges associated with deploying AI models in the clinical workflow. The main focus of this paper is to examine the existing condition of radiology workflow and identify the challenges hindering the implementation of AI in hospital settings. This report reflects extensive weekly discussions and practical problem-solving expertise accumulated over multiple years by industry experts, imaging informatics professionals, research scientists, and clinicians. To gain a deeper understanding of the requirements for deploying AI models, we introduce a taxonomy of AI use cases, supplemented by real-world instances of AI model integration within hospitals. We will also explain how the need for AI integration in radiology can be addressed using the Medical Open Network for AI (MONAI). MONAI is an open-source consortium for providing reproducible deep learning solutions and integration tools for radiology practice in hospitals. 
%M 39653370 %R 10.2196/55833 %U https://ai.jmir.org/2024/1/e55833 %U https://doi.org/10.2196/55833 %U http://www.ncbi.nlm.nih.gov/pubmed/39653370 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e55827 %T Evaluation of RMES, an Automated Software Tool Utilizing AI, for Literature Screening with Reference to Published Systematic Reviews as Case-Studies: Development and Usability Study %A Sugiura,Ayaka %A Saegusa,Satoshi %A Jin,Yingzi %A Yoshimoto,Riki %A Smith,Nicholas D %A Dohi,Koji %A Higuchi,Tadashi %A Kozu,Tomotake %+ Deloitte Analytics, Deloitte Tohmatsu Risk Advisory LLC, 3-2-3 Marunouchi, Chiyoda-ku, Tokyo, 100-0005, Japan, 81 80 3456 4991, yingzi.jin@tohmatsu.co.jp %K artificial intelligence %K automated literature screening %K natural language processing %K randomized controlled trials %K Rapid Medical Evidence Synthesis %K RMES %K systematic reviews %K text mining %D 2024 %7 9.12.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Systematic reviews and meta-analyses are important to evidence-based medicine, but the information retrieval and literature screening procedures are burdensome tasks. Rapid Medical Evidence Synthesis (RMES; Deloitte Tohmatsu Risk Advisory LLC) is a software designed to support information retrieval, literature screening, and data extraction for evidence-based medicine. Objective: This study aimed to evaluate the accuracy of RMES for literature screening with reference to published systematic reviews. Methods: We used RMES to automatically screen the titles and abstracts of PubMed-indexed articles included in 12 systematic reviews across 6 medical fields, by applying 4 filters: (1) study type; (2) study type + disease; (3) study type + intervention; and (4) study type + disease + intervention. We determined the numbers of articles correctly included by each filter relative to those included by the authors of each systematic review. Only PubMed-indexed articles were assessed. 
Results: Across the 12 reviews, the number of articles analyzed by RMES ranged from 46 to 5612. The number of PubMed-cited articles included in the reviews ranged from 4 to 47. The median (range) percentages of articles correctly labeled by RMES using filters 1-4 were 80.9% (57.1%-100%), 65.2% (34.1%-81.8%), 70.5% (0%-100%), and 58.6% (0%-81.8%), respectively. Conclusions: This study demonstrated good performance and accuracy of RMES for the initial screening of the titles and abstracts of articles for use in systematic reviews. RMES has the potential to reduce the workload involved in the initial screening of published studies. %M 39652380 %R 10.2196/55827 %U https://formative.jmir.org/2024/1/e55827 %U https://doi.org/10.2196/55827 %U http://www.ncbi.nlm.nih.gov/pubmed/39652380 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e67409 %T The Triage and Diagnostic Accuracy of Frontier Large Language Models: Updated Comparison to Physician Performance %A Sorich,Michael Joseph %A Mangoni,Arduino Aleksander %A Bacchi,Stephen %A Menz,Bradley Douglas %A Hopkins,Ashley Mark %+ College of Medicine and Public Health, Flinders University, GPO Box 2100, Adelaide, 5001, Australia, 61 82013217, michael.sorich@flinders.edu.au %K generative artificial intelligence %K large language models %K triage %K diagnosis %K accuracy %K physician %K ChatGPT %K diagnostic %K primary care %K physicians %K prediction %K medical care %K internet %K LLMs %K AI %D 2024 %7 6.12.2024 %9 Research Letter %J J Med Internet Res %G English %X %M 39642373 %R 10.2196/67409 %U https://www.jmir.org/2024/1/e67409 %U https://doi.org/10.2196/67409 %U http://www.ncbi.nlm.nih.gov/pubmed/39642373 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e59045 %T Intersection of Performance, Interpretability, and Fairness in Neural Prototype Tree for Chest X-Ray Pathology Detection: Algorithm Development and Validation Study %A Chen,Hongbo %A Alfred,Myrtede %A Brown,Andrew D %A Atinga,Angela 
%A Cohen,Eldan %+ Department of Mechanical and Industrial Engineering, University of Toronto, 27 King's College Cir, Toronto, ON, Canada, 1 416 978 4184, ecohen@mie.utoronto.ca %K explainable artificial intelligence %K deep learning %K chest x-ray %K thoracic pathology %K fairness %K interpretability %D 2024 %7 5.12.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: While deep learning classifiers have shown remarkable results in detecting chest X-ray (CXR) pathologies, their adoption in clinical settings is often hampered by the lack of transparency. To bridge this gap, this study introduces the neural prototype tree (NPT), an interpretable image classifier that combines the diagnostic capability of deep learning models and the interpretability of the decision tree for CXR pathology detection. Objective: This study aimed to investigate the utility of the NPT classifier in 3 dimensions, including performance, interpretability, and fairness, and subsequently examined the complex interaction between these dimensions. We highlight both local and global explanations of the NPT classifier and discuss its potential utility in clinical settings. Methods: This study used CXRs from the publicly available Chest X-ray 14, CheXpert, and MIMIC-CXR datasets. We trained 6 separate classifiers for each CXR pathology in all datasets, 1 baseline residual neural network (ResNet)–152, and 5 NPT classifiers with varying levels of interpretability. Performance, interpretability, and fairness were measured using the area under the receiver operating characteristic curve (ROC AUC), interpretation complexity (IC), and mean true positive rate (TPR) disparity, respectively. Linear regression analyses were performed to investigate the relationship between IC and ROC AUC, as well as between IC and mean TPR disparity. 
Results: The performance of the NPT classifier improved as the IC level increased, surpassing that of ResNet-152 at IC level 15 for the Chest X-ray 14 dataset and IC level 31 for the CheXpert and MIMIC-CXR datasets. The NPT classifier at IC level 1 exhibited the highest degree of unfairness, as indicated by the mean TPR disparity. The magnitude of unfairness, as measured by the mean TPR disparity, was more pronounced in groups differentiated by age (chest X-ray 14 0.112, SD 0.015; CheXpert 0.097, SD 0.010; MIMIC 0.093, SD 0.017) compared to sex (chest X-ray 14 0.054 SD 0.012; CheXpert 0.062, SD 0.008; MIMIC 0.066, SD 0.013). A significant positive relationship between interpretability (ie, IC level) and performance (ie, ROC AUC) was observed across all CXR pathologies (P<.001). Furthermore, linear regression analysis revealed a significant negative relationship between interpretability and fairness (ie, mean TPR disparity) across age and sex subgroups (P<.001). Conclusions: By illuminating the intricate relationship between performance, interpretability, and fairness of the NPT classifier, this research offers insightful perspectives that could guide future developments in effective, interpretable, and equitable deep learning classifiers for CXR pathology detection. 
%M 39636692 %R 10.2196/59045 %U https://formative.jmir.org/2024/1/e59045 %U https://doi.org/10.2196/59045 %U http://www.ncbi.nlm.nih.gov/pubmed/39636692 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56195 %T Effectiveness of Virtual Simulations Versus Mannequins and Real Persons in Medical and Nursing Education: Meta-Analysis and Trial Sequential Analysis of Randomized Controlled Trials %A Jiang,Nan %A Zhang,Yuelun %A Liang,Siyu %A Lyu,Xiaohong %A Chen,Shi %A Huang,Xiaoming %A Pan,Hui %+ Department of Endocrinology, Key Laboratory of Endocrinology of National Health Commission, Translation Medicine Centre, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, No.1 Shuaifuyuan,Wangfujing, Dongcheng District, Beijing, 100730, China, 86 13683136205, cspumch@163.com %K artificial intelligence %K clinical virtual simulation %K medical education %K meta-analysis %K nursing education %K virtual patient %K virtual reality %D 2024 %7 5.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Virtual simulation (VS) is a developing education approach with the recreation of reality using digital technology. The teaching effectiveness of VSs compared to mannequins and real persons (RPs) has never been investigated in medical and nursing education. Objective: This study aims to compare VSs and mannequins or RPs in improving the following clinical competencies: knowledge, procedural skills, clinical reasoning, and communication skills. Methods: Following Cochrane methodology, a meta-analysis was conducted on the effectiveness of VSs in pre- and postregistration medical or nursing participants. The Cochrane Library, PubMed, Embase, and Educational Resource Information Centre databases were searched to identify English-written randomized controlled trials up to August 2024. Two authors independently selected studies, extracted data, and assessed the risk of bias. 
All pooled estimates were based on random-effects models and assessed by trial sequential analyses. Leave-one-out, subgroup, and univariate meta-regression analyses were performed to explore sources of heterogeneity. Results: A total of 27 studies with 1480 participants were included. Overall, there were no significant differences between VSs and mannequins or RPs in improving knowledge (standardized mean difference [SMD]=0.08; 95% CI –0.30 to 0.47; I2=67%; P=.002), procedural skills (SMD=–0.12; 95% CI –0.47 to 0.23; I2=75%; P<.001), clinical reasoning (SMD=0.29; 95% CI –0.26 to 0.85; I2=88%; P<.001), and communication skills (SMD=–0.02; 95% CI –0.62 to 0.58; I2=86%; P<.001). Trial sequential analysis for clinical reasoning indicated an insufficient sample size for a definitive judgment. For procedural skills, subgroup analyses showed that VSs were less effective among nursing participants (SMD=–0.55; 95% CI –1.07 to –0.03; I2=69%; P=.04). Univariate meta-regression detected a positive effect of publication year (β=.09; P=.02) on communication skill scores. Conclusions: Given favorable cost-utility plus high flexibility regarding time and space, VSs are viable alternatives to traditional face-to-face learning modalities. The comparative effectiveness of VSs deserves to be followed up with the emergence of new technology. In addition, further investigation of VSs with different design features will provide novel insights to drive education reform. 
Trial Registration: PROSPERO CRD42023466622; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=466622 %M 39636688 %R 10.2196/56195 %U https://www.jmir.org/2024/1/e56195 %U https://doi.org/10.2196/56195 %U http://www.ncbi.nlm.nih.gov/pubmed/39636688 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 7 %N %P e55681 %T Evaluating the Prognostic and Clinical Validity of the Fall Risk Score Derived From an AI-Based mHealth App for Fall Prevention: Retrospective Real-World Data Analysis %A Alves,Sónia A %A Temme,Steffen %A Motamedi,Seyedamirhosein %A Kura,Marie %A Weber,Sebastian %A Zeichen,Johannes %A Pommer,Wolfgang %A Baumgart,André %K falls %K older adults %K mHealth %K prognostic tool %K clinical validity %K AI %K mobile health %K artificial intelligence %D 2024 %7 4.12.2024 %9 %J JMIR Aging %G English %X Background: Falls pose a significant public health concern, with increasing occurrence due to the aging population, and they are associated with high mortality rates and risks such as multimorbidity and frailty. Falls not only lead to physical injuries but also have detrimental psychological and social consequences, negatively impacting quality of life. Identifying individuals at high risk for falls is crucial, particularly for those aged ≥60 years and living in residential care settings; current professional guidelines favor personalized, multifactorial fall risk assessment approaches for effective fall prevention. Objective: This study aimed to explore the prognostic validity of the Fall Risk Score (FRS), a multifactorial-based metric to assess fall risk (using longitudinal real-world data), and establish the clinical relevance of the FRS by identifying threshold values and the minimum clinically important differences. 
Methods: This retrospective cohort study involved 617 older adults (857 observations: 615 from women, 242 from men; mean age 83.3, SD 8.7 years; mean gait speed 0.49, SD 0.19 m/s; 622 using walking aids) residing in German residential care facilities and used the LINDERA mobile health app for fall risk assessment. The study focused on the association between FRS at the initial assessment (T1) and the normalized number of falls at follow-up (T2). A quadratic regression model and Spearman correlation analysis were used to analyze the data, supported by descriptive statistics and subgroup analyses. Results: The quadratic model exhibited the lowest root mean square error (0.015), and Spearman correlation analysis revealed that a higher FRS at T1 was linked to an increased number of falls at T2 (ρ=0.960, P<.001). Subgroups revealed significant, strong correlations between FRS at T1 and falls at T2, particularly for older adults with slower gait speeds (ρ=0.954, P<.001) and those using walking aids (ρ=0.955, P<.001). Threshold values revealed that an FRS of 45%, 32%, and 24% corresponded to the expectation of a fall within 6, 12, and 24 months, respectively. Distribution-based minimum clinically important difference values were established, providing ranges for small, medium, and large effect sizes for FRS changes. Conclusions: The FRS exhibits good prognostic validity for predicting future falls, particularly in specific subgroups. The findings support a stratified fall risk assessment approach and emphasize the significance of early and personalized intervention. This study contributes to the knowledge base on fall risk, despite limitations such as demographic focus and potential assessment interval variability. 
%R 10.2196/55681 %U https://aging.jmir.org/2024/1/e55681 %U https://doi.org/10.2196/55681 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e60380 %T Application of Chatbots to Help Patients Self-Manage Diabetes: Systematic Review and Meta-Analysis %A Wu,Yibo %A Zhang,Jinzi %A Ge,Pu %A Duan,Tingyu %A Zhou,Junyu %A Wu,Yiwei %A Zhang,Yuening %A Liu,Siyu %A Liu,Xinyi %A Wan,Erya %A Sun,Xinying %+ School of Public Health, Peking University, 38 Xueyuan Road, Haidian District, Beijing, 100191, China, 86 13691212050, xysun@bjmu.edu.cn %K artificial intelligence %K chatbot %K diabetes %K health education %K self-management %K systematic review %D 2024 %7 3.12.2024 %9 Review %J J Med Internet Res %G English %X Background: The number of people with diabetes is on the rise globally. Self-management and health education of patients are the keys to control diabetes. With the development of digital therapies and artificial intelligence, chatbots have the potential to provide health-related information and improve accessibility and effectiveness in the field of patient self-management. Objective: This study systematically reviews the current research status and effectiveness of chatbots in the field of diabetes self-management to support the development of diabetes chatbots. Methods: A systematic review and meta-analysis of chatbots that can help patients with diabetes with self-management was conducted. PubMed and Web of Science databases were searched using keywords around diabetes, chatbots, conversational agents, virtual assistants, and more. The search period was from the date of creation of the databases to January 1, 2023. Research articles in English that fit the study topic were selected, and articles that did not fit the study topic or were not available in full text were excluded. Results: In total, 25 studies were included in the review. 
In terms of study type, all articles could be classified as systematic design studies (n=8, 32%), pilot studies (n=8, 32%), and intervention studies (n=9, 36%). Many articles adopted a nonrandomized controlled trial design in intervention studies (n=6, 24%), and there was only 1 (4%) randomized controlled trial. In terms of research strategy, all articles could be divided into quantitative studies (n=10, 40%), mixed studies (n=6, 24%), and qualitative studies (n=1, 4%). The evaluation criteria for chatbot effectiveness can be divided into technical performance evaluation, user experience evaluation, and user health evaluation. Most chatbots (n=17, 68%) provided education and management focused on patient diet, exercise, glucose monitoring, medications, and complications, and only a few studies (n=2, 8%) provided education on mental health. The meta-analysis found that the chatbot intervention was effective in lowering blood glucose (mean difference 0.30, 95% CI 0.04-0.55; P=.02) and had no significant effect on weight reduction (mean difference 1.41, 95% CI –2.29 to 5.11; P=.46) compared with the baseline. Conclusions: Chatbots have potential to support self-management for people with diabetes. However, the evidence level of current research is low, and higher-level research (such as randomized controlled trials) is needed to strengthen the evidence base. Greater use of mixed methods research is needed to fully leverage the strengths of both quantitative and qualitative research. Appropriate and innovative theoretical frameworks should be used in the research to provide theoretical support for the study. In addition, researchers should focus on the personalized and user-friendly interactive features of chatbots, as well as improvements in study design. 
%M 39626235 %R 10.2196/60380 %U https://www.jmir.org/2024/1/e60380 %U https://doi.org/10.2196/60380 %U http://www.ncbi.nlm.nih.gov/pubmed/39626235 %0 Journal Article %@ 2561-6722 %I JMIR Publications %V 7 %N %P e64669 %T Parental Assessment of Postsurgical Pain in Infants at Home Using Artificial Intelligence–Enabled and Observer-Based Tools: Construct Validity and Clinical Utility Evaluation Study %A Sada,Fatos %A Chivers,Paola %A Cecelia,Sokol %A Statovci,Sejdi %A Ukperaj,Kujtim %A Hughes,Jeffery %A Hoti,Kreshnik %+ Faculty of Medicine, University of Prishtina, 31 George Bush St, Prishtina, 10000, Kosovo, 383 44945173, kreshnik.hoti@uni-pr.edu %K PainChek Infant %K Observer-Administered Visual Analog Scale %K parents %K infant pain %K pain assessment %K circumcision %K infant home assessment %K clinical utility %K construct validity %K artificial intelligence %D 2024 %7 3.12.2024 %9 Original Paper %J JMIR Pediatr Parent %G English %X Background: Pain assessment in the infant population is challenging owing to their inability to verbalize and hence self-report pain. Currently, there is a paucity of data on how parents identify and manage this pain at home using standardized pain assessment tools. Objective: This study aimed to explore parents’ assessment and intervention of pain in their infants at home following same-day surgery, using standardized pain assessment tools. Methods: This prospective study initially recruited 109 infant boys undergoing circumcision (same-day surgery). To assess pain at home over 3 days after surgery, parents using iOS devices were assigned to use the PainChek Infant tool, which is a point-of-care artificial intelligence–enabled tool, while parents using Android devices were assigned to use the Observer-Administered Visual Analog Scale (ObsVAS) tool. Chi-square analysis compared the intervention undertaken and pain presence. 
Generalized estimating equations were used to evaluate outcomes related to construct validity and clinical utility. Receiver operating characteristic analysis assessed pain score cutoffs in relation to the intervention used. Results: A total of 69 parents completed postsurgery pain assessments at home and returned their pain diaries. Of these 69 parents, 24 used ObsVAS and 45 used PainChek Infant. Feeding alone and feeding with medication were the most common pain interventions. Pain presence reduced over time. In the presence of pain, an intervention was likely to be administered (χ²₂=21.4; P<.001), with a medicinal intervention being 12.6 (95% CI 4.3-37.0; P<.001) times more likely and a nonmedicinal intervention being 5.2 (95% CI 1.8-14.6; P=.002) times more likely than no intervention. In the presence of intervention, score cutoff values were ≥2 for PainChek Infant and ≥20 for ObsVAS. A significant effect between the use of the pain instrument (χ²₁=7.2, P=.007) and intervention (χ²₂=43.4, P<.001) was found, supporting the construct validity of both instruments. Standardized pain scores were the highest when a medicinal intervention was undertaken (estimated marginal mean [EMM]=34.2%), followed by a nonmedicinal intervention (EMM=23.5%) and no intervention (EMM=11.2%). Similar trends were seen for both pain instruments. Pain was reduced in 94.5% (224/237) of assessments where parents undertook an intervention. In 75.1% (178/237) of assessments indicative of pain, the score changed from pain to no pain, with PainChek Infant assessments more likely to report this change (odds ratio 4.1, 95% CI 1.4-12.3) compared with ObsVAS assessments. Conclusions: The use of standardized pain assessment instruments by parents at home to assess pain in their infants can inform their decision-making regarding pain identification and management, including determining the effectiveness of the chosen intervention. 
In addition to the construct validity and clinical utility of PainChek Infant and ObsVAS in this setting, feeding alone and a combination of feeding with medication use were the key pain intervention strategies used by parents. %M 39626240 %R 10.2196/64669 %U https://pediatrics.jmir.org/2024/1/e64669 %U https://doi.org/10.2196/64669 %U http://www.ncbi.nlm.nih.gov/pubmed/39626240 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56874 %T Predicting and Monitoring Symptoms in Patients Diagnosed With Depression Using Smartphone Data: Observational Study %A Ikäheimonen,Arsi %A Luong,Nguyen %A Baryshnikov,Ilya %A Darst,Richard %A Heikkilä,Roope %A Holmen,Joel %A Martikkala,Annasofia %A Riihimäki,Kirsi %A Saleva,Outi %A Isometsä,Erkki %A Aledavood,Talayeh %+ Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, 02150, Finland, 358 449750110, arsi.ikaheimonen@aalto.fi %K data analysis %K digital phenotyping %K digital behavioral data %K depression symptoms %K depression monitoring %K mHealth %K mobile health %K smartphone %K mobile phone %D 2024 %7 3.12.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Clinical diagnostic assessments and the outcome monitoring of patients with depression rely predominantly on interviews by professionals and the use of self-report questionnaires. The ubiquity of smartphones and other personal consumer devices has prompted research into the potential of data collected via these devices to serve as digital behavioral markers for indicating the presence and monitoring of the outcome of depression. Objective: This paper explores the potential of using behavioral data collected with smartphones to detect and monitor depression symptoms in patients diagnosed with depression. Specifically, it investigates whether this data can accurately classify the presence of depression, as well as monitor the changes in depressive states over time. 
Methods: In a prospective cohort study, we collected smartphone behavioral data for up to 1 year. The study consists of observations from 164 participants, including healthy controls (n=31) and patients diagnosed with various depressive disorders: major depressive disorder (MDD; n=85), MDD with comorbid borderline personality disorder (n=27), and major depressive episodes with bipolar disorder (n=21). Data were labeled based on depression severity using 9-item Patient Health Questionnaire (PHQ-9) scores. We performed statistical analysis and used supervised machine learning on the data to classify the severity of depression and observe changes in the depression state over time. Results: Our correlation analysis revealed 32 behavioral markers associated with the changes in depressive state. Our analysis classified patients who are depressed with an accuracy of 82% (95% CI 80%-84%) and change in the presence of depression with an accuracy of 75% (95% CI 72%-76%). Notably, the most important smartphone features for classifying depression states were screen-off events, battery charge levels, communication patterns, app usage, and location data. Similarly, for predicting changes in depression state, the most important features were related to location, battery level, screen, and accelerometer data patterns. Conclusions: The use of smartphone digital behavioral markers to supplement clinical evaluations may aid in detecting the presence and changes in severity of symptoms of depression, particularly if combined with intermittent use of self-report of symptoms. 
%M 39626241 %R 10.2196/56874 %U https://www.jmir.org/2024/1/e56874 %U https://doi.org/10.2196/56874 %U http://www.ncbi.nlm.nih.gov/pubmed/39626241 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e54966 %T Using Artificial Intelligence to Detect Risk of Family Violence: Protocol for a Systematic Review and Meta-Analysis %A de Boer,Kathleen %A Mackelprang,Jessica L %A Nedeljkovic,Maja %A Meyer,Denny %A Iyer,Ravi %+ Department of Psychological Sciences, Swinburne University of Technology, 34 Wakefield Street, Hawthorn, 3122, Australia, 61 9214 3836, kdeboer@swin.edu.au %K family violence %K artificial intelligence %K natural language processing %K voice signal characteristics %K public health %K behaviors %K research literature %K policy %K prevalence %K detection %K social policy %K prevention %K machine learning %K mental health %K suicide risk %K psychological distress %D 2024 %7 2.12.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Despite the implementation of prevention strategies, family violence continues to be a prevalent issue worldwide. Current strategies to reduce family violence have demonstrated mixed success and innovative approaches are needed urgently to prevent the occurrence of family violence. Incorporating artificial intelligence (AI) into prevention strategies is gaining research attention, particularly the use of textual or voice signal data to detect individuals at risk of perpetrating family violence. However, no review to date has collated extant research regarding how accurate AI is at identifying individuals who are at risk of perpetrating family violence. Objective: The primary aim of this systematic review and meta-analysis is to assess the accuracy of AI models in differentiating between individuals at risk of engaging in family violence versus those who are not using textual or voice signal data. 
Methods: The following databases will be searched from inception to the search date: IEEE Xplore, PubMed, PsycINFO, EBSCOhost (Psychology and Behavioral Sciences collection), and Computers and Applied Sciences Complete. ProQuest Dissertations and Theses A&I will also be used to search the grey literature. Studies will be included if they report on human adults and use machine learning to differentiate between low and high risk of family violence perpetration. Studies may use voice signal data or linguistic (textual) data and must report levels of accuracy in determining risk. In the data screening and full-text review and quality analysis phases, 2 researchers will review the search results, and discrepancies and decisions will be resolved through masked review by a third researcher. Results will be reported in a narrative synthesis. In addition, a random effects meta-analysis will be conducted using the area under the receiver operating characteristic curve reported in the included studies, assuming sufficient eligible studies are identified. Methodological quality of included studies will be assessed using the Risk of Bias in Nonrandomized Studies of Interventions (ROBINS-I) tool. Results: As of October 2024, the search has not commenced. The review will document the state of the research concerning the accuracy of AI models in detecting the risk of family violence perpetration using textual or voice signal data. Results will be presented in the form of a narrative synthesis. Results of the meta-analysis will be summarized in tabular form and using a forest plot. Conclusions: The findings from this study will clarify the state of the literature on the accuracy of machine learning models to identify individuals who are at high risk of perpetrating family violence. Findings may be used to inform the development of AI and machine learning models that can be used to support possible prevention strategies. 
Trial Registration: PROSPERO CRD42023481174; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=481174 International Registered Report Identifier (IRRID): PRR1-10.2196/54966 %M 39621402 %R 10.2196/54966 %U https://www.researchprotocols.org/2024/1/e54966 %U https://doi.org/10.2196/54966 %U http://www.ncbi.nlm.nih.gov/pubmed/39621402 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e59249 %T Analyzing Patient Experience on Weibo: Machine Learning Approach to Topic Modeling and Sentiment Analysis %A Chen,Xiao %A Shen,Zhiyun %A Guan,Tingyu %A Tao,Yuchen %A Kang,Yichen %A Zhang,Yuxia %K patient experience %K experience %K attitude %K opinion %K perception %K perspective %K machine learning %K natural language process %K NLP %K social media %K free-text %K unstructured %K Weibo %K spatiotemporal %K topic modeling %K sentiment %D 2024 %7 29.11.2024 %9 %J JMIR Med Inform %G English %X Background: Social media platforms allow individuals to openly gather, communicate, and share information about their interactions with health care services, becoming an essential supplemental means of understanding patient experience. Objective: We aimed to identify common discussion topics related to health care experience from the public’s perspective and to determine areas of concern from patients’ perspectives that health care providers should act on. Methods: This study conducted a spatiotemporal analysis of the volume, sentiment, and topic of patient experience–related posts on the Weibo platform developed by Sina Corporation. We applied a supervised machine learning approach including human annotation and machine learning–based models for topic modeling and sentiment analysis of the public discourse. A multiclassifier voting method based on logistic regression, multinomial naïve Bayes, and random forest was used. Results: A total of 4008 posts were manually classified into patient experience topics. A patient experience theme framework was developed. 
The accuracy, precision, recall, and F-measure of the method integrating logistic regression, multinomial naïve Bayes, and random forest for patient experience themes were 0.93, 0.95, 0.80, 0.77, and 0.84, respectively, indicating a satisfactory prediction. The sentiment analysis revealed that negative sentiment posts constituted the highest proportion (3319/4008, 82.81%). Twenty patient experience themes were discussed on the social media platform. The majority of the posts described the interpersonal aspects of care (2947/4008, 73.53%); the five most frequently discussed topics were “health care professionals’ attitude,” “access to care,” “communication, information, and education,” “technical competence,” and “efficacy of treatment.” Conclusions: Hospital administrators and clinicians should consider the value of social media and pay attention to what patients and their family members are communicating on social media. To increase the utility of these data, a machine learning algorithm can be used for topic modeling. The results of this study highlighted the interpersonal and functional aspects of care, especially the interpersonal aspects, which are often the “moment of truth” during a service encounter in which patients make a critical evaluation of hospital services. 
%R 10.2196/59249 %U https://medinform.jmir.org/2024/1/e59249 %U https://doi.org/10.2196/59249 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e63262 %T Moving Toward Meaningful Evaluations of Monitoring in e-Mental Health Based on the Case of a Web-Based Grief Service for Older Mourners: Mixed Methods Study %A Brandl,Lena %A Jansen-Kosterink,Stephanie %A Brodbeck,Jeannette %A Jacinto,Sofia %A Mooser,Bettina %A Heylen,Dirk %K e-mental health %K digital mental health service %K mental health %K digital health %K internet intervention %K monitoring mental health %K monitor %K e-coach %K coaching %K grieve %K mourn %K old %K affective states %K artificial intelligence %K predictive %K repeatedly measured predictors in regression %K fuzzy cognitive map %K algorithm %K AI %D 2024 %7 28.11.2024 %9 %J JMIR Form Res %G English %X Background: Artificial intelligence (AI) tools hold much promise for mental health care by increasing the scalability and accessibility of care. However, current development and evaluation practices of AI tools limit their meaningfulness for health care contexts and therefore also the practical usefulness of such tools for professionals and clients alike. Objective: The aim of this study is to demonstrate the evaluation of an AI monitoring tool that detects the need for more intensive care in a web-based grief intervention for older mourners who have lost their spouse, with the goal of moving toward meaningful evaluation of AI tools in e-mental health. 
Methods: We leveraged the insights from three evaluation approaches: (1) the F1-score evaluated the tool’s capacity to classify user monitoring parameters as either in need of more intensive support or recommendable to continue using the web-based grief intervention as is; (2) we used linear regression to assess the predictive value of users’ monitoring parameters for clinical changes in grief, depression, and loneliness over the course of a 10-week intervention; and (3) we collected qualitative experience data from e-coaches (N=4) who incorporated the monitoring in their weekly email guidance during the 10-week intervention. Results: Based on n=174 binary recommendation decisions, the F1-score of the monitoring tool was 0.91. Due to minimal change in depression and loneliness scores after the 10-week intervention, only 1 linear regression was conducted. The difference score in grief before and after the intervention was included as a dependent variable. Participants’ (N=21) mean score on the self-report monitoring and the estimated slope of individually fitted growth curves and its standard error (ie, participants’ response pattern to the monitoring questions) were used as predictors. Only the mean monitoring score exhibited predictive value for the observed change in grief (R²=1.19, SE 0.33; t₁₆=3.58, P=.002). The e-coaches appreciated the monitoring tool as an opportunity to confirm their initial impression about intervention participants, personalize their email guidance, and detect when participants’ mental health deteriorated during the intervention. Conclusions: The monitoring tool evaluated in this paper identified a need for more intensive support reasonably well in a nonclinical sample of older mourners, had some predictive value for the change in grief symptoms during a 10-week intervention, and was appreciated as an additional source of mental health information by e-coaches who supported mourners during the intervention. 
Each evaluation approach in this paper came with its own set of limitations, including (1) skewed class distributions in prediction tasks based on real-life health data and (2) choosing meaningful statistical analyses based on clinical trial designs that are not targeted at evaluating AI tools. However, combining multiple evaluation methods facilitates drawing meaningful conclusions about the clinical value of AI monitoring tools for their intended mental health context. %R 10.2196/63262 %U https://formative.jmir.org/2024/1/e63262 %U https://doi.org/10.2196/63262 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e64380 %T Public Perception on Artificial Intelligence–Driven Mental Health Interventions: Survey Research %A Varghese,Mahima Anna %A Sharma,Poonam %A Patwardhan,Maitreyee %+ Department of Social Science and Language, Vellore Institute of Technology, Vellore Campus, Tiruvalam Road, Vellore, 632014, India, 91 9702872251, poonam.sharma@vit.ac.in %K public perception %K artificial intelligence %K AI %K AI-driven %K human-driven %K mental health interventions %K mental health stigma %K trust in AI %K digital health %K India %K mobile phone %D 2024 %7 28.11.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Artificial intelligence (AI) has become increasingly important in health care, generating both curiosity and concern. With a doctor-patient ratio of 1:834 in India, AI has the potential to alleviate a significant health care burden. Public perception plays a crucial role in shaping attitudes that can facilitate the adoption of new technologies. Similarly, the acceptance of AI-driven mental health interventions is crucial in determining their effectiveness and widespread adoption. 
Therefore, it is essential to study public perceptions and usage of existing AI-driven mental health interventions by exploring user experiences and opinions on their future applicability, particularly in comparison to traditional, human-based interventions. Objective: This study aims to explore the use, perception, and acceptance of AI-driven mental health interventions in comparison to traditional, human-based interventions. Methods: A total of 466 adult participants from India voluntarily completed a 30-item web-based survey on the use and perception of AI-based mental health interventions between November and December 2023. Results: Of the 466 respondents, only 163 (35%) had ever consulted a mental health professional. Additionally, 305 (65.5%) reported very low knowledge of AI-driven interventions. In terms of trust, 247 (53%) expressed a moderate level of Trust in AI-Driven Mental Health Interventions, while only 24 (5.2%) reported a high level of trust. By contrast, 114 (24.5%) reported high trust and 309 (66.3%) reported moderate Trust in Human-Based Mental Health Interventions; 242 (51.9%) participants reported a high level of stigma associated with using human-based interventions, compared with only 50 (10.7%) who expressed concerns about stigma related to AI-driven interventions. Additionally, 162 (34.8%) expressed a positive outlook toward the future use and social acceptance of AI-based interventions. The majority of respondents indicated that AI could be a useful option for providing general mental health tips and conducting initial assessments. The key benefits of AI highlighted by participants were accessibility, cost-effectiveness, 24/7 availability, and reduced stigma. Major concerns included data privacy, security, the lack of human touch, and the potential for misdiagnosis. Conclusions: There is a general lack of awareness about AI-driven mental health interventions. 
However, AI shows potential as a viable option for prevention, primary assessment, and ongoing mental health maintenance. Currently, people tend to trust traditional mental health practices more. Stigma remains a significant barrier to accessing traditional mental health services. Currently, the human touch remains an indispensable aspect of human-based mental health care, one that AI cannot replace. However, integrating AI with human mental health professionals is seen as a compelling model. AI is positively perceived in terms of accessibility, availability, and destigmatization. Knowledge and perceived trustworthiness are key factors influencing the acceptance and effectiveness of AI-driven mental health interventions. %M 39607994 %R 10.2196/64380 %U https://formative.jmir.org/2024/1/e64380 %U https://doi.org/10.2196/64380 %U http://www.ncbi.nlm.nih.gov/pubmed/39607994 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54557 %T Artificial Intelligence Applications to Measure Food and Nutrient Intakes: Scoping Review %A Zheng,Jiakun %A Wang,Junjie %A Shen,Jing %A An,Ruopeng %+ School of Economics and Management, Shanghai University of Sport, 399 Changhai Road, Yangpu District, Shanghai, 200438, China, 86 13817507993, zhengjiakun07@163.com %K food %K nutrient %K diet %K artificial intelligence %K machine learning %K deep learning %K neural networks %K computer vision %K natural language processing %K measurement %K AI %K food intake %K systematic literature %K dietary assessments %K AI-based %K disease management %K mobile phone %D 2024 %7 28.11.2024 %9 Review %J J Med Internet Res %G English %X Background: Accurate measurement of food and nutrient intake is crucial for nutrition research, dietary surveillance, and disease management, but traditional methods such as 24-hour dietary recalls, food diaries, and food frequency questionnaires are often prone to recall error and social desirability bias, limiting their reliability. 
With the advancement of artificial intelligence (AI), there is potential to overcome these limitations through automated, objective, and scalable dietary assessment techniques. However, the effectiveness and challenges of AI applications in this domain remain inadequately explored. Objective: This study aimed to conduct a scoping review to synthesize existing literature on the efficacy, accuracy, and challenges of using AI tools in assessing food and nutrient intakes, offering insights into their current advantages and areas of improvement. Methods: This review followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. A comprehensive literature search was conducted in 4 databases—PubMed, Web of Science, Cochrane Library, and EBSCO—covering publications from the databases’ inception to June 30, 2023. Studies were included if they used modern AI approaches to assess food and nutrient intakes in human subjects. Results: The 25 included studies, published between 2010 and 2023, involved sample sizes ranging from 10 to 38,415 participants. These studies used a variety of input data types, including food images (n=10), sound and jaw motion data from wearable devices (n=9), and text data (n=4), with 2 studies combining multiple input types. AI models applied included deep learning (eg, convolutional neural networks), machine learning (eg, support vector machines), and hybrid approaches. Applications were categorized into dietary intake assessment, food detection, nutrient estimation, and food intake prediction. Food detection accuracies ranged from 74% to 99.85%, and nutrient estimation errors varied between 10% and 15%. For instance, the RGB-D (Red, Green, Blue-Depth) fusion network achieved a mean absolute error of 15% in calorie estimation, and a sound-based classification model reached up to 94% accuracy in detecting food intake based on jaw motion and chewing patterns. 
In addition, AI-based systems provided real-time monitoring capabilities, improving the precision of dietary assessments and demonstrating the potential to reduce recall bias typically associated with traditional self-report methods. Conclusions: While AI demonstrated significant advantages in improving accuracy, reducing labor, and enabling real-time monitoring, challenges remain in adapting to diverse food types, ensuring algorithmic fairness, and addressing data privacy concerns. The findings suggest that AI has transformative potential for dietary assessment at both individual and population levels, supporting precision nutrition and chronic disease management. Future research should focus on enhancing the robustness of AI models across diverse dietary contexts and integrating biological sensors for a holistic dietary assessment approach. %M 39608003 %R 10.2196/54557 %U https://www.jmir.org/2024/1/e54557 %U https://doi.org/10.2196/54557 %U http://www.ncbi.nlm.nih.gov/pubmed/39608003 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 7 %N %P e58980 %T Enhancing Frailty Assessments for Transcatheter Aortic Valve Replacement Patients Using Structured and Unstructured Data: Real-World Evidence Study %A Mardini,Mamoun T %A Bai,Chen %A Bavry,Anthony A %A Zaghloul,Ahmed %A Anderson,R David %A Price,Catherine E Crenshaw %A Al-Ani,Mohammad A Z %K transcatheter aortic valve replacement %K frailty %K cardiology %K machine learning %K TAVR %K minimally invasive surgery %K cardiac surgery %K real-world data %K topic modeling %K clinical notes %K electronic health record %K EHR %D 2024 %7 27.11.2024 %9 %J JMIR Aging %G English %X Background: Transcatheter aortic valve replacement (TAVR) is a commonly used treatment for severe aortic stenosis. As degenerative aortic stenosis is primarily a disease afflicting older adults, a frailty assessment is essential to patient selection and optimal periprocedural outcomes. 
Objective: This study aimed to enhance frailty assessments of TAVR candidates by integrating real-world structured and unstructured data. Methods: This study analyzed data from 14,000 patients between January 2018 and December 2019 to assess frailty in TAVR patients at the University of Florida. Frailty was identified using the Fried criteria, which includes weight loss, exhaustion, walking speed, grip strength, and physical activity. Latent Dirichlet allocation for topic modeling and Extreme Gradient Boosting for frailty prediction were applied to unstructured clinical notes and structured electronic health record (EHR) data. We also used least absolute shrinkage and selection operator regression for feature selection. Model performance was rigorously evaluated using nested cross-validation, ensuring the generalizability of the findings. Results: Model performance was significantly improved by combining unstructured clinical notes with structured EHR data, achieving an area under the receiver operating characteristic curve of 0.82 (SD 0.07), which surpassed the EHR-only model’s area under the receiver operating characteristic curve of 0.64 (SD 0.08). The Shapley Additive Explanations analysis found that congestive heart failure management, back problems, and atrial fibrillation were the top frailty predictors. Additionally, the latent Dirichlet allocation topic modeling identified 7 key topics, highlighting the role of specific medical treatments in predicting frailty. Conclusions: Integrating unstructured clinical notes and structured EHR data led to a notable enhancement in predicting frailty. This method shows great potential for standardizing frailty assessments using real-world data and improving patient selection for TAVR. 
%R 10.2196/58980 %U https://aging.jmir.org/2024/1/e58980 %U https://doi.org/10.2196/58980 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54641 %T Development and Validation of a Prediction Model Using Sella Magnetic Resonance Imaging–Based Radiomics and Clinical Parameters for the Diagnosis of Growth Hormone Deficiency and Idiopathic Short Stature: Cross-Sectional, Multicenter Study %A Song,Kyungchul %A Ko,Taehoon %A Chae,Hyun Wook %A Oh,Jun Suk %A Kim,Ho-Seong %A Shin,Hyun Joo %A Kim,Jeong-Ho %A Na,Ji-Hoon %A Park,Chae Jung %A Sohn,Beomseok %+ Department of Radiology, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81, Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea, 82 1049045034, Beomseoksohn@gmail.com %K dwarfism %K pituitary %K idiopathic short stature %K child %K adolescent %K machine learning %K magnetic resonance imaging %K MRI %D 2024 %7 27.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Growth hormone deficiency (GHD) and idiopathic short stature (ISS) are the major etiologies of short stature in children. For the diagnosis of GHD and ISS, meticulous evaluations are required, including growth hormone provocation tests, which are invasive and burdensome for children. Additionally, sella magnetic resonance imaging (MRI) is necessary for assessing the etiologies of GHD, although MRI itself cannot evaluate hormonal secretion. Recently, radiomics has emerged as a revolutionary technique that uses mathematical algorithms to extract various features for the quantitative analysis of medical images. Objective: This study aimed to develop a machine learning–based model using sella MRI–based radiomics and clinical parameters to diagnose GHD and ISS. 
Methods: A total of 293 children with short stature who underwent sella MRI and growth hormone provocation tests were included in the training set, and 47 children who met the same inclusion criteria were enrolled in the test set from different hospitals for this study. A total of 186 radiomic features were extracted from the pituitary glands using a semiautomatic segmentation process for both the T2-weighted and contrast-enhanced T1-weighted images. The clinical parameters included auxological data, insulin-like growth factor-I, and bone age. The extreme gradient boosting algorithm was used to train the prediction models. Internal validation was conducted using 5-fold cross-validation on the training set, and external validation was conducted on the test set. Model performance was assessed by plotting the area under the receiver operating characteristic curve. The mean absolute Shapley values were computed to quantify the impact of each parameter. Results: The areas under the receiver operating characteristic curves (95% CIs) of the clinical, radiomics, and combined models were 0.684 (0.590-0.778), 0.691 (0.620-0.762), and 0.830 (0.741-0.919), respectively, in the external validation. Among the clinical parameters, the major contributing factors to prediction were BMI SD score (SDS), chronological age–bone age, weight SDS, growth velocity, and insulin-like growth factor-I SDS in the clinical model. In the combined model, radiomic features including maximum probability from a T2-weighted image and run length nonuniformity normalized from a T2-weighted image added incremental value to the prediction (combined model vs clinical model, P=.03; combined model vs radiomics model, P=.02). The code for our model is available in a public repository on GitHub. Conclusions: Our model combining both radiomics and clinical parameters can accurately differentiate GHD from ISS, as was confirmed by the external validation. 
These findings highlight the potential of machine learning–based models using radiomics and clinical parameters for diagnosing GHD and ISS. %M 39602803 %R 10.2196/54641 %U https://www.jmir.org/2024/1/e54641 %U https://doi.org/10.2196/54641 %U http://www.ncbi.nlm.nih.gov/pubmed/39602803 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53986 %T A Taxonomy and Archetypes of AI-Based Health Care Services: Qualitative Study %A Blaß,Marlene %A Gimpel,Henner %A Karnebogen,Philip %+ FIM Research Center for Information Management, University of Hohenheim, Branch Business & Information Systems Engineering of the Fraunhofer FIT, Schloss Hohenheim 1, Stuttgart, 70599, Germany, 49 0711 459 24051, marlene.blass@fit.fraunhofer.de %K healthcare %K artificial intelligence %K AI %K taxonomy %K services %K cluster analysis %K archetypes %D 2024 %7 27.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: To cope with the enormous burdens placed on health care systems around the world, from the strains and stresses caused by longer life expectancy to the large-scale emergency relief actions required by pandemics like COVID-19, many health care companies have been using artificial intelligence (AI) to adapt their services. Nevertheless, conceptual insights into how AI has been transforming the health care sector are still few and far between. This study aims to provide an overarching structure with which to classify the various real-world phenomena. A clear and comprehensive taxonomy will provide consensus on AI-based health care service offerings and sharpen the view of their adoption in the health care sector. Objective: The goal of this study is to identify the design characteristics of AI-based health care services. Methods: We propose a multilayered taxonomy created in accordance with an established method of taxonomy development. 
In doing so, we analyzed 268 AI-based health care services, conducted a structured literature review, and then evaluated the resulting taxonomy. Finally, we performed a cluster analysis to identify the archetypes of AI-based health care services. Results: We identified 4 critical perspectives: agents, data, AI, and health impact. Furthermore, a cluster analysis yielded 13 archetypes that demonstrate our taxonomy’s applicability. Conclusions: This contribution to conceptual knowledge of AI-based health care services enables researchers as well as practitioners to analyze such services and improve their theory-led design. %M 39602787 %R 10.2196/53986 %U https://www.jmir.org/2024/1/e53986 %U https://doi.org/10.2196/53986 %U http://www.ncbi.nlm.nih.gov/pubmed/39602787 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e58666 %T Facilitating Trust Calibration in Artificial Intelligence–Driven Diagnostic Decision Support Systems for Determining Physicians’ Diagnostic Accuracy: Quasi-Experimental Study %A Sakamoto,Tetsu %A Harada,Yukinori %A Shimizu,Taro %K trust calibration %K artificial intelligence %K diagnostic accuracy %K diagnostic decision support %K decision support %K diagnosis %K diagnostic %K chart %K history %K reliable %K reliability %K accurate %K accuracy %K AI %D 2024 %7 27.11.2024 %9 %J JMIR Form Res %G English %X Background: Diagnostic errors are significant problems in medical care. Despite the usefulness of artificial intelligence (AI)–based diagnostic decision support systems, the overreliance of physicians on AI-generated diagnoses may lead to diagnostic errors. Objective: We investigated the safe use of AI-based diagnostic decision support systems with trust calibration by adjusting trust levels to match the actual reliability of AI. Methods: A quasi-experimental study was conducted at Dokkyo Medical University, Japan, with physicians allocated (1:1) to the intervention and control groups. 
A total of 20 clinical cases were created from the medical histories recorded by an AI-driven automated medical history–taking system for actual patients who visited a community-based hospital in Japan. The participants reviewed these medical histories together with an AI-generated list of 10 differential diagnoses for each case and provided 1 to 3 possible diagnoses. In the intervention group, physicians were also asked whether the final diagnosis was in the AI-generated list of 10 differential diagnoses, which served as the trust calibration. We analyzed the diagnostic accuracy of physicians and the correctness of the trust calibration in the intervention group. We also investigated the relationship between the accuracy of the trust calibration and the diagnostic accuracy of physicians, as well as the physicians’ confidence level regarding the use of AI. Results: Among the 20 physicians assigned to the intervention (n=10) and control (n=10) groups, the mean age was 30.9 (SD 3.9) years and 31.7 (SD 4.2) years, the proportion of men was 80% and 60%, and the mean postgraduate year was 5.8 (SD 2.9) and 7.2 (SD 4.6), respectively, with no significant differences. The physicians’ diagnostic accuracy was 41.5% in the intervention group and 46% in the control group, with no significant difference (95% CI −0.75 to 2.55; P=.27). The overall accuracy of the trust calibration was only 61.5%, and despite correct calibration, the diagnostic accuracy was 54.5%. In the multivariate logistic regression model, the accuracy of the trust calibration was a significant contributor to the diagnostic accuracy of physicians (adjusted odds ratio 5.90, 95% CI 2.93‐12.46; P<.001). The mean confidence level for AI was 72.5% in the intervention group and 45% in the control group, with no significant difference. 
Conclusions: Trust calibration did not significantly improve physicians’ diagnostic accuracy when they diagnosed cases from the medical histories and the differential diagnosis lists generated by an AI-driven automated medical history–taking system. As this was a formative study, the small sample size and suboptimal trust calibration methods may have contributed to the lack of significant differences. This study highlights the need for a larger sample size and the implementation of supportive measures of trust calibration. %R 10.2196/58666 %U https://formative.jmir.org/2024/1/e58666 %U https://doi.org/10.2196/58666 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e48914 %T Preliminary Screening for Hereditary Breast and Ovarian Cancer Using an AI Chatbot as a Genetic Counselor: Clinical Study %A Sato,Ann %A Haneda,Eri %A Hiroshima,Yukihiko %A Narimatsu,Hiroto %+ Department of Genetic Medicine, Kanagawa Cancer Center, 2-3-2 Nakao, Asahi-ku, Yokohama, Kanagawa, 241-8515, Japan, 81 045 520 2222, hiroto-narimatsu@umin.org %K hereditary cancer %K familial cancer %K IBM Watson %K family history %K medical history %K cancer %K feasibility %K social network %K screening %K breast cancer %K ovarian cancer %K artificial intelligence %K AI %K chatbot %K genetic %K counselling %K oncology %K conversational agent %K implementation %K usability %K acceptability %D 2024 %7 27.11.2024 %9 Research Letter %J J Med Internet Res %G English %X Background: Hereditary breast and ovarian cancer (HBOC) is a major type of hereditary cancer. Establishing effective screening to identify high-risk individuals for HBOC remains a challenge. We developed a prototype of a chatbot system that uses artificial intelligence (AI) for preliminary HBOC screening to determine whether individuals meet the National Comprehensive Cancer Network BRCA1/2 testing criteria. 
Objective: This study’s objective was to validate the feasibility of this chatbot in a clinical setting by using it on a patient population that visited a hospital. Methods: We validated the medical accuracy of the chatbot system by performing a test on patients who consecutively visited the Kanagawa Cancer Center. The participants completed a preoperation questionnaire to understand their background, including information technology literacy. After the operation, qualitative interviews were conducted to collect data on the usability and acceptability of the system and examine points needing improvement. Results: A total of 11 participants were enrolled between October and December 2020. All of the participants were women, and among them, 10 (91%) had cancer. According to the questionnaire, 6 (54%) participants had never heard of a chatbot, while 7 (64%) had never used one. All participants were able to complete the chatbot operation, and the average time required for the operation was 18.0 (SD 5.44) minutes. The determinations by the chatbot of whether the participants met the BRCA1/2 testing criteria based on their medical and family history were consistent with those by certified genetic counselors (CGCs). We compared the medical histories obtained from the participants by the CGCs with those by the chatbot. Of the 11 participants, 3 (27%) entered information different from that obtained by the CGCs. These discrepancies were caused by the participant’s omissions or communication errors with the chatbot. Regarding the family histories, the chatbot provided new information for 3 (27%) of the 11 participants and complemented information for the family members of 5 (45%) participants not interviewed by the CGCs. 
The chatbot could not obtain some information on the family history of 6 (54%) participants for several reasons, such as information being outside the scope of the chatbot’s interview questions, participants’ omissions, and communication errors with the chatbot. Interview data were classified into the following: (1) features, (2) appearance, (3) usability and preferences, (4) concerns, (5) benefits, and (6) implementation. Favorable comments on implementation feasibility and comments on improvements were also obtained. Conclusions: This study demonstrated that the preliminary screening system for HBOC using an AI chatbot was feasible for real patients. %M 39602801 %R 10.2196/48914 %U https://www.jmir.org/2024/1/e48914 %U https://doi.org/10.2196/48914 %U http://www.ncbi.nlm.nih.gov/pubmed/39602801 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e58275 %T Ensuring Appropriate Representation in Artificial Intelligence–Generated Medical Imagery: Protocol for a Methodological Approach to Address Skin Tone Bias %A O'Malley,Andrew %A Veenhuizen,Miriam %A Ahmed,Ayla %+ School of Medicine, University of St Andrews, North Haugh, St Andrews, KY16 9TF, United Kingdom, 44 01334462196, aso2@st-andrews.ac.uk %K artificial intelligence %K generative AI %K AI images %K dermatology %K anatomy %K medical education %K medical imaging %K skin %K skin tone %K United States %K educational material %K psoriasis %K digital imagery %D 2024 %7 27.11.2024 %9 Protocol %J JMIR AI %G English %X Background: In medical education, particularly in anatomy and dermatology, generative artificial intelligence (AI) can be used to create customized illustrations. However, the underrepresentation of darker skin tones in medical textbooks and elsewhere, which serve as training data for AI, poses a significant challenge in ensuring diverse and inclusive educational materials. 
Objective: This study aims to evaluate the extent of skin tone diversity in AI-generated medical images and to test whether the representation of skin tones can be improved by modifying AI prompts to better reflect the demographic makeup of the US population. Methods: In total, 2 standard AI models (Dall-E [OpenAI] and Midjourney [Midjourney Inc]) each generated 100 images of people with psoriasis. In addition, a custom model was developed that incorporated a prompt injection aimed at “forcing” the AI (Dall-E 3) to reflect the skin tone distribution of the US population according to the 2012 American National Election Survey. This custom model generated another set of 100 images. The skin tones in these images were assessed by 3 researchers using the New Immigrant Survey skin tone scale, with the median value representing each image. A chi-square goodness of fit analysis compared the skin tone distributions from each set of images to that of the US population. Results: The standard AI models (Dall-E 3 and Midjourney) demonstrated a significant difference between the expected skin tones of the US population and the observed tones in the generated images (P<.001). Both standard AI models overrepresented lighter skin. Conversely, the custom model with the modified prompt yielded a distribution of skin tones that closely matched the expected demographic representation, showing no significant difference (P=.04). Conclusions: This study reveals a notable bias in AI-generated medical images, predominantly underrepresenting darker skin tones. This bias can be effectively addressed by modifying AI prompts to incorporate real-life demographic distributions. The findings emphasize the need for conscious efforts in AI development to ensure diverse and representative outputs, particularly in educational and medical contexts. 
Users of generative AI tools should be aware that these biases exist, and that similar tendencies may also exist in other types of generative AI (eg, large language models) and in other characteristics (eg, sex, gender, culture, and ethnicity). Injecting demographic data into AI prompts may effectively counteract these biases, ensuring a more accurate representation of the general population. %M 39602221 %R 10.2196/58275 %U https://ai.jmir.org/2024/1/e58275 %U https://doi.org/10.2196/58275 %U http://www.ncbi.nlm.nih.gov/pubmed/39602221 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55897 %T Implementation of Machine Learning Applications in Health Care Organizations: Systematic Review of Empirical Studies %A Preti,Luigi M %A Ardito,Vittoria %A Compagni,Amelia %A Petracca,Francesco %A Cappellaro,Giulia %+ Department of Social and Political Sciences, Bocconi University, Via Sarfatti 25, Milan, 20136, Italy, 39 02 58365267, giulia.cappellaro@unibocconi.it %K artificial intelligence %K machine learning %K implementation %K health care organization %K barriers %K facilitators %D 2024 %7 25.11.2024 %9 Review %J J Med Internet Res %G English %X Background: There is a growing enthusiasm for machine learning (ML) among academics and health care practitioners. Despite the transformative potential of ML-based applications for patient care, their uptake and implementation in health care organizations are sporadic. Numerous challenges currently impede or delay the widespread implementation of ML in clinical practice, and limited knowledge is available regarding how these challenges have been addressed. Objective: This work aimed to (1) examine the characteristics of ML-based applications and the implementation process in clinical practice, using the Consolidated Framework for Implementation Research (CFIR) for theoretical guidance and (2) synthesize the strategies adopted by health care organizations to foster successful implementation of ML. 
Methods: A systematic literature review was conducted based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. The search was conducted in PubMed, Scopus, and Web of Science over a 10-year period (2013-2023). The search strategy was built around 4 blocks of keywords (artificial intelligence, implementation, health care, and study type). Only empirical studies documenting the implementation of ML applications in clinical settings were considered. The implementation process was investigated using a thematic analysis and coding procedure. Results: Thirty-four studies were selected for data synthesis. Selected papers were relatively recent, with only 9% (3/34) of records published before 2019. ML-based applications were implemented mostly within hospitals (29/34, 85%). In terms of clinical workflow, ML-based applications supported mostly prognosis (20/34, 59%) and diagnosis (10/34, 29%). The implementation efforts were analyzed using CFIR domains. As for the inner setting domain, access to knowledge and information (12/34, 35%), information technology infrastructure (11/34, 32%), and organizational culture (9/34, 26%) were among the most observed dimensions influencing the success of implementation. As for the ML innovation itself, factors deemed relevant were its design (15/34, 44%), the relative advantage with respect to existing clinical practice (14/34, 41%), and perceived complexity (14/34, 41%). As for the other domains (ie, processes, roles, and outer setting), stakeholder engagement (12/34, 35%), reflecting and evaluating practices (11/34, 32%), and the presence of implementation leaders (9/34, 26%) were the main factors identified as important. Conclusions: This review sheds some light on the factors that are relevant and that should be accounted for in the implementation process of ML-based applications in health care. 
While the relevance of ML-specific dimensions, like trust, emerges clearly across several implementation domains, the evidence from this review highlighted that relevant implementation factors are not necessarily specific to ML but rather transversal to digital health technologies. More research is needed to further clarify the factors that are relevant to implementing ML-based applications at the organizational level and to support their uptake within health care organizations. Trial Registration: PROSPERO 403873; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=403873 International Registered Report Identifier (IRRID): RR2-10.2196/47971 %M 39586084 %R 10.2196/55897 %U https://www.jmir.org/2024/1/e55897 %U https://doi.org/10.2196/55897 %U http://www.ncbi.nlm.nih.gov/pubmed/39586084 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54357 %T Antihypertensive Drug Recommendations for Reducing Arterial Stiffness in Patients With Hypertension: Machine Learning–Based Multicohort (RIGIPREV) Study %A Cavero-Redondo,Iván %A Martinez-Rodrigo,Arturo %A Saz-Lara,Alicia %A Moreno-Herraiz,Nerea %A Casado-Vicente,Veronica %A Gomez-Sanchez,Leticia %A Garcia-Ortiz,Luis %A Gomez-Marcos,Manuel A %A , %+ CarVasCare Research Group, Facultad de Enfermería de Cuenca, Universidad de Castilla-La Mancha, C. Santa Teresa Jornet s/n, Cuenca, 16001, Spain, 34 969179100, alicia.delsaz@uclm.es %K antihypertensive %K drugs %K models %K patients %K pulse wave velocity %K recommendations %K hypertension %K machine learning %K drug recommendations %K arterial stiffness %K RIGIPREV %D 2024 %7 25.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: High systolic blood pressure is one of the leading global risk factors for mortality, contributing significantly to cardiovascular diseases. Despite advances in treatment, a large proportion of patients with hypertension do not achieve optimal blood pressure control. 
Arterial stiffness (AS), measured by pulse wave velocity (PWV), is an independent predictor of cardiovascular events and overall mortality. Various antihypertensive drugs exhibit differential effects on PWV, but the extent to which these effects vary depending on individual patient characteristics is not well understood. Given the complexity of selecting the most appropriate antihypertensive medication for reducing PWV, machine learning (ML) techniques offer an opportunity to improve personalized treatment recommendations. Objective: This study aims to develop an ML model that provides personalized recommendations for antihypertensive medications aimed at reducing PWV. The model considers individual patient characteristics, such as demographic factors, clinical data, and cardiovascular measurements, to identify the most suitable antihypertensive agent for improving AS. Methods: This study, known as the RIGIPREV study, used data from the EVA, LOD-DIABETES, and EVIDENT studies involving individuals with hypertension with baseline and follow-up measurements. Antihypertensive drugs were grouped into classes such as angiotensin-converting enzyme inhibitors (ACEIs), angiotensin receptor blockers (ARBs), β-blockers, diuretics, and combinations of diuretics with ACEIs or ARBs. The primary outcomes were carotid-femoral and brachial-ankle PWV, while the secondary outcomes included various cardiovascular, anthropometric, and biochemical parameters. A multioutput regressor using 6 random forest models was used to predict the impact of each antihypertensive class on PWV reduction. Model performance was evaluated using the coefficient of determination (R2) and mean squared error. Results: The random forest models exhibited strong predictive capabilities, with internal validation yielding R2 values between 0.61 and 0.74, while external validation showed a range of 0.26 to 0.46. 
The mean squared error values ranged from 0.08 to 0.22 for internal validation and from 0.29 to 0.45 for external validation. Variable importance analysis revealed that glycated hemoglobin and weight were the most critical predictors for ACEIs, while carotid-femoral PWV and total cholesterol were key variables for ARBs. The decision tree model achieved an accuracy of 84.02% in identifying the most suitable antihypertensive drug based on individual patient characteristics. Furthermore, the system’s recommendations for ARBs matched 55.3% of patients’ original prescriptions. Conclusions: This study demonstrates the utility of ML techniques in providing personalized treatment recommendations for antihypertensive therapy. By accounting for individual patient characteristics, the model improves the selection of drugs that control blood pressure and reduce AS. These findings could significantly aid clinicians in optimizing hypertension management and reducing cardiovascular risk. However, further studies with larger and more diverse populations are necessary to validate these results and extend the model’s applicability. 
%M 39585738 %R 10.2196/54357 %U https://www.jmir.org/2024/1/e54357 %U https://doi.org/10.2196/54357 %U http://www.ncbi.nlm.nih.gov/pubmed/39585738 %0 Journal Article %@ 2561-6722 %I JMIR Publications %V 7 %N %P e59564 %T Exploring the Use of a Length AI Algorithm to Estimate Children’s Length from Smartphone Images in a Real-World Setting: Algorithm Development and Usability Study %A Chua,Mei Chien %A Hadimaja,Matthew %A Wong,Jill %A Mukherjee,Sankha Subhra %A Foussat,Agathe %A Chan,Daniel %A Nandal,Umesh %A Yap,Fabian %+ Endocrinology Service, Division of Medicine, KK Women's and Children's Hospital, 100 Bukit Timah Road, Endocrinology Service,, Division of Medicine, Level 3, Clinical Staff Office, Singapore, 229889, Singapore, 65 6394 1127, fabian.yap.k.p@singhealth.com.sg %K computer vision %K length estimation %K artificial intelligence %K smartphone images %K children %K AI %K algorithm %K imaging %K height %K length %K measure %K pediatric %K infant %K neonatal %K newborn %K smartphone %K mHealth %K mobile health %K mobile phone %D 2024 %7 22.11.2024 %9 Original Paper %J JMIR Pediatr Parent %G English %X Background: Length measurement in young children younger than 18 months is important for monitoring growth and development. Accurate length measurement requires proper equipment, standardized methods, and trained personnel. In addition, length measurement requires young children’s cooperation, making it particularly challenging during infancy and toddlerhood. Objective: This study aimed to develop a length artificial intelligence (LAI) algorithm to aid users in determining recumbent length conveniently from smartphone images and explore its performance and suitability for personal and clinical use. Methods: This proof-of-concept study in healthy children (aged 0-18 months) was performed at KK Women’s and Children’s Hospital, Singapore, from November 2021 to March 2022. Smartphone images were taken by parents and investigators. 
Standardized length-board measurements were taken by trained investigators. Performance was evaluated by comparing the tool’s image-based length estimations with length-board measurements (bias [mean error, mean difference between measured and predicted length]; absolute error [magnitude of error]). Prediction performance was evaluated on an individual-image basis and a participant-averaged basis. User experience was collected through questionnaires. Results: A total of 215 participants (median age 4.4, IQR 1.9-9.7 months) were included. The tool produced a length prediction for 99.4% (2211/2224) of photos analyzed. The mean absolute error was 2.47 cm for individual image predictions and 1.77 cm for participant-averaged predictions. Investigators and parents reported no difficulties in capturing the required photos for most participants (182/215, 84.7% of participants and 144/200, 72% of participants, respectively). Conclusions: The LAI algorithm is an accessible and novel way of estimating children’s length from smartphone images without the need for specialized equipment or trained personnel. The LAI algorithm’s current performance and ease of use suggest its potential for use by parents or caregivers with an accuracy approaching what is typically achieved in general clinics or community health settings. The results show that the algorithm is acceptable for use in a personal setting, serving as a proof of concept for use in clinical settings. 
Trial Registration: ClinicalTrials.gov NCT05079776; https://clinicaltrials.gov/ct2/show/NCT05079776 %M 39576977 %R 10.2196/59564 %U https://pediatrics.jmir.org/2024/1/e59564 %U https://doi.org/10.2196/59564 %U http://www.ncbi.nlm.nih.gov/pubmed/39576977 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e57486 %T Identification of a Susceptible and High-Risk Population for Postoperative Systemic Inflammatory Response Syndrome in Older Adults: Machine Learning–Based Predictive Model %A Mai,Haiyan %A Lu,Yaxin %A Fu,Yu %A Luo,Tongsen %A Li,Xiaoyue %A Zhang,Yihan %A Liu,Zifeng %A Zhang,Yuenong %A Zhou,Shaoli %A Chen,Chaojin %+ Department of Anesthesiology, The Third Affiliated Hospital of Sun Yat-sen University, 600 Tianhe Road, Tianhe District, Guangzhou, 510631, China, 86 020 85253333, chenchj28@mail.sysu.edu.cn %K older adult patients %K postoperative SIRS %K sepsis %K machine learning %K prediction model %D 2024 %7 22.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Systemic inflammatory response syndrome (SIRS) is a serious postoperative complication among older adult surgical patients that frequently progresses to sepsis or even death. Notably, the incidences of SIRS and sepsis steadily increase with age. It is important to identify the risk of postoperative SIRS for older adult patients at a sufficiently early stage, which would allow preemptive individualized enhanced therapy to be conducted to improve their prognosis. In recent years, machine learning (ML) models have been deployed by researchers for many tasks, including disease prediction and risk stratification, exhibiting good application potential. Objective: We aimed to develop and validate an individualized predictive model to identify susceptible and high-risk populations for SIRS in older adult patients to guide appropriate early interventions. 
Methods: Data for surgical patients aged ≥65 years from September 2015 to September 2020 in 3 independent medical centers were retrieved and analyzed. The eligible patient cohort in the Third Affiliated Hospital of Sun Yat-sen University was randomly separated into an 80% training set (2882 patients) and a 20% internal validation set (720 patients). We developed 4 ML models to predict postoperative SIRS. The area under the receiver operating characteristic curve (AUC), F1 score, Brier score, and calibration curve were used to evaluate the model performance. The model with the best performance was further validated in the other 2 independent data sets involving 844 and 307 cases, respectively. Results: The incidences of SIRS in the 3 medical centers were 24.3% (876/3602), 29.6% (250/844), and 6.5% (20/307), respectively. We identified 15 variables that were significantly associated with postoperative SIRS and used them in the 4 ML models to predict postoperative SIRS. A balanced cutoff between sensitivity and specificity was chosen to ensure as high a true-positive rate as possible. The random forest classifier (RF) model showed the best overall performance for predicting postoperative SIRS, with an AUC of 0.751 (95% CI 0.709-0.793), sensitivity of 0.682, specificity of 0.681, and F1 score of 0.508 in the internal validation set and higher AUCs in the external validation-1 set (0.759, 95% CI 0.723-0.795) and external validation-2 set (0.804, 95% CI 0.746-0.863). Conclusions: We developed and validated a generalizable RF model to predict postoperative SIRS in older adult patients, enabling clinicians to screen susceptible and high-risk patients and implement early individualized interventions. An online risk calculator was developed to make the RF model accessible to anesthesiologists and peers around the world. 
%M 39501984 %R 10.2196/57486 %U https://www.jmir.org/2024/1/e57486 %U https://doi.org/10.2196/57486 %U http://www.ncbi.nlm.nih.gov/pubmed/39501984 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55734 %T Development and Validation of a Machine Learning–Based Early Warning Model for Lichenoid Vulvar Disease: Prediction Model Development Study %A Meng,Jian %A Niu,Xiaoyu %A Luo,Can %A Chen,Yueyue %A Li,Qiao %A Wei,Dongmei %+ Department of Obstetrics and Gynecology, West China Second Hospital, Sichuan University, No.20, section 3, South Renmin Road, Wuhou District, Sichuan Province, Chengdu, 610041, China, 86 18982150655, weidongmei@scu.edu.cn %K female %K lichenoid vulvar disease %K risk factors %K evidence-based medicine %K early warning model %D 2024 %7 22.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Given the complexity and diversity of lichenoid vulvar disease (LVD) risk factors, it is crucial to actively explore these factors and construct personalized warning models using relevant clinical variables to assess disease risk in patients. Yet, to date, there has been insufficient research, both nationwide and internationally, on risk factors and warning models for LVD. In light of these gaps, this study represents the first systematic exploration of the risk factors associated with LVD. Objective: The risk factors of LVD in women were explored and a medically evidence-based warning model was constructed to provide an early alert tool for the high-risk target population. The model can be applied in the clinic to identify high-risk patients and evaluate its accuracy and practicality in predicting LVD in women. Simultaneously, it can also enhance the diagnostic and treatment proficiency of medical personnel in primary community health service centers, which is of great significance in reducing overall health care spending and disease burden. 
Methods: A total of 2990 patients who attended West China Second Hospital of Sichuan University from January 2013 to December 2017 were selected as the study candidates and were divided into 1218 cases in the normal vulvovagina group (group 0) and 1772 cases in the lichenoid vulvar disease group (group 1) according to the results of the case examination. We investigated and collected routine examination data from patients for intergroup comparisons, included factors with significant differences in the multifactorial analysis, and constructed logistic regression, random forest, gradient boosting machine (GBM), AdaBoost, eXtreme Gradient Boosting, and Categorical Boosting analysis models. The predictive efficacy of these 6 models was evaluated using the receiver operating characteristic curve and the area under the curve. Results: Univariate analysis revealed that vaginitis, urinary incontinence, humidity of the long-term residential environment, spicy dietary habits, regular intake of coffee or caffeinated beverages, daily sleep duration, diabetes mellitus, smoking history, presence of autoimmune diseases, menopausal status, and hypertension were all significant risk factors affecting female LVD. Furthermore, the area under the receiver operating characteristic curve, accuracy, sensitivity, and F1-score of the GBM warning model were notably higher than those of the other 5 predictive analysis models. The GBM analysis model indicated that menopausal status had the strongest impact on female LVD, showing a positive correlation, followed by the presence of autoimmune diseases, which also displayed a positive dependency. Conclusions: In accordance with evidence-based medicine, the construction of a predictive warning model for female LVD can be used to identify high-risk populations at an early stage, aiding in the formulation of effective preventive measures, which is of paramount importance for reducing the incidence of LVD in women. 
%M 39576990 %R 10.2196/55734 %U https://www.jmir.org/2024/1/e55734 %U https://doi.org/10.2196/55734 %U http://www.ncbi.nlm.nih.gov/pubmed/39576990 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e59260 %T Hospital Length of Stay Prediction for Planned Admissions Using Observational Medical Outcomes Partnership Common Data Model: Retrospective Study %A Lee,Haeun %A Kim,Seok %A Moon,Hui-Woun %A Lee,Ho-Young %A Kim,Kwangsoo %A Jung,Se Young %A Yoo,Sooyoung %+ Department of Family Medicine, Seoul National University Bundang Hospital, 172, Dolma-ro bundang-gu, Seongnam-si, 13605, Republic of Korea, 82 0317878845, syjung@snubh.org %K length of stay %K machine learning %K Observational Medical Outcomes Partnership Common Data Model %K allocation of resources %K reproducibility of results %K hospital %K admission %K retrospective study %K prediction model %K electronic health record %K EHR %K South Korea %K logistic regression %K algorithm %K Shapley Additive Explanation %K health care %K clinical informatics %D 2024 %7 22.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Accurate hospital length of stay (LoS) prediction enables efficient resource management. Conventional LoS prediction models with limited covariates and nonstandardized data have limited reproducibility when applied to the general population. Objective: In this study, we developed and validated a machine learning (ML)–based LoS prediction model for planned admissions using the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). Methods: Retrospective patient-level prediction models used electronic health record (EHR) data converted to the OMOP CDM (version 5.3) from Seoul National University Bundang Hospital (SNUBH) in South Korea. The study included 137,437 hospital admission episodes between January 2016 and December 2020. 
Covariates from the patient, condition occurrence, medication, observation, measurement, procedure, and visit occurrence tables were included in the analysis. To perform feature selection, we applied Lasso regularization in the logistic regression. The primary outcome was an LoS of 7 days or longer, while the secondary outcome was an LoS of 3 days or longer. The prediction models were developed using 6 ML algorithms, with the training and test set split in a 7:3 ratio. The performance of each model was evaluated based on the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Shapley Additive Explanations (SHAP) analysis measured feature importance, while calibration plots assessed the reliability of the prediction models. External validation of the developed models occurred at an independent institution, the Seoul National University Hospital. Results: The final sample included 129,938 patient entry events in the planned admissions. The Extreme Gradient Boosting (XGB) model achieved the best performance in binary classification for predicting an LoS of 7 days or longer, with an AUROC of 0.891 (95% CI 0.887-0.894) and an AUPRC of 0.819 (95% CI 0.813-0.826) on the internal test set. The Light Gradient Boosting (LGB) model performed the best in the multiclass classification for predicting an LoS of 3 days or more, with an AUROC of 0.901 (95% CI 0.898-0.904) and an AUPRC of 0.770 (95% CI 0.762-0.779). The most important features contributing to the models were the operation performed, frequency of previous outpatient visits, patient admission department, age, and day of admission. The random forest (RF) model showed robust performance in the external validation set, achieving an AUROC of 0.804 (95% CI 0.802-0.807). Conclusions: The use of the OMOP CDM in predicting hospital LoS for planned admissions demonstrates promising predictive capabilities for stays of varying durations. 
It underscores the advantage of standardized data in achieving reproducible results. This approach should serve as a model for enhancing operational efficiency and patient care coordination across health care settings. %M 39576284 %R 10.2196/59260 %U https://www.jmir.org/2024/1/e59260 %U https://doi.org/10.2196/59260 %U http://www.ncbi.nlm.nih.gov/pubmed/39576284 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e59902 %T Performance Comparison of Junior Residents and ChatGPT in the Objective Structured Clinical Examination (OSCE) for Medical History Taking and Documentation of Medical Records: Development and Usability Study %A Huang,Ting-Yun %A Hsieh,Pei Hsing %A Chang,Yung-Chun %K large language model %K medical history taking %K clinical documentation %K simulation-based evaluation %K OSCE standards %K LLM %D 2024 %7 21.11.2024 %9 %J JMIR Med Educ %G English %X Background: This study explores the cutting-edge abilities of large language models (LLMs) such as ChatGPT in medical history taking and medical record documentation, with a focus on their practical effectiveness in clinical settings—an area vital for the progress of medical artificial intelligence. Objective: Our aim was to assess the capability of ChatGPT versions 3.5 and 4.0 in performing medical history taking and medical record documentation in simulated clinical environments. The study compared the performance of nonmedical individuals using ChatGPT with that of junior medical residents. Methods: A simulation involving standardized patients was designed to mimic authentic medical history–taking interactions. Five nonmedical participants used ChatGPT versions 3.5 and 4.0 to conduct medical histories and document medical records, mirroring the tasks performed by 5 junior residents in identical scenarios. A total of 10 diverse scenarios were examined. 
Results: Evaluation of the medical documentation created by laypersons with ChatGPT assistance and those created by junior residents was conducted by 2 senior emergency physicians using audio recordings and the final medical records. The assessment used the Objective Structured Clinical Examination benchmarks in Taiwan as a reference. ChatGPT-4.0 exhibited substantial enhancements over its predecessor and met or exceeded the performance of human counterparts in terms of both checklist and global assessment scores. Although the overall quality of human consultations remained higher, ChatGPT-4.0’s proficiency in medical documentation was notably promising. Conclusions: The performance of ChatGPT 4.0 was on par with that of human participants in Objective Structured Clinical Examination evaluations, signifying its potential in medical history and medical record documentation. Despite this, the superiority of human consultations in terms of quality was evident. The study underscores both the promise and the current limitations of LLMs in the realm of clinical practice. %R 10.2196/59902 %U https://mededu.jmir.org/2024/1/e59902 %U https://doi.org/10.2196/59902 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e52514 %T The Promise of AI for Image-Driven Medicine: Qualitative Interview Study of Radiologists’ and Pathologists’ Perspectives %A Drogt,Jojanneke %A Milota,Megan %A Veldhuis,Wouter %A Vos,Shoko %A Jongsma,Karin %K digital medicine %K computer vision %K medical AI %K image-driven specialisms %K qualitative interview study %K digital health ethics %K artificial intelligence %K AI %K imaging %K imaging informatics %K radiology %K pathology %D 2024 %7 21.11.2024 %9 %J JMIR Hum Factors %G English %X Background: Image-driven specialisms such as radiology and pathology are at the forefront of medical artificial intelligence (AI) innovation. 
Many believe that AI will lead to significant shifts in professional roles, so it is vital to investigate how professionals view the pending changes that AI innovation will initiate and incorporate their views in ongoing AI developments. Objective: Our study aimed to gain insights into the perspectives and wishes of radiologists and pathologists regarding the promise of AI. Methods: We have conducted the first qualitative interview study investigating the perspectives of both radiologists and pathologists regarding the integration of AI in their fields. The study design is in accordance with the consolidated criteria for reporting qualitative research (COREQ). Results: In total, 21 participants were interviewed for this study (7 pathologists, 10 radiologists, and 4 computer scientists). The interviews revealed a diverse range of perspectives on the impact of AI. Respondents discussed various task-specific benefits of AI; yet, both pathologists and radiologists agreed that AI had yet to live up to its hype. Overall, our study shows that AI could facilitate welcome changes in the workflows of image-driven professionals and eventually lead to better quality of care. At the same time, these professionals also admitted that many hopes and expectations for AI were unlikely to become a reality in the next decade. Conclusions: This study points to the importance of maintaining a “healthy skepticism” on the promise of AI in imaging specialisms and argues for more structural and inclusive discussions about whether AI is the right technology to solve current problems encountered in daily clinical practice. 
%R 10.2196/52514 %U https://humanfactors.jmir.org/2024/1/e52514 %U https://doi.org/10.2196/52514 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51477 %T Performance of a Full-Coverage Cervical Cancer Screening Program Using an Artificial Intelligence– and Cloud-Based Diagnostic System: Observational Study of an Ultralarge Population %A Ji,Lu %A Yao,Yifan %A Yu,Dandan %A Chen,Wen %A Yin,Shanshan %A Fu,Yun %A Tang,Shangfeng %A Yao,Lan %+ School of Medicine and Health Management, Tongji Medical College of Huazhong University of Science and Technology, 13 Hangkong Road, Wuhan, 430030, China, 86 027 83692727, ylhuster@163.com %K full coverage %K cervical cancer screening %K artificial intelligence %K primary health institutions %K accessibility %K efficiency %D 2024 %7 20.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The World Health Organization has set a global strategy to eliminate cervical cancer, emphasizing the need for cervical cancer screening coverage to reach 70%. In response, China has developed an action plan to accelerate the elimination of cervical cancer, with Hubei province implementing China’s first provincial full-coverage screening program using an artificial intelligence (AI) and cloud-based diagnostic system. Objective: This study aimed to evaluate the performance of AI technology in this full-coverage screening program. The evaluation indicators included accessibility, screening efficiency, diagnostic quality, and program cost. Methods: Characteristics of 1,704,461 individuals screened from July 2022 to January 2023 were used to analyze accessibility and AI screening efficiency. A random sample of 220 individuals was used for external diagnostic quality control. The costs of different participating screening institutions were assessed. Results: Cervical cancer screening services were extended to all administrative districts, especially in rural areas. 
Rural women had the highest participation rate at 67.54% (1,147,839/1,699,591). Approximately 1.7 million individuals were screened, achieving a cumulative coverage of 13.45% in about 6 months. Full-coverage programs could be achieved by AI technology in approximately 1 year, which was 87.5 times more efficient than the manual reading of slides. The sample compliance rate was as high as 99.1%, and compliance rates for positive, negative, and pathology biopsy reviews exceeded 96%. The cost of this program was CN ¥49 (average exchange rate in 2022: US $1=CN ¥6.7261) per person, with the primary screening institution and the third-party testing institute receiving CN ¥19 and CN ¥27, respectively. Conclusions: AI-assisted diagnosis has proven to be accessible, efficient, reliable, and low cost, which could support the implementation of full-coverage screening programs, especially in areas with insufficient health resources. AI technology served as a crucial tool for rapidly and effectively increasing screening coverage, which would accelerate the achievement of the World Health Organization’s goal of eliminating cervical cancer. 
%M 39566061 %R 10.2196/51477 %U https://www.jmir.org/2024/1/e51477 %U https://doi.org/10.2196/51477 %U http://www.ncbi.nlm.nih.gov/pubmed/39566061 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58329 %T Evaluation Framework of Large Language Models in Medical Documentation: Development and Usability Study %A Seo,Junhyuk %A Choi,Dasol %A Kim,Taerim %A Cha,Won Chul %A Kim,Minha %A Yoo,Haanju %A Oh,Namkee %A Yi,YongJin %A Lee,Kye Hwa %A Choi,Edward %+ Department of Digital Health, Samsung Advanced Institute of Health Sciences and Technology (SAIHST), Sungkyunkwan University, 115, Irwon-ro, Gangnam-gu, Seoul, 06355, Republic of Korea, 82 010 7114 2342, taerim.j.kim@gmail.com %K large language models %K health care documentation %K clinical evaluation %K emergency department %K artificial intelligence %K medical record accuracy %D 2024 %7 20.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The advancement of large language models (LLMs) offers significant opportunities for health care, particularly in the generation of medical documentation. However, challenges related to ensuring the accuracy and reliability of LLM outputs, coupled with the absence of established quality standards, have raised concerns about their clinical application. Objective: This study aimed to develop and validate an evaluation framework for assessing the accuracy and clinical applicability of LLM-generated emergency department (ED) records, aiming to enhance artificial intelligence integration in health care documentation. Methods: We organized the Healthcare Prompt-a-thon, a competitive event designed to explore the capabilities of LLMs in generating accurate medical records. The event involved 52 participants who generated 33 initial ED records using HyperCLOVA X, a Korean-specialized LLM. We applied a dual evaluation approach. 
First, clinical evaluation: 4 medical professionals evaluated the records using a 5-point Likert scale across 5 criteria—appropriateness, accuracy, structure/format, conciseness, and clinical validity. Second, quantitative evaluation: We developed a framework to categorize and count errors in the LLM outputs, identifying 7 key error types. Statistical methods, including Pearson correlation and intraclass correlation coefficients (ICC), were used to assess consistency and agreement among evaluators. Results: The clinical evaluation demonstrated strong interrater reliability, with ICC values ranging from 0.653 to 0.887 (P<.001), and a test-retest reliability Pearson correlation coefficient of 0.776 (P<.001). Quantitative analysis revealed that invalid generation errors were the most common, constituting 35.38% of total errors, while structural malformation errors had the most significant negative impact on the clinical evaluation score (Pearson r=–0.654; P<.001). A strong negative correlation was found between the number of quantitative errors and clinical evaluation scores (Pearson r=–0.633; P<.001), indicating that higher error rates corresponded to lower clinical acceptability. Conclusions: Our research provides robust support for the reliability and clinical acceptability of the proposed evaluation framework. It underscores the framework’s potential to mitigate clinical burdens and foster the responsible integration of artificial intelligence technologies in health care, suggesting a promising direction for future research and practical applications in the field. 
%M 39566044 %R 10.2196/58329 %U https://www.jmir.org/2024/1/e58329 %U https://doi.org/10.2196/58329 %U http://www.ncbi.nlm.nih.gov/pubmed/39566044 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e64844 %T Comparative Analysis of Diagnostic Performance: Differential Diagnosis Lists by LLaMA3 Versus LLaMA2 for Case Reports %A Hirosawa,Takanobu %A Harada,Yukinori %A Tokumasu,Kazuki %A Shiraishi,Tatsuya %A Suzuki,Tomoharu %A Shimizu,Taro %+ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Mibu-cho, Shimotsuga, 321-0293, Japan, 81 0282861111, hirosawa@dokkyomed.ac.jp %K artificial intelligence %K clinical decision support system %K generative artificial intelligence %K large language models %K natural language processing %K NLP %K AI %K clinical decision making %K decision support %K decision making %K LLM: diagnostic %K case report %K diagnosis %K generative AI %K LLaMA %D 2024 %7 19.11.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Generative artificial intelligence (AI), particularly in the form of large language models, has rapidly developed. The LLaMA series are popular and recently updated from LLaMA2 to LLaMA3. However, the impacts of the update on diagnostic performance have not been well documented. Objective: We conducted a comparative evaluation of the diagnostic performance in differential diagnosis lists generated by LLaMA3 and LLaMA2 for case reports. Methods: We analyzed case reports published in the American Journal of Case Reports from 2022 to 2023. After excluding nondiagnostic and pediatric cases, we input the remaining cases into LLaMA3 and LLaMA2 using the same prompt and the same adjustable parameters. Diagnostic performance was defined by whether the differential diagnosis lists included the final diagnosis. Multiple physicians independently evaluated whether the final diagnosis was included in the top 10 differentials generated by LLaMA3 and LLaMA2. 
Results: In our comparative evaluation of the diagnostic performance between LLaMA3 and LLaMA2, we analyzed differential diagnosis lists for 392 case reports. The final diagnosis was included in the top 10 differentials generated by LLaMA3 in 79.6% (312/392) of the cases, compared to 49.7% (195/392) for LLaMA2, indicating a statistically significant improvement (P<.001). Additionally, LLaMA3 showed higher performance in including the final diagnosis in the top 5 differentials, observed in 63% (247/392) of cases, compared to LLaMA2’s 38% (149/392; P<.001). Furthermore, the top diagnosis was accurately identified by LLaMA3 in 33.9% (133/392) of cases, significantly higher than the 22.7% (89/392) achieved by LLaMA2 (P<.001). The analysis across various medical specialties revealed variations in diagnostic performance, with LLaMA3 consistently outperforming LLaMA2. Conclusions: The results reveal that the LLaMA3 model significantly outperforms LLaMA2 in diagnostic performance, with a higher percentage of case reports having the final diagnosis listed within the top 10, top 5, and as the top diagnosis. Overall diagnostic performance improved almost 1.5 times from LLaMA2 to LLaMA3. These findings support the rapid development and continuous refinement of generative AI systems to enhance diagnostic processes in medicine. However, these findings should be carefully interpreted for clinical application, as generative AI, including the LLaMA series, has not been approved for medical applications such as AI-enhanced diagnostics. 
%M 39561356 %R 10.2196/64844 %U https://formative.jmir.org/2024/1/e64844 %U https://doi.org/10.2196/64844 %U http://www.ncbi.nlm.nih.gov/pubmed/39561356 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e64806 %T Predicting Pain Response to a Remote Musculoskeletal Care Program for Low Back Pain Management: Development of a Prediction Tool %A C Areias,Anabela %A G Moulder,Robert %A Molinos,Maria %A Janela,Dora %A Bento,Virgílio %A Moreira,Carolina %A Yanamadala,Vijay %A P Cohen,Steven %A Dias Correia,Fernando %A Costa,Fabíola %+ Sword Health Inc, 13937 Sprague Lane, Suite 100, Draper, UT, 84020, United States, 1 385 308 8034, f.costa@swordhealth.com %K telerehabilitation %K predictive modeling %K personalized medicine %K rehabilitation %K clinical decision support %K machine learning %K artificial intelligence %D 2024 %7 19.11.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Low back pain (LBP) presents with diverse manifestations, necessitating personalized treatment approaches that recognize various phenotypes within the same diagnosis, which could be achieved through precision medicine. Although prediction strategies have been explored, including those employing artificial intelligence (AI), they still lack scalability and real-time capabilities. Digital care programs (DCPs) facilitate seamless data collection through the Internet of Things and cloud storage, creating an ideal environment for developing and implementing an AI predictive tool to assist clinicians in dynamically optimizing treatment. Objective: This study aims to develop an AI tool that continuously assists physical therapists in predicting an individual’s potential for achieving clinically significant pain relief by the end of the program. A secondary aim was to identify predictors of pain nonresponse to guide treatment adjustments. 
Methods: Data collected actively (eg, demographic and clinical information) and passively in real-time (eg, range of motion, exercise performance, and socioeconomic data from public data sources) from 6125 patients enrolled in a remote digital musculoskeletal intervention program were stored in the cloud. Two machine learning techniques, recurrent neural networks (RNNs) and light gradient boosting machine (LightGBM), continuously analyzed session updates up to session 7 to predict the likelihood of achieving significant pain relief at the program end. Model performance was assessed using the area under the receiver operating characteristic curve (ROC-AUC), precision-recall curves, specificity, and sensitivity. Model explainability was assessed using SHapley Additive exPlanations values. Results: At each session, the model provided a prediction about the potential of being a pain responder, with performance improving over time (P<.001). By session 7, the RNN achieved an ROC-AUC of 0.70 (95% CI 0.65-0.71), and the LightGBM achieved an ROC-AUC of 0.71 (95% CI 0.67-0.72). Both models demonstrated high specificity in scenarios prioritizing high precision. The key predictive features were pain-associated domains, exercise performance, motivation, and compliance, informing continuous treatment adjustments to maximize response rates. Conclusions: This study underscores the potential of an AI predictive tool within a DCP to enhance the management of LBP, supporting physical therapists in redirecting care pathways early and throughout the treatment course. This approach is particularly important for addressing the heterogeneous phenotypes observed in LBP. 
Trial Registration: ClinicalTrials.gov NCT04092946; https://clinicaltrials.gov/ct2/show/NCT04092946 and NCT05417685; https://clinicaltrials.gov/ct2/show/NCT05417685 %M 39561359 %R 10.2196/64806 %U https://medinform.jmir.org/2024/1/e64806 %U https://doi.org/10.2196/64806 %U http://www.ncbi.nlm.nih.gov/pubmed/39561359 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e63445 %T Using Large Language Models to Abstract Complex Social Determinants of Health From Original and Deidentified Medical Notes: Development and Validation Study %A Ralevski,Alexandra %A Taiyab,Nadaa %A Nossal,Michael %A Mico,Lindsay %A Piekos,Samantha %A Hadlock,Jennifer %+ Institute for Systems Biology, 401 Terry Ave N, Seattle, WA, 98121, United States, 1 732 1359, jhadlock@isbscience.org %K housing instability %K housing insecurity %K housing %K machine learning %K artificial intelligence %K AI %K large language model %K LLM %K natural language processing %K NLP %K electronic health record %K EHR %K electronic medical record %K EMR %K social determinants of health %K exposome %K pregnancy %K obstetric %K deidentification %D 2024 %7 19.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Social determinants of health (SDoH) such as housing insecurity are known to be intricately linked to patients’ health status. More efficient methods for abstracting structured data on SDoH can help accelerate the inclusion of exposome variables in biomedical research and support health care systems in identifying patients who could benefit from proactive outreach. Large language models (LLMs) developed from Generative Pre-trained Transformers (GPTs) have shown potential for performing complex abstraction tasks on unstructured clinical notes. Objective: Here, we assess the performance of GPTs on identifying temporal aspects of housing insecurity and compare results between both original and deidentified notes. 
Methods: We compared the ability of GPT-3.5 and GPT-4 to identify instances of both current and past housing instability, as well as general housing status, from 25,217 notes from 795 pregnant women. Results were compared with manual abstraction, a named entity recognition model, and regular expressions. Results: Compared with GPT-3.5 and the named entity recognition model, GPT-4 had the highest performance and had a much higher recall (0.924) than human abstractors (0.702) in identifying patients experiencing current or past housing instability, although precision was lower (0.850) compared with human abstractors (0.971). GPT-4’s precision improved slightly (0.936 original, 0.939 deidentified) on deidentified versions of the same notes, while recall dropped (0.781 original, 0.704 deidentified). Conclusions: This work demonstrates that while manual abstraction is likely to yield slightly more accurate results overall, LLMs can provide a scalable, cost-effective solution with the advantage of greater recall. This could support semiautomated abstraction, but given the potential risk for harm, human review would be essential before using results for any patient engagement or care decisions. Furthermore, recall was lower when notes were deidentified prior to LLM abstraction. 
%M 39561354 %R 10.2196/63445 %U https://www.jmir.org/2024/1/e63445 %U https://doi.org/10.2196/63445 %U http://www.ncbi.nlm.nih.gov/pubmed/39561354 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e59439 %T Mitigating Cognitive Biases in Clinical Decision-Making Through Multi-Agent Conversations Using Large Language Models: Simulation Study %A Ke,Yuhe %A Yang,Rui %A Lie,Sui An %A Lim,Taylor Xin Yi %A Ning,Yilin %A Li,Irene %A Abdullah,Hairil Rizal %A Ting,Daniel Shu Wei %A Liu,Nan %+ Centre for Quantitative Medicine, Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore, 65 66016503, liu.nan@duke-nus.edu.sg %K clinical decision-making %K cognitive bias %K generative artificial intelligence %K large language model %K multi-agent %D 2024 %7 19.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field. Objective: This study aimed to explore the role of large language models (LLMs) in mitigating these biases through the use of the multi-agent framework. We simulate the clinical decision-making processes through multi-agent conversation and evaluate its efficacy in improving diagnostic accuracy compared with humans. Methods: A total of 16 published and unpublished case reports where cognitive biases have resulted in misdiagnoses were identified from the literature. In the multi-agent framework, we leveraged GPT-4 (OpenAI) to facilitate interactions among different simulated agents to replicate clinical team dynamics. 
Each agent was assigned a distinct role: (1) making the final diagnosis after considering the discussions, (2) acting as a devil’s advocate to correct confirmation and anchoring biases, (3) serving as a field expert in the required medical subspecialty, (4) facilitating discussions to mitigate premature closure bias, and (5) recording and summarizing findings. We tested varying combinations of these agents within the framework to determine which configuration yielded the highest rate of correct final diagnoses. Each scenario was repeated 5 times for consistency. The accuracy of the initial diagnoses and the final differential diagnoses were evaluated, and comparisons with human-generated answers were made using the Fisher exact test. Results: A total of 240 responses were evaluated (3 different multi-agent frameworks). The initial diagnosis had an accuracy of 0% (0/80). However, following multi-agent discussions, the accuracy for the top 2 differential diagnoses increased to 76% (61/80) for the best-performing multi-agent framework (Framework 4-C). This was significantly higher compared with the accuracy achieved by human evaluators (odds ratio 3.49; P=.002). Conclusions: The multi-agent framework demonstrated an ability to re-evaluate and correct misconceptions, even in scenarios with misleading initial investigations. In addition, the LLM-driven, multi-agent conversation framework shows promise in enhancing diagnostic accuracy in diagnostically challenging medical scenarios. 
%M 39561363 %R 10.2196/59439 %U https://www.jmir.org/2024/1/e59439 %U https://doi.org/10.2196/59439 %U http://www.ncbi.nlm.nih.gov/pubmed/39561363 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58892 %T AI-Based Noninvasive Blood Glucose Monitoring: Scoping Review %A Chan,Pin Zhong %A Jin,Eric %A Jansson,Miia %A Chew,Han Shi Jocelyn %+ Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, 10 Medical Drive, Singapore, 117597, Singapore, 65 65168687, jocelyn.chew.hs@nus.edu.sg %K artificial intelligence %K blood glucose %K diabetes %K noninvasive %K self-monitoring %K machine learning %K scoping review %K monitoring %K management %K health informatics %K deep learning %K accuracy %K heterogeneity %K mobile phone %D 2024 %7 19.11.2024 %9 Review %J J Med Internet Res %G English %X Background: Current blood glucose monitoring (BGM) methods are often invasive and require repetitive pricking of a finger to obtain blood samples, predisposing individuals to pain, discomfort, and infection. Noninvasive blood glucose monitoring (NIBGM) is ideal for minimizing discomfort, reducing the risk of infection, and increasing convenience. Objective: This review aimed to map the use cases of artificial intelligence (AI) in NIBGM. Methods: A systematic scoping review was conducted according to the Arksey and O’Malley five-step framework. Eight electronic databases (CINAHL, Embase, PubMed, Web of Science, Scopus, the Cochrane Central Library, ACM Digital Library, and IEEE Xplore) were searched from inception until February 8, 2023. Study selection was conducted by 2 independent reviewers, descriptive analysis was conducted, and findings were presented narratively. 
Study characteristics (author, country, type of publication, study design, population characteristics, mean age, types of noninvasive techniques used, and application, as well as characteristics of the BGM systems) were extracted independently and cross-checked by 2 investigators. Methodological quality appraisal was conducted using the Checklist for assessment of medical AI. Results: A total of 33 papers were included, representing studies from Asia, the United States, Europe, the Middle East, and Africa published between 2005 and 2023. Most studies used optical techniques (n=19, 58%) to estimate blood glucose levels (n=27, 82%). Others used electrochemical sensors (n=4), imaging (n=2), mixed techniques (n=2), and tissue impedance (n=1). Accuracy ranged from 35.56% to 94.23% and Clarke error grid (A+B) ranged from 86.91% to 100%. The most popular machine learning algorithm used was random forest (n=10) and the most popular deep learning model was the artificial neural network (n=6). The mean overall checklist for assessment of medical AI score on the included papers was 33.5 (SD 3.09), suggesting an average of medium quality. The studies reviewed demonstrate that some AI techniques can accurately predict glucose levels from noninvasive sources while enhancing comfort and ease of use for patients. However, the overall range of accuracy was wide due to the heterogeneity of models and input data. Conclusions: Efforts are needed to standardize and regulate the use of AI technologies in BGM, as well as develop consensus guidelines and protocols to ensure the quality and safety of AI-assisted monitoring systems. The use of AI for NIBGM is a promising area of research that has the potential to revolutionize diabetes management. 
%M 39561353 %R 10.2196/58892 %U https://www.jmir.org/2024/1/e58892 %U https://doi.org/10.2196/58892 %U http://www.ncbi.nlm.nih.gov/pubmed/39561353 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e57754 %T Data Ownership in the AI-Powered Integrative Health Care Landscape %A Liu,Shuimei %A Guo,L Raymond %+ School of Juris Master, China University of Political Science and Law, 25 Xitucheng Rd, Hai Dian Qu, Beijing, 100088, China, 1 (734) 358 3970, shuiliu0802@alumni.iu.edu %K data ownership %K integrative healthcare %K artificial intelligence %K AI %K ownership %K data science %K governance %K consent %K privacy %K security %K access %K model %K framework %K transparency %D 2024 %7 19.11.2024 %9 Viewpoint %J JMIR Med Inform %G English %X In the rapidly advancing landscape of artificial intelligence (AI) within integrative health care (IHC), the issue of data ownership has become pivotal. This study explores the intricate dynamics of data ownership in the context of IHC and the AI era, presenting the novel Collaborative Healthcare Data Ownership (CHDO) framework. The analysis delves into the multifaceted nature of data ownership, involving patients, providers, researchers, and AI developers, and addresses challenges such as ambiguous consent, attribution of insights, and international inconsistencies. Examining various ownership models, including privatization and communization postulates, as well as distributed access control, data trusts, and blockchain technology, the study assesses their potential and limitations. The proposed CHDO framework emphasizes shared ownership, defined access and control, and transparent governance, providing a promising avenue for responsible and collaborative AI integration in IHC. This comprehensive analysis offers valuable insights into the complex landscape of data ownership in IHC and the AI era, potentially paving the way for ethical and sustainable advancements in data-driven health care. 
%M 39560980 %R 10.2196/57754 %U https://medinform.jmir.org/2024/1/e57754 %U https://doi.org/10.2196/57754 %U http://www.ncbi.nlm.nih.gov/pubmed/39560980 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e57641 %T Accuracy of Machine Learning in Discriminating Kawasaki Disease and Other Febrile Illnesses: Systematic Review and Meta-Analysis %A Zhu,Jinpu %A Yang,Fushuang %A Wang,Yang %A Wang,Zhongtian %A Xiao,Yao %A Wang,Lie %A Sun,Liping %+ Center of Children's Clinic, The Affiliated Hospital to Changchun University of Chinese Medicine, No. 185, Shenzhen Street, Economic and Technological Development Zone, Jilin, P.R.C., Changchun, 130022, China, 86 15948000551, slpcczyydx@sina.com %K machine learning %K artificial intelligence %K Kawasaki disease %K febrile illness %K coronary artery lesions %K systematic review %K meta-analysis %D 2024 %7 18.11.2024 %9 Review %J J Med Internet Res %G English %X Background: Kawasaki disease (KD) is an acute pediatric vasculitis that can lead to coronary artery aneurysms and severe cardiovascular complications, often presenting with obvious fever in the early stages. In current clinical practice, distinguishing KD from other febrile illnesses remains a significant challenge. In recent years, some researchers have explored the potential of machine learning (ML) methods for the differential diagnosis of KD versus other febrile illnesses, as well as for predicting coronary artery lesions (CALs) in people with KD. However, there is still a lack of systematic evidence to validate their effectiveness. Therefore, we have conducted the first systematic review and meta-analysis to evaluate the accuracy of ML in differentiating KD from other febrile illnesses and in predicting CALs in people with KD, so as to provide evidence-based support for the application of ML in the diagnosis and treatment of KD. 
Objective: This study aimed to summarize the accuracy of ML in differentiating KD from other febrile illnesses and predicting CALs in people with KD. Methods: PubMed, Cochrane Library, Embase, and Web of Science were systematically searched until September 26, 2023. The risk of bias in the included original studies was appraised using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Stata (version 15.0; StataCorp) was used for the statistical analysis. Results: A total of 29 studies were incorporated. Of them, 20 used ML to differentiate KD from other febrile illnesses. These studies involved a total of 103,882 participants, including 12,541 people with KD. In the validation set, the pooled concordance index, sensitivity, and specificity were 0.898 (95% CI 0.874-0.922), 0.91 (95% CI 0.83-0.95), and 0.86 (95% CI 0.80-0.90), respectively. Meanwhile, 9 studies used ML for early prediction of the risk of CALs in children with KD. These studies involved a total of 6503 people with KD, of whom 986 had CALs. The pooled concordance index in the validation set was 0.787 (95% CI 0.738-0.835). Conclusions: The diagnostic and predictive factors used in the studies we included were primarily derived from common clinical data. The ML models constructed based on these clinical data demonstrated promising effectiveness in differentiating KD from other febrile illnesses and in predicting coronary artery lesions. Therefore, in future research, we can explore the use of ML methods to identify more efficient predictors and develop tools that can be applied on a broader scale for the differentiation of KD and the prediction of CALs. 
%M 39556821 %R 10.2196/57641 %U https://www.jmir.org/2024/1/e57641 %U https://doi.org/10.2196/57641 %U http://www.ncbi.nlm.nih.gov/pubmed/39556821 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e49724 %T Task-Specific Transformer-Based Language Models in Health Care: Scoping Review %A Cho,Ha Na %A Jun,Tae Joon %A Kim,Young-Hak %A Kang,Heejun %A Ahn,Imjin %A Gwon,Hansle %A Kim,Yunha %A Seo,Jiahn %A Choi,Heejung %A Kim,Minkyoung %A Han,Jiye %A Kee,Gaeun %A Park,Seohyun %A Ko,Soyoung %+ Big Data Research Center, Asan Institute for Life Sciences, Asan Medical Center, 88, Olympicro 43gil, Songpagu, Seoul, 05505, Republic of Korea, 82 10 2956 6101, saigram89@gmail.com %K transformer-based language models %K medicine %K health care %K medical language model %D 2024 %7 18.11.2024 %9 Review %J JMIR Med Inform %G English %X Background: Transformer-based language models have shown great potential to revolutionize health care by advancing clinical decision support, patient interaction, and disease prediction. However, despite their rapid development, the implementation of transformer-based language models in health care settings remains limited. This is partly due to the lack of a comprehensive review, which hinders a systematic understanding of their applications and limitations. Without clear guidelines and consolidated information, both researchers and physicians face difficulties in using these models effectively, resulting in inefficient research efforts and slow integration into clinical workflows. Objective: This scoping review addresses this gap by examining studies on medical transformer-based language models and categorizing them into 6 tasks: dialogue generation, question answering, summarization, text classification, sentiment analysis, and named entity recognition. Methods: We conducted a scoping review following the Cochrane scoping review protocol. 
A comprehensive literature search was performed across databases, including Google Scholar and PubMed, covering publications from January 2017 to September 2024. Studies involving transformer-derived models in medical tasks were included. Data were categorized into 6 key tasks. Results: Our key findings revealed both advancements and critical challenges in applying transformer-based models to health care tasks. For example, models like MedPIR involving dialogue generation show promise but face privacy and ethical concerns, while question-answering models like BioBERT improve accuracy but struggle with the complexity of medical terminology. The BioBERTSum summarization model aids clinicians by condensing medical texts but needs better handling of long sequences. Conclusions: This review attempted to provide a consolidated understanding of the role of transformer-based language models in health care and to guide future research directions. By addressing current challenges and exploring the potential for real-world applications, we envision significant improvements in health care informatics. Addressing the identified challenges and implementing proposed solutions can enable transformer-based language models to significantly improve health care delivery and patient outcomes. Our review provides valuable insights for future research and practical applications, setting the stage for transformative advancements in medical informatics. 
%M 39556827 %R 10.2196/49724 %U https://medinform.jmir.org/2024/1/e49724 %U https://doi.org/10.2196/49724 %U http://www.ncbi.nlm.nih.gov/pubmed/39556827 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 13 %N %P e53616 %T Benefits and Risks of AI in Health Care: Narrative Review %A Chustecki,Margaret %+ Department of Internal Medicine, Yale School of Medicine, 1952 Whitney Ave, 3rd Floor, New Haven, CT, 06510, United States, 1 2038091700, margaret.chustecki@imgnh.com %K artificial intelligence %K safety risks %K biases %K AI %K benefit %K risk %K health care %K safety %K ethics %K transparency %K data privacy %K accuracy %D 2024 %7 18.11.2024 %9 Review %J Interact J Med Res %G English %X Background: The integration of artificial intelligence (AI) into health care has the potential to transform the industry, but it also raises ethical, regulatory, and safety concerns. This review paper provides an in-depth examination of the benefits and risks associated with AI in health care, with a focus on issues like biases, transparency, data privacy, and safety. Objective: This study aims to evaluate the advantages and drawbacks of incorporating AI in health care. This assessment centers on the potential biases in AI algorithms, transparency challenges, data privacy issues, and safety risks in health care settings. Methods: Studies included in this review were selected based on their relevance to AI applications in health care, focusing on ethical, regulatory, and safety considerations. Inclusion criteria encompassed peer-reviewed articles, reviews, and relevant research papers published in English. Exclusion criteria included non–peer-reviewed articles, editorials, and studies not directly related to AI in health care. 
A comprehensive literature search was conducted across 8 databases: OVID MEDLINE, OVID Embase, OVID PsycINFO, EBSCO CINAHL Plus with Full Text, ProQuest Sociological Abstracts, ProQuest Philosopher’s Index, ProQuest Advanced Technologies & Aerospace, and Wiley Cochrane Library. The search was last updated on June 23, 2023. Results were synthesized using qualitative methods to identify key themes and findings related to the benefits and risks of AI in health care. Results: The literature search yielded 8796 articles. After removing duplicates and applying the inclusion and exclusion criteria, 44 studies were included in the qualitative synthesis. This review highlights the significant promise that AI holds in health care, such as enhancing health care delivery by providing more accurate diagnoses, personalized treatment plans, and efficient resource allocation. However, persistent concerns remain, including biases ingrained in AI algorithms, a lack of transparency in decision-making, potential compromises of patient data privacy, and safety risks associated with AI implementation in clinical settings. Conclusions: In conclusion, while AI presents the opportunity for a health care revolution, it is imperative to address the ethical, regulatory, and safety challenges linked to its integration. Proactive measures are required to ensure that AI technologies are developed and deployed responsibly, striking a balance between innovation and the safeguarding of patient well-being. 
%M 39556817 %R 10.2196/53616 %U https://www.i-jmr.org/2024/1/e53616 %U https://doi.org/10.2196/53616 %U http://www.ncbi.nlm.nih.gov/pubmed/39556817 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e63356 %T EDAI Framework for Integrating Equity, Diversity, and Inclusion Throughout the Lifecycle of AI to Improve Health and Oral Health Care: Qualitative Study %A Abbasgholizadeh Rahimi,Samira %A Shrivastava,Richa %A Brown-Johnson,Anita %A Caidor,Pascale %A Davies,Claire %A Idrissi Janati,Amal %A Kengne Talla,Pascaline %A Madathil,Sreenath %A Willie,Bettina M %A Emami,Elham %+ Department of Family Medicine, McGill University, 5858 Chemin de la Côte-des-Neiges, Montreal, QC, H3S 1Z1, Canada, 1 514 399 9218, samira.rahimi@mcgill.ca %K equity, diversity, and inclusion %K EDI %K health care %K oral health care %K machine learning %K artificial intelligence %K AI %D 2024 %7 15.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Recent studies have identified significant gaps in equity, diversity, and inclusion (EDI) considerations within the lifecycle of artificial intelligence (AI), spanning from data collection and problem definition to implementation stages. Despite the recognized need for integrating EDI principles, there is currently no existing guideline or framework to support this integration in the AI lifecycle. Objective: This study aimed to address this gap by identifying EDI principles and indicators to be integrated into the AI lifecycle. The goal was to develop a comprehensive guiding framework to guide the development and implementation of future AI systems. Methods: This study was conducted in 3 phases. In phase 1, a comprehensive systematic scoping review explored how EDI principles have been integrated into AI in health and oral health care settings. 
In phase 2, a multidisciplinary team was established, and two 2-day, in-person international workshops with over 60 representatives from diverse backgrounds, expertise, and communities were conducted. The workshops included plenary presentations, round table discussions, and focused group discussions. In phase 3, based on the workshops’ insights, the EDAI framework was developed and refined through iterative feedback from participants. The results of the initial systematic scoping review have been published separately, and this paper focuses on subsequent phases of the project, which is related to framework development. Results: In this study, we developed the EDAI framework, a comprehensive guideline that integrates EDI principles and indicators throughout the entire AI lifecycle. This framework addresses existing gaps at various stages, from data collection to implementation, and focuses on individual, organizational, and systemic levels. Additionally, we identified both the facilitators and barriers to integrating EDI within the AI lifecycle in health and oral health care. Conclusions: The developed EDAI framework provides a comprehensive, actionable guideline for integrating EDI principles into AI development and deployment. By facilitating the systematic incorporation of these principles, the framework supports the creation and implementation of AI systems that are not only technologically advanced but also sensitive to EDI principles. 
%M 39546793 %R 10.2196/63356 %U https://www.jmir.org/2024/1/e63356 %U https://doi.org/10.2196/63356 %U http://www.ncbi.nlm.nih.gov/pubmed/39546793 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e59225 %T AI for Analyzing Mental Health Disorders Among Social Media Users: Quarter-Century Narrative Review of Progress and Challenges %A Owen,David %A Lynham,Amy J %A Smart,Sophie E %A Pardiñas,Antonio F %A Camacho Collados,Jose %+ School of Computer Science and Informatics, Cardiff University, Abacws, Senghennydd Road, Cardiff, CF24 4AG, United Kingdom, 44 (0)29 2087 4812, owendw1@cardiff.ac.uk %K mental health %K depression %K anxiety %K schizophrenia %K social media %K natural language processing %K narrative review %D 2024 %7 15.11.2024 %9 Review %J J Med Internet Res %G English %X Background: Mental health disorders are currently the main contributor to poor quality of life and years lived with disability. Symptoms common to many mental health disorders lead to impairments or changes in the use of language, which are observable in the routine use of social media. Detection of these linguistic cues has been explored throughout the last quarter century, but interest and methodological development have burgeoned following the COVID-19 pandemic. The next decade may see the development of reliable methods for predicting mental health status using social media data. This might have implications for clinical practice and public health policy, particularly in the context of early intervention in mental health care. Objective: This study aims to examine the state of the art in methods for predicting mental health statuses of social media users. Our focus is the development of artificial intelligence–driven methods, particularly natural language processing, for analyzing large volumes of written text. This study details constraints affecting research in this area. 
These include the dearth of high-quality public datasets for methodological benchmarking and the need to adopt ethical and privacy frameworks acknowledging the stigma experienced by those with a mental illness. Methods: A Google Scholar search yielded peer-reviewed articles dated between 1999 and 2024. We manually grouped the articles by 4 primary areas of interest: datasets on social media and mental health, methods for predicting mental health status, longitudinal analyses of mental health, and ethical aspects of the data and analysis of mental health. Selected articles from these groups formed our narrative review. Results: Larger datasets with precise dates of participants’ diagnoses are needed to support the development of methods for predicting mental health status, particularly in severe disorders such as schizophrenia. Inviting users to donate their social media data for research purposes could help overcome widespread ethical and privacy concerns. In any event, multimodal methods for predicting mental health status appear likely to provide advancements that may not be achievable using natural language processing alone. Conclusions: Multimodal methods for predicting mental health status from voice, image, and video-based social media data need to be further developed before they may be considered for adoption in health care, medical support, or as consumer-facing products. Such methods are likely to garner greater public confidence in their efficacy than those that rely on text alone. To achieve this, more high-quality social media datasets need to be made available and privacy concerns regarding the use of these data must be formally addressed. A social media platform feature that invites users to share their data upon publication is a possible solution. Finally, a review of literature studying the effects of social media use on a user’s depression and anxiety is merited. 
%M 39546783 %R 10.2196/59225 %U https://www.jmir.org/2024/1/e59225 %U https://doi.org/10.2196/59225 %U http://www.ncbi.nlm.nih.gov/pubmed/39546783 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e59607 %T Examining the Role of Large Language Models in Orthopedics: Systematic Review %A Zhang,Cheng %A Liu,Shanshan %A Zhou,Xingyu %A Zhou,Siyu %A Tian,Yinglun %A Wang,Shenglin %A Xu,Nanfang %A Li,Weishi %+ Department of Orthopaedics, Peking University Third Hospital, 49 North Garden Road, Beijing, 100191, China, 86 01082267360, puh3liweishi@163.com %K large language model %K LLM %K orthopedics %K generative pretrained transformer %K GPT %K ChatGPT %K digital health %K clinical practice %K artificial intelligence %K AI %K generative AI %K Bard %D 2024 %7 15.11.2024 %9 Review %J J Med Internet Res %G English %X Background: Large language models (LLMs) can understand natural language and generate corresponding text, images, and even videos based on prompts, which holds great potential in medical scenarios. Orthopedics is a significant branch of medicine, and orthopedic diseases contribute to a significant socioeconomic burden, which could be alleviated by the application of LLMs. Several pioneers in orthopedics have conducted research on LLMs across various subspecialties to explore their performance in addressing different issues. However, there are currently few reviews and summaries of these studies, and a systematic summary of existing research is absent. Objective: The objective of this review was to comprehensively summarize research findings on the application of LLMs in the field of orthopedics and explore the potential opportunities and challenges. Methods: PubMed, Embase, and Cochrane Library databases were searched from January 1, 2014, to February 22, 2024, with the language limited to English. 
The terms, which included variants of “large language model,” “generative artificial intelligence,” “ChatGPT,” and “orthopaedics,” were divided into 2 categories: large language model and orthopedics. After completing the search, the study selection process was conducted according to the inclusion and exclusion criteria. The quality of the included studies was assessed using the revised Cochrane risk-of-bias tool for randomized trials and CONSORT-AI (Consolidated Standards of Reporting Trials–Artificial Intelligence) guidance. Data extraction and synthesis were conducted after the quality assessment. Results: A total of 68 studies were selected. The application of LLMs in orthopedics involved the fields of clinical practice, education, research, and management. Of these 68 studies, 47 (69%) focused on clinical practice, 12 (18%) addressed orthopedic education, 8 (12%) were related to scientific research, and 1 (1%) pertained to the field of management. Of the 68 studies, only 8 (12%) recruited patients, and only 1 (1%) was a high-quality randomized controlled trial. ChatGPT was the most commonly mentioned LLM tool. There was considerable heterogeneity in the definition, measurement, and evaluation of the LLMs’ performance across the different studies. For diagnostic tasks alone, the accuracy ranged from 55% to 93%. When performing disease classification tasks, ChatGPT with GPT-4’s accuracy ranged from 2% to 100%. With regard to answering questions in orthopedic examinations, the scores ranged from 45% to 73.6% due to differences in models and test selections. Conclusions: LLMs cannot replace orthopedic professionals in the short term. However, using LLMs as copilots could be a potential approach to effectively enhance work efficiency at present. More high-quality clinical trials are needed in the future, aiming to identify optimal applications of LLMs and advance orthopedics toward higher efficiency and precision. 
%M 39546795 %R 10.2196/59607 %U https://www.jmir.org/2024/1/e59607 %U https://doi.org/10.2196/59607 %U http://www.ncbi.nlm.nih.gov/pubmed/39546795 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e60057 %T Decoding the Digital Pulse: Bibliometric Analysis of 25 Years in Digital Health Research Through the Journal of Medical Internet Research %A Kaczmarczyk,Robert %A Wilhelm,Theresa Isabelle %A Roos,Jonas %A Martin,Ron %+ Eye Center—Medical Center, Faculty of Medicine, Albert-Ludwigs-University of Freiburg, Killianstraße 5, Freiburg, 79106, Germany, 49 76127040020, theresa.wilhelm@uniklinik-freiburg.de %K digital health %K JMIR publication analysis %K network analysis %K artificial intelligence %K AI %K large language models %K eHealth %K Claude 3 Opus %K digital %K digital technology %K digital intervention %K machine learning %K natural language processing %K NLP %K deep learning %K algorithm %K model %K analytics %K practical model %K pandemic %K postpandemic era %K mobile phone %D 2024 %7 15.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: As the digital health landscape continues to evolve, analyzing the progress and direction of the field can yield valuable insights. The Journal of Medical Internet Research (JMIR) has been at the forefront of disseminating digital health research since 1999. A comprehensive network analysis of JMIR publications can help illuminate the evolution and trends in digital medicine over the past 25 years. Objective: This study aims to conduct a detailed network analysis of JMIR’s publications to uncover the growth patterns, dominant themes, and potential future trajectories in digital health research. Methods: We retrieved 8068 JMIR papers from PubMed using the Biopython library. 
Keyword metrics were assessed using accuracy, recall, and F1-scores to evaluate the effectiveness of keyword identification from Claude 3 Opus and Gemini 1.5 Pro in addition to 2 conventional natural language processing methods using key bidirectional encoder representations from transformers. Future trends for 2024-2026 were predicted using Claude 3 Opus, Google’s Time Series Foundation Model, autoregressive integrated moving average, exponential smoothing, and Prophet. Network visualization techniques were used to represent and analyze the complex relationships between collaborating countries, paper types, and keyword co-occurrence. Results: JMIR’s publication volume showed consistent growth, with a peak in 2020. The United States dominated country contributions, with China showing a notable increase in recent years. Keyword analysis from 1999 to 2023 showed significant thematic shifts, from an early internet and digital health focus to the dominance of COVID-19 and advanced technologies such as machine learning. Predictions for 2024-2026 suggest an increased focus on artificial intelligence, digital health, and mental health. Conclusions: Network analysis of JMIR publications provides a macroscopic view of the evolution of the digital health field. The journal’s trajectory reflects broader technological advances and shifting research priorities, including the impact of the COVID-19 pandemic. The predicted trends underscore the growing importance of computational technology in future health care research and practice. The findings from JMIR provide a glimpse into the future of digital medicine, suggesting a robust integration of artificial intelligence and continued emphasis on mental health in the postpandemic era. 
%M 39546778 %R 10.2196/60057 %U https://www.jmir.org/2024/1/e60057 %U https://doi.org/10.2196/60057 %U http://www.ncbi.nlm.nih.gov/pubmed/39546778 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51432 %T Advancements in Using AI for Dietary Assessment Based on Food Images: Scoping Review %A Chotwanvirat,Phawinpon %A Prachansuwan,Aree %A Sridonpai,Pimnapanut %A Kriengsinyos,Wantanee %+ Human Nutrition Unit, Food and Nutrition Academic and Research Cluster, Institute of Nutrition, Mahidol University, 999 Phutthamonthon 4 Rd., Salaya, Nakhon Pathom, 73170, Thailand, 66 2 800 2380, wantanee.krieng@mahidol.ac.th %K image-assisted dietary assessment %K artificial intelligence %K dietary assessment %K mobile phone %K food intake %K image recognition %K portion size %D 2024 %7 15.11.2024 %9 Review %J J Med Internet Res %G English %X Background: To accurately capture an individual’s food intake, dietitians are often required to ask clients about their food frequencies and portions, and they have to rely on the client’s memory, which can be burdensome. While taking food photos alongside food records can alleviate user burden and reduce errors in self-reporting, this method still requires trained staff to translate food photos into dietary intake data. Image-assisted dietary assessment (IADA) is an innovative approach that uses computer algorithms to mimic human performance in estimating dietary information from food images. This field has seen continuous improvement through advancements in computer science, particularly in artificial intelligence (AI). However, the technical nature of this field can make it challenging for those without a technical background to understand it completely. Objective: This review aims to fill the gap by providing a current overview of AI’s integration into dietary assessment using food images. The content is organized chronologically and presented in an accessible manner for those unfamiliar with AI terminology. 
In addition, we discuss the systems’ strengths and weaknesses and propose enhancements to improve IADA’s accuracy and adoption in the nutrition community. Methods: This scoping review used PubMed and Google Scholar databases to identify relevant studies. The review focused on computational techniques used in IADA, specifically AI models, devices, and sensors, or digital methods for food recognition and food volume estimation published between 2008 and 2021. Results: A total of 522 articles were initially identified. On the basis of a rigorous selection process, 84 (16.1%) articles were ultimately included in this review. The selected articles reveal that early systems, developed before 2015, relied on handcrafted machine learning algorithms to manage traditional sequential processes, such as segmentation, food identification, portion estimation, and nutrient calculations. Since 2015, these handcrafted algorithms have been largely replaced by deep learning algorithms for handling the same tasks. More recently, the traditional sequential process has been superseded by advanced algorithms, including multitask convolutional neural networks and generative adversarial networks. Most of the systems were validated for macronutrient and energy estimation, while only a few were capable of estimating micronutrients, such as sodium. Notably, significant advancements have been made in the field of IADA, with efforts focused on replicating humanlike performance. Conclusions: This review highlights the progress made by IADA, particularly in the areas of food identification and portion estimation. Advancements in AI techniques have shown great potential to improve the accuracy and efficiency of this field. However, it is crucial to involve dietitians and nutritionists in the development of these systems to ensure they meet the requirements and trust of professionals in the field. 
%M 39546777 %R 10.2196/51432 %U https://www.jmir.org/2024/1/e51432 %U https://doi.org/10.2196/51432 %U http://www.ncbi.nlm.nih.gov/pubmed/39546777 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 7 %N %P e58175 %T Quantifying the Enhancement of Sarcopenic Skeletal Muscle Preservation Through a Hybrid Exercise Program: Randomized Controlled Trial %A Guo,Hongzhi %A Cao,Jianwei %A He,Shichun %A Wei,Meiqi %A Meng,Deyu %A Yu,Ichen %A Wang,Ziyi %A Chang,Xinyi %A Yang,Guang %A Wang,Ziheng %K sarcopenia %K older adults %K physical exercise program %K explainable artificial intelligence %K tai chi %D 2024 %7 15.11.2024 %9 %J JMIR Aging %G English %X Background: Sarcopenia is characterized by the loss of skeletal muscle mass and muscle function with increasing age. The skeletal muscle mass of older people with sarcopenia may be improved through strength training and tai chi. However, it remains unclear whether a hybrid of strength exercise training and traditional Chinese exercise has a better effect. Objective: We designed a strength training and tai chi exercise hybrid program to improve sarcopenia in older people. Moreover, explainable artificial intelligence was used to predict postintervention sarcopenic status and quantify the feature contribution. Methods: To assess the influence of sarcopenia in older adults, 93 participants were enrolled in a 24-week randomized controlled trial and randomized into 3 groups, namely the tai chi exercise and strength training hybrid group (TCSG; n=33), the strength training group (STG; n=30), and the control group (n=30). Abdominal computed tomography was used to evaluate the skeletal muscle mass at the third lumbar (L3) vertebra. Analysis of demographic characteristics of participants at baseline used 1-way ANOVA and χ2 tests, and repeated-measures ANOVA was used to analyze experimental data. 
In addition, 10 machine learning classification models were used to predict whether participants could reverse the degree of sarcopenia after the intervention. Results: A significant interaction effect was found in skeletal muscle density at the L3 vertebra, skeletal muscle area at the L3 vertebra (L3 SMA), grip strength, muscle fat infiltration, and relative skeletal muscle mass index (all P values were <.05). Grip strength, relative skeletal muscle mass index, and L3 SMA improved significantly after the intervention for participants in the TCSG and STG (all P values were <.05). Post hoc tests showed that participants in the TCSG experienced a greater improvement in L3 SMA than those in the STG and the control group. The LightGBM classification model had the best performance, with an accuracy of 88.4%, a recall score of 74%, and an F1-score of 76.1%. Conclusions: The skeletal muscle area of older adults with sarcopenia may be improved by a hybrid exercise program combining strength training and tai chi. In addition, the LightGBM classification model performed best in predicting the reversal of sarcopenia. 
Trial Registration: ClinicalTrials.gov NCT05694117; https://clinicaltrials.gov/study/NCT05694117 %R 10.2196/58175 %U https://aging.jmir.org/2024/1/e58175 %U https://doi.org/10.2196/58175 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e47774 %T Machine Learning Methods to Personalize Persuasive Strategies in mHealth Interventions That Promote Physical Activity: Scoping Review and Categorization Overview %A Brons,Annette %A Wang,Shihan %A Visser,Bart %A Kröse,Ben %A Bakkes,Sander %A Veltkamp,Remco %+ Department of Information and Computing Sciences, Utrecht University, Princetonplein 5, Utrecht, 3584 CC, Netherlands, 31 621156976, R.C.Veltkamp@uu.nl %K artificial intelligence %K exercise %K mobile app %K adaptive %K tailoring %K supervised learning %K reinforcement learning %K recommender system %D 2024 %7 15.11.2024 %9 Review %J J Med Internet Res %G English %X Background: Although physical activity (PA) has positive effects on health and well-being, physical inactivity is a worldwide problem. Mobile health interventions have been shown to be effective in promoting PA. Personalizing persuasive strategies improves intervention success and can be conducted using machine learning (ML). For PA, several studies have addressed personalized persuasive strategies without ML, whereas others have included personalization using ML without focusing on persuasive strategies. An overview of studies discussing ML to personalize persuasive strategies in PA-promoting interventions and corresponding categorizations could be helpful for such interventions to be designed in the future but is still missing. Objective: First, we aimed to provide an overview of implemented ML techniques to personalize persuasive strategies in mobile health interventions promoting PA. Moreover, we aimed to present a categorization overview as a starting point for applying ML techniques in this field. 
Methods: A scoping review was conducted based on the framework by Arksey and O’Malley and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) criteria. Scopus, Web of Science, and PubMed were searched for studies that included ML to personalize persuasive strategies in interventions promoting PA. Papers were screened using the ASReview software. From the included papers, categorized by the research project they belonged to, we extracted data regarding general study information, target group, PA intervention, implemented technology, and study details. On the basis of the analysis of these data, a categorization overview was given. Results: In total, 40 papers belonging to 27 different projects were included. These papers could be categorized in 4 groups based on their dimension of personalization. Then, for each dimension, 1 or 2 persuasive strategy categories were found together with a type of ML. The overview resulted in a categorization consisting of 3 levels: dimension of personalization, persuasive strategy, and type of ML. When personalizing the timing of the messages, most projects implemented reinforcement learning to personalize the timing of reminders and supervised learning (SL) to personalize the timing of feedback, monitoring, and goal-setting messages. Regarding the content of the messages, most projects implemented SL to personalize PA suggestions and feedback or educational messages. For personalizing PA suggestions, SL can be implemented either alone or combined with a recommender system. Finally, reinforcement learning was mostly used to personalize the type of feedback messages. Conclusions: The overview of all implemented persuasive strategies and their corresponding ML methods is insightful for this interdisciplinary field. Moreover, it led to a categorization overview that provides insights into the design and development of personalized persuasive strategies to promote PA. 
In future papers, the categorization overview might be expanded with additional layers to specify ML methods or additional dimensions of personalization and persuasive strategies. %M 39546334 %R 10.2196/47774 %U https://www.jmir.org/2024/1/e47774 %U https://doi.org/10.2196/47774 %U http://www.ncbi.nlm.nih.gov/pubmed/39546334 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e56762 %T Evaluating AI Competence in Specialized Medicine: Comparative Analysis of ChatGPT and Neurologists in a Neurology Specialist Examination in Spain %A Ros-Arlanzón,Pablo %A Perez-Sempere,Angel %K artificial intelligence %K ChatGPT %K clinical decision-making %K medical education %K medical knowledge assessment %K OpenAI %D 2024 %7 14.11.2024 %9 %J JMIR Med Educ %G English %X Background: With the rapid advancement of artificial intelligence (AI) in various fields, evaluating its application in specialized medical contexts becomes crucial. ChatGPT, a large language model developed by OpenAI, has shown potential in diverse applications, including medicine. Objective: This study aims to compare the performance of ChatGPT with that of attending neurologists in a real neurology specialist examination conducted in the Valencian Community, Spain, assessing the AI’s capabilities and limitations in medical knowledge. Methods: We conducted a comparative analysis using the 2022 neurology specialist examination results from 120 neurologists and responses generated by ChatGPT versions 3.5 and 4. The examination consisted of 80 multiple-choice questions, with a focus on clinical neurology and health legislation. Questions were classified according to Bloom’s Taxonomy. Statistical analysis of performance, including the κ coefficient for response consistency, was performed. Results: Human participants exhibited a median score of 5.91 (IQR: 4.93-6.76), with 32 neurologists failing to pass. ChatGPT-3.5 ranked 116th out of 122, answering 54.5% of questions correctly (score 3.94). 
ChatGPT-4 showed marked improvement, ranking 17th with 81.8% of correct answers (score 7.57), surpassing several human specialists. No significant variations were observed in the performance on lower-order questions versus higher-order questions. Additionally, ChatGPT-4 demonstrated increased interrater reliability, as reflected by a higher κ coefficient of 0.73, compared to ChatGPT-3.5’s coefficient of 0.69. Conclusions: This study underscores the evolving capabilities of AI in medical knowledge assessment, particularly in specialized fields. ChatGPT-4’s performance, outperforming the median score of human participants in a rigorous neurology examination, represents a significant milestone in AI development, suggesting its potential as an effective tool in specialized medical education and assessment. %R 10.2196/56762 %U https://mededu.jmir.org/2024/1/e56762 %U https://doi.org/10.2196/56762 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e64226 %T Economics and Equity of Large Language Models: Health Care Perspective %A Nagarajan,Radha %A Kondo,Midori %A Salas,Franz %A Sezgin,Emre %A Yao,Yuan %A Klotzman,Vanessa %A Godambe,Sandip A %A Khan,Naqi %A Limon,Alfonso %A Stephenson,Graham %A Taraman,Sharief %A Walton,Nephi %A Ehwerhemuepha,Louis %A Pandit,Jay %A Pandita,Deepti %A Weiss,Michael %A Golden,Charles %A Gold,Adam %A Henderson,John %A Shippy,Angela %A Celi,Leo Anthony %A Hogan,William R %A Oermann,Eric K %A Sanger,Terence %A Martel,Steven %+ Children's Hospital of Orange County, 1201 W. 
La Veta Ave, Orange, CA, 92868, United States, 1 714 997 3000, Radha.Nagarajan@choc.org %K large language model %K LLM %K health care %K economics %K equity %K cloud service providers %K cloud %K health outcome %K implementation %K democratization %D 2024 %7 14.11.2024 %9 Viewpoint %J J Med Internet Res %G English %X Large language models (LLMs) continue to exhibit noteworthy capabilities across a spectrum of areas, including emerging proficiencies across the health care continuum. Successful LLM implementation and adoption depend on digital readiness, modern infrastructure, a trained workforce, privacy, and an ethical regulatory landscape. These factors can vary significantly across health care ecosystems, dictating the choice of a particular LLM implementation pathway. This perspective discusses 3 LLM implementation pathways—training from scratch pathway (TSP), fine-tuned pathway (FTP), and out-of-the-box pathway (OBP)—as potential onboarding points for health systems while facilitating equitable adoption. The choice of a particular pathway is governed by needs as well as affordability. Therefore, the risks, benefits, and economics of these pathways across 4 major cloud service providers (Amazon, Microsoft, Google, and Oracle) are presented. While cost comparisons, such as on-demand and spot pricing across the cloud service providers for the 3 pathways, are presented for completeness, the usefulness of managed services and cloud enterprise tools is elucidated. Managed services can complement the traditional workforce and expertise, while enterprise tools, such as federated learning, can overcome sample size challenges when implementing LLMs using health care data. Of the 3 pathways, TSP is expected to be the most resource-intensive regarding infrastructure and workforce while providing maximum customization, enhanced transparency, and performance. 
Because TSP trains the LLM using enterprise health care data, it is expected to harness the digital signatures of the population served by the health care system with the potential to impact outcomes. The use of pretrained models in FTP is a limitation that may affect performance, because the training data used in the pretrained model may contain hidden bias and may not necessarily be health care–related. However, FTP provides a balance between customization, cost, and performance. While OBP can be rapidly deployed, it provides minimal customization and transparency without guaranteeing long-term availability. OBP may also present challenges in interfacing seamlessly with downstream applications in health care settings with variations in pricing and use over time. Lack of customization in OBP can significantly limit its ability to impact outcomes. Finally, potential applications of LLMs in health care, including conversational artificial intelligence, chatbots, summarization, and machine translation, are highlighted. While the 3 implementation pathways discussed in this perspective have the potential to facilitate equitable adoption and democratization of LLMs, transitions between them may be necessary as the needs of health systems evolve. Understanding the economics and trade-offs of these onboarding pathways can guide their strategic adoption and demonstrate value while impacting health care outcomes favorably. 
%M 39541580 %R 10.2196/64226 %U https://www.jmir.org/2024/1/e64226 %U https://doi.org/10.2196/64226 %U http://www.ncbi.nlm.nih.gov/pubmed/39541580 %0 Journal Article %@ 2817-092X %I JMIR Publications %V 3 %N %P e59556 %T Twenty-Five Years of AI in Neurology: The Journey of Predictive Medicine and Biological Breakthroughs %A Gutman,Barak %A Shmilovitch,Amit-Haim %A Aran,Dvir %A Shelly,Shahar %+ Department of Neurology, Rambam Medical Center, HaAliya HaShniya St 8, Haifa, 3109601, Israel, 972 4 777 3568, shahar.shell@technion.ac.il %K neurology %K artificial intelligence %K telemedicine %K clinical advancements %K mobile phone %D 2024 %7 8.11.2024 %9 Viewpoint %J JMIR Neurotech %G English %X Neurological disorders are the leading cause of physical and cognitive disability across the globe, currently affecting up to 15% of the world population, with the burden of chronic neurodegenerative diseases having doubled over the last 2 decades. Two decades ago, neurologists relying solely on clinical signs and basic imaging faced challenges in diagnosis and treatment. Today, the integration of artificial intelligence (AI) and bioinformatic methods is changing this landscape. This paper explores this transformative journey, emphasizing the critical role of AI in neurology, aiming to integrate a multitude of methods and thereby enhance the field of neurology. Over the past 25 years, integrating biomedical data science into medicine, particularly neurology, has fundamentally transformed how we understand, diagnose, and treat neurological diseases. Advances in genomics sequencing, the introduction of new imaging methods, the discovery of novel molecular biomarkers for nervous system function, a comprehensive understanding of immunology and neuroimmunology shaping disease subtypes, and the advent of advanced electrophysiological recording methods, alongside the digitalization of medical records and the rise of AI, all led to an unparalleled surge in data within neurology. 
In addition, telemedicine and web-based interactive health platforms, accelerated by the COVID-19 pandemic, have become integral to neurology practice. The real-world impact of these advancements is evident, with AI-driven analysis of imaging and genetic data leading to earlier and more accurate diagnoses of conditions such as multiple sclerosis, Parkinson disease, amyotrophic lateral sclerosis, Alzheimer disease, and more. Neuroinformatics is the key component connecting all these advances. By harnessing the power of IT and computational methods to efficiently organize, analyze, and interpret vast datasets, we can extract meaningful insights from complex neurological data, contributing to a deeper understanding of the intricate workings of the brain. In this paper, we describe the large-scale datasets that have emerged in neurology over the last 25 years and showcase the major advancements made by integrating these datasets with advanced neuroinformatic approaches for the diagnosis and treatment of neurological disorders. We further discuss challenges in integrating AI into neurology, including ethical considerations in data use, the need for further personalization of treatment, and embracing new emerging technologies like quantum computing. These developments are shaping a future where neurological care is more precise, accessible, and tailored to individual patient needs. We believe further advancements in AI will bridge traditional medical disciplines and cutting-edge technology, navigating the complexities of neurological data and steering medicine toward a future of more precise, accessible, and patient-centric health care. 
%R 10.2196/59556 %U https://neuro.jmir.org/2024/1/e59556 %U https://doi.org/10.2196/59556 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58039 %T Pioneering Klebsiella Pneumoniae Antibiotic Resistance Prediction With Artificial Intelligence-Clinical Decision Support System–Enhanced Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry: Retrospective Study %A Jian,Ming-Jr %A Lin,Tai-Han %A Chung,Hsing-Yi %A Chang,Chih-Kai %A Perng,Cherng-Lih %A Chang,Feng-Yee %A Shang,Hung-Sheng %+ Division of Clinical Pathology, Department of Pathology, Tri-Service General Hospital, National Defense Medical Center, No. 161, Sec. 6, Minquan E. Rd., Neihu Dist., Division of Clinical Pathology, Taipei City, 11490, Taiwan, 886 920713130, iamkeith001@gmail.com %K Klebsiella pneumoniae %K multidrug resistance %K AI-CDSS %K quinolone %K ciprofloxacin %K levofloxacin %D 2024 %7 7.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The rising prevalence and swift spread of multidrug-resistant gram-negative bacteria (MDR-GNB), especially Klebsiella pneumoniae (KP), present a critical global health threat highlighted by the World Health Organization, with mortality rates soaring approximately 50% with inappropriate antimicrobial treatment. Objective: This study aims to advance a novel strategy to develop an artificial intelligence-clinical decision support system (AI-CDSS) that combines machine learning (ML) with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), aiming to significantly improve the accuracy and speed of diagnosing antibiotic resistance, directly addressing the grave health risks posed by the widespread dissemination of pan drug-resistant gram-negative bacteria across numerous countries. Methods: A comprehensive dataset comprising 165,299 bacterial specimens and 11,996 KP isolates was meticulously analyzed using MALDI-TOF MS technology. 
Advanced ML algorithms were harnessed to sculpt predictive models that ascertain resistance to quintessential antibiotics, particularly levofloxacin and ciprofloxacin, by using the amassed spectral data. Results: Our ML models revealed remarkable proficiency in forecasting antibiotic resistance, with the random forest classifier emerging as particularly effective in predicting resistance to both levofloxacin and ciprofloxacin, achieving the highest area under the curve of 0.95. Performance metrics across different models, including accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1-score, were detailed, underlining the potential of these algorithms in aiding the development of precision treatment strategies. Conclusions: This investigation highlights the synergy between MALDI-TOF MS and ML as a beacon of hope against the escalating threat of antibiotic resistance. The advent of AI-CDSS heralds a new era in clinical diagnostics, promising a future in which rapid and accurate resistance prediction becomes a cornerstone in combating infectious diseases. Through this innovative approach, we answered the challenge posed by KP and other multidrug-resistant pathogens, marking a significant milestone in our journey toward global health security. 
%M 39509693 %R 10.2196/58039 %U https://www.jmir.org/2024/1/e58039 %U https://doi.org/10.2196/58039 %U http://www.ncbi.nlm.nih.gov/pubmed/39509693 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58413 %T Development and Validation of Deep Learning–Based Infectivity Prediction in Pulmonary Tuberculosis Through Chest Radiography: Retrospective Study %A Chung,Wou young %A Yoon,Jinsik %A Yoon,Dukyong %A Kim,Songsoo %A Kim,Yujeong %A Park,Ji Eun %A Kang,Young Ae %+ Department of Internal Medicine, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea, 82 2 2228 1954, mdkang@yuhs.ac %K pulmonary tuberculosis %K chest radiography %K artificial intelligence %K tuberculosis %K TB %K smear %K smear test %K culture test %K diagnosis %K treatment %K deep learning %K CXR %K PTB %K management %K cost effective %K asymptomatic infection %K diagnostic tools %K infectivity %K AI tool %K cohort %D 2024 %7 7.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Pulmonary tuberculosis (PTB) poses a global health challenge owing to the time-intensive nature of traditional diagnostic tests such as smear and culture tests, which can require hours to weeks to yield results. Objective: This study aimed to use artificial intelligence (AI)–based chest radiography (CXR) to evaluate the infectivity of patients with PTB more quickly and accurately compared with traditional methods such as smear and culture tests. Methods: We used DenseNet121 and visualization techniques such as gradient-weighted class activation mapping and local interpretable model-agnostic explanations to demonstrate the decision-making process of the model. We analyzed 36,142 CXR images of 4492 patients with PTB obtained from Severance Hospital, focusing specifically on the lung region through segmentation and cropping with TransUNet. 
We used data from 2004 to 2020 to train the model, data from 2021 for testing, and data from 2022 to 2023 for internal validation. In addition, we used 1978 CXR images of 299 patients with PTB obtained from Yongin Severance Hospital for external validation. Results: In the internal validation, the model achieved an accuracy of 73.27%, an area under the receiver operating characteristic curve of 0.79, and an area under the precision-recall curve of 0.77. In the external validation, it exhibited an accuracy of 70.29%, an area under the receiver operating characteristic curve of 0.77, and an area under the precision-recall curve of 0.8. In addition, gradient-weighted class activation mapping and local interpretable model-agnostic explanations provided insights into the decision-making process of the AI model. Conclusions: This proposed AI tool offers a rapid and accurate alternative for evaluating PTB infectivity through CXR, with significant implications for enhancing screening efficiency by evaluating infectivity before sputum test results in clinical settings, compared with traditional smear and culture tests. 
%M 39509691 %R 10.2196/58413 %U https://www.jmir.org/2024/1/e58413 %U https://doi.org/10.2196/58413 %U http://www.ncbi.nlm.nih.gov/pubmed/39509691 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e60441 %T The Journey From Nonimmersive to Immersive Multiuser Applications in Mental Health Care: Systematic Review %A Fajnerova,Iveta %A Hejtmánek,Lukáš %A Sedlák,Michal %A Jablonská,Markéta %A Francová,Anna %A Stopková,Pavla %+ Research Center for Virtual Reality in Mental Health and Neuroscience, National Institute of Mental Health, Topolová 748, Klecany, 250 67, Czech Republic, 420 283 088 478, Iveta.fajnerova@nudz.cz %K digital health %K mental health care %K clinical interventions %K multiuser %K immersive %K virtual reality %K VR %K app %K mental health %K online tools %K synthesis %K mobile phone %K PRISMA %D 2024 %7 7.11.2024 %9 Review %J J Med Internet Res %G English %X Background: Over the past 25 years, the development of multiuser applications has seen considerable advancements and challenges. The technological development in this field has emerged from simple chat rooms through videoconferencing tools to the creation of complex, interactive, and often multisensory virtual worlds. These multiuser technologies have gradually found their way into mental health care, where they are used in both dyadic counseling and group interventions. However, some limitations in hardware capabilities, user experience designs, and scalability may have hindered the effectiveness of these applications. Objective: This systematic review aims at summarizing the progress made and the potential future directions in this field while evaluating various factors and perspectives relevant to remote multiuser interventions. Methods: The systematic review was performed based on a Web of Science and PubMed database search covering articles in English, published from January 1999 to March 2024, related to multiuser mental health interventions. 
Several inclusion and exclusion criteria were determined before and during the records screening process, which was performed in several steps. Results: We identified 49 records exploring multiuser applications in mental health care, ranging from text-based interventions to interventions set in fully immersive environments. The number of publications exploring this topic has been growing since 2015, with a large increase during the COVID-19 pandemic. Most digital interventions were delivered in the form of videoconferencing, with only a few implementing immersive environments. The studies used professional or peer-supported group interventions or a combination of both approaches. The research studies targeted diverse groups and topics, from nursing mothers to psychiatric disorders or various minority groups. Most group sessions occurred weekly, or in the case of the peer-support groups, often with a flexible schedule. Conclusions: We identified many benefits to multiuser digital interventions for mental health care. These approaches provide distributed, always available, and affordable peer support that can be used to deliver necessary help to people living outside of areas where in-person interventions are easily available. While immersive virtual environments have become a common tool in many areas of psychiatric care, such as exposure therapy, our results suggest that this technology in multiuser settings is still in its early stages. Most identified studies investigated mainstream technologies, such as videoconferencing or text-based support, substituting the immersive experience for convenience and ease of use. While many studies discuss useful features of virtual environments in group interventions, such as anonymity or stronger engagement with the group, we discuss persisting issues with these technologies, which currently prevent their full adoption. 
%M 39509153 %R 10.2196/60441 %U https://www.jmir.org/2024/1/e60441 %U https://doi.org/10.2196/60441 %U http://www.ncbi.nlm.nih.gov/pubmed/39509153 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 5 %N %P e64406 %T Ethical Considerations in Human-Centered AI: Advancing Oncology Chatbots Through Large Language Models %A Chow,James C L %A Li,Kay %+ Princess Margaret Cancer Centre, University Health Network, 7/F, Rm 7-606, 700 University Ave, Toronto, ON, M5G 1X6, Canada, 1 4169464501, james.chow@uhn.ca %K artificial intelligence %K humanistic AI %K ethical AI %K human-centered AI %K machine learning %K large language models %K natural language processing %K oncology chatbot %K transformer-based model %K ChatGPT %K health care %D 2024 %7 6.11.2024 %9 Viewpoint %J JMIR Bioinform Biotech %G English %X The integration of chatbots in oncology underscores the pressing need for human-centered artificial intelligence (AI) that addresses patient and family concerns with empathy and precision. Human-centered AI emphasizes ethical principles, empathy, and user-centric approaches, ensuring technology aligns with human values and needs. This review critically examines the ethical implications of using large language models (LLMs) like GPT-3 and GPT-4 (OpenAI) in oncology chatbots. It examines how these models replicate human-like language patterns, impacting the design of ethical AI systems. The paper identifies key strategies for ethically developing oncology chatbots, focusing on potential biases arising from extensive datasets and neural networks. Specific datasets, such as those sourced from predominantly Western medical literature and patient interactions, may introduce biases by overrepresenting certain demographic groups. Moreover, the training methodologies of LLMs, including fine-tuning processes, can exacerbate these biases, leading to outputs that may disproportionately favor affluent or Western populations while neglecting marginalized communities. 
By providing examples of biased outputs in oncology chatbots, the review highlights the ethical challenges LLMs present and the need for mitigation strategies. The study emphasizes integrating human-centric values into AI to mitigate these biases, ultimately advocating for the development of oncology chatbots that are aligned with ethical principles and capable of serving diverse patient populations equitably. %M 39321336 %R 10.2196/64406 %U https://bioinform.jmir.org/2024/1/e64406 %U https://doi.org/10.2196/64406 %U http://www.ncbi.nlm.nih.gov/pubmed/39321336 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e55218 %T A Food Intake Estimation System Using an Artificial Intelligence–Based Model for Estimating Leftover Hospital Liquid Food in Clinical Environments: Development and Validation Study %A Tagi,Masato %A Hamada,Yasuhiro %A Shan,Xiao %A Ozaki,Kazumi %A Kubota,Masanori %A Amano,Sosuke %A Sakaue,Hiroshi %A Suzuki,Yoshiko %A Konishi,Takeshi %A Hirose,Jun %+ Medical Informatics, Institute of Biomedical Sciences, Tokushima University Graduate School, 3-18-15, Kuramoto-cho, Tokushima, 7708503, Japan, 81 88 633 9178, tagi@tokushima-u.ac.jp %K artificial intelligence %K machine learning %K system development %K food intake %K dietary intake %K dietary assessment %K food consumption %K image visual estimation %K AI estimation %K direct visual estimation %D 2024 %7 5.11.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Medical staff often conduct assessments, such as food intake and nutrient sufficiency ratios, to accurately evaluate patients’ food consumption. However, visual estimations to measure food intake are difficult to perform with numerous patients. Hence, the clinical environment requires a simple and accurate method to measure dietary intake. Objective: This study aims to develop a food intake estimation system through an artificial intelligence (AI) model to estimate leftover food. 
The accuracy of the AI’s estimation was compared with that of visual estimation for liquid foods served to hospitalized patients. Methods: Estimations were performed by a dietitian who looked at a photo of the food (image visual estimation) and by a nurse who looked directly at the food (direct visual estimation), and both were verified against actual weighed measurements. In total, 300 dishes of liquid food (100 dishes of thin rice gruel, 100 of vegetable soup, 31 of fermented milk, and 18, 12, 13, and 26 of peach, grape, orange, and mixed juices, respectively) were used. The root-mean-square error (RMSE) and coefficient of determination (R2) were used as metrics to determine the accuracy of the evaluation process. Corresponding t tests and Spearman rank correlation coefficients were used to verify the accuracy of the measurements by each estimation method against the weighing method. Results: The RMSE obtained by the AI estimation approach was 8.12 for energy, which tended to be smaller than that of the image visual estimation approach (8.49) but larger than that of the direct visual estimation approach (4.34). Similarly, the R2 value for the AI estimation tended to be larger than that of the image visual estimation and smaller than that of the direct visual estimation. There was no difference between the AI estimation (mean 71.7, SD 23.9 kcal; P=.82) and the actual values obtained with the weighing method. However, the mean nutrient intakes from the image visual estimation (mean 75.5, SD 23.2 kcal; P<.001) and direct visual estimation (mean 73.1, SD 26.4 kcal; P=.007) differed significantly from the actual values. Spearman rank correlation coefficients were high for energy (ρ=0.89-0.97), protein (ρ=0.94-0.97), fat (ρ=0.91-0.94), and carbohydrate (ρ=0.89-0.97). Conclusions: Measurements from the food intake estimation system, which uses an AI-based model to estimate leftover liquid food intake in patients, correlated highly with the actual values obtained with the weighing method. 
Furthermore, it showed higher accuracy than the image visual estimation. The errors of the AI estimation method were within the acceptable range of the weighing method, indicating that the AI-based food intake estimation system could be applied in clinical environments. However, its accuracy remained lower than that of direct visual estimation. %M 39500491 %R 10.2196/55218 %U https://formative.jmir.org/2024/1/e55218 %U https://doi.org/10.2196/55218 %U http://www.ncbi.nlm.nih.gov/pubmed/39500491 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56532 %T The Accuracy and Capability of Artificial Intelligence Solutions in Health Care Examinations and Certificates: Systematic Review and Meta-Analysis %A Waldock,William J %A Zhang,Joe %A Guni,Ahmad %A Nabeel,Ahmad %A Darzi,Ara %A Ashrafian,Hutan %+ Institute of Global Health Innovation, Imperial College London, 10th Floor, Queen Elizabeth Queen Mother Building, Praed Street, London, United Kingdom, 44 07799871597, h.ashrafian@imperial.ac.uk %K large language model %K LLM %K artificial intelligence %K AI %K health care exam %K narrative medical response %K health care examination %K clinical commissioning %K health services %K safety %D 2024 %7 5.11.2024 %9 Review %J J Med Internet Res %G English %X Background: Large language models (LLMs) have dominated public interest due to their apparent capability to accurately replicate learned knowledge in narrative text. However, there is a lack of clarity about the accuracy and capability standards of LLMs in health care examinations. Objective: We conducted a systematic review of LLM accuracy, as tested under health care examination conditions, as compared to known human performance standards. Methods: We quantified the accuracy of LLMs in responding to health care examination questions and evaluated the consistency and quality of study reporting. 
The search included all papers published up to September 10, 2023, covering all LLMs reported in English-language journals with clear LLM accuracy standards. The exclusion criteria were as follows: the assessment was not a health care exam, there was no LLM, there was no evaluation of comparable success accuracy, and the literature was not original research. The literature search included the following Medical Subject Headings (MeSH) terms used in all possible combinations: “artificial intelligence,” “ChatGPT,” “GPT,” “LLM,” “large language model,” “machine learning,” “neural network,” “Generative Pre-trained Transformer,” “Generative Transformer,” “Generative Language Model,” “Generative Model,” “medical exam,” “healthcare exam,” and “clinical exam.” Sensitivity, accuracy, and precision data were extracted, including relevant CIs. Results: The search identified 1673 relevant citations. After removing duplicate results, 1268 (75.8%) papers were screened by title and abstract, and 32 (2.5%) studies were included for full-text review. Our meta-analysis suggested that LLMs are able to perform with an overall medical examination accuracy of 0.61 (CI 0.58-0.64) and a United States Medical Licensing Examination (USMLE) accuracy of 0.51 (CI 0.46-0.56), while Chat Generative Pretrained Transformer (ChatGPT) can perform with an overall medical examination accuracy of 0.64 (CI 0.6-0.67). Conclusions: LLMs offer promise to remediate health care demand and staffing challenges by providing accurate and efficient context-specific information to critical decision makers. For policy and deployment decisions about LLMs to advance health care, we proposed a new framework called RUBRICC (Regulatory, Usability, Bias, Reliability [Evidence and Safety], Interoperability, Cost, and Codesign–Patient and Public Involvement and Engagement [PPIE]). 
This presents a valuable opportunity to direct the clinical commissioning of new LLM capabilities into health services, while respecting patient safety considerations. Trial Registration: OSF Registries osf.io/xqzkw; https://osf.io/xqzkw %M 39499913 %R 10.2196/56532 %U https://www.jmir.org/2024/1/e56532 %U https://doi.org/10.2196/56532 %U http://www.ncbi.nlm.nih.gov/pubmed/39499913 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52794 %T Machine Learning–Based Prediction for Incident Hypertension Based on Regular Health Checkup Data: Derivation and Validation in 2 Independent Nationwide Cohorts in South Korea and Japan %A Hwang,Seung Ha %A Lee,Hayeon %A Lee,Jun Hyuk %A Lee,Myeongcheol %A Koyanagi,Ai %A Smith,Lee %A Rhee,Sang Youl %A Yon,Dong Keon %A Lee,Jinseok %+ Department of Biomedical Engineering, Kyung Hee University, 1732 Deogyeong-daero, Yongin, 17104, Republic of Korea, 82 312012570, gonasago@khu.ac.kr %K machine learning %K hypertension %K cardiovascular disease %K artificial intelligence %K cause of death %K cardiovascular risk %K predictive analytics %D 2024 %7 5.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Worldwide, cardiovascular diseases are the primary cause of death, with hypertension as a key contributor. In 2019, cardiovascular diseases led to 17.9 million deaths, predicted to reach 23 million by 2030. Objective: This study presents a new method to predict hypertension using demographic data, using 6 machine learning models for enhanced reliability and applicability. The goal is to harness artificial intelligence for early and accurate hypertension diagnosis across diverse populations. 
Methods: Data from 2 national cohort studies were used: the National Health Insurance Service-National Sample Cohort (South Korea, n=244,814), conducted between 2002 and 2013, was used to train and test machine learning models designed to anticipate incident hypertension within 5 years of a health checkup in those aged ≥20 years, and the Japanese Medical Data Center cohort (Japan, n=1,296,649) was used for external validation. An ensemble of 6 diverse machine learning models was used to identify the 5 most salient features contributing to hypertension through a feature importance analysis to confirm the contribution of each feature. Results: The Adaptive Boosting and logistic regression ensemble showed superior balanced accuracy (0.812, sensitivity 0.806, specificity 0.818, and area under the receiver operating characteristic curve 0.901). The 5 key hypertension indicators were age, diastolic blood pressure, BMI, systolic blood pressure, and fasting blood glucose. The Japanese Medical Data Center cohort dataset (external validation set) corroborated these findings (balanced accuracy 0.741 and area under the receiver operating characteristic curve 0.824). The ensemble model was integrated into a public web portal for predicting hypertension onset based on health checkup data. Conclusions: Comparative evaluation of our machine learning models against classical statistical models across 2 distinct studies emphasized the former’s enhanced stability, generalizability, and reproducibility in predicting hypertension onset. 
%M 39499554 %R 10.2196/52794 %U https://www.jmir.org/2024/1/e52794 %U https://doi.org/10.2196/52794 %U http://www.ncbi.nlm.nih.gov/pubmed/39499554 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e51446 %T The Potential of Artificial Intelligence Tools for Reducing Uncertainty in Medicine and Directions for Medical Education %A Alli,Sauliha Rabia %A Hossain,Soaad Qahhār %A Das,Sunit %A Upshur,Ross %K artificial intelligence %K machine learning %K uncertainty %K clinical decision-making %K medical education %K generative AI %K generative artificial intelligence %D 2024 %7 4.11.2024 %9 %J JMIR Med Educ %G English %X In the field of medicine, uncertainty is inherent. Physicians are asked to make decisions on a daily basis without complete certainty, whether it is in understanding the patient’s problem, performing the physical examination, interpreting the findings of diagnostic tests, or proposing a management plan. The reasons for this uncertainty are widespread, including the lack of knowledge about the patient, individual physician limitations, and the limited predictive power of objective diagnostic tools. This uncertainty poses significant problems in providing competent patient care. Research efforts and teaching are attempts to reduce uncertainty that have now become inherent to medicine. Despite this, uncertainty is rampant. Artificial intelligence (AI) tools, which are being rapidly developed and integrated into practice, may change the way we navigate uncertainty. In their strongest forms, AI tools may have the ability to improve data collection on diseases, patient beliefs, values, and preferences, thereby allowing more time for physician-patient communication. By using methods not previously considered, these tools hold the potential to reduce the uncertainty in medicine, such as those arising due to the lack of clinical information and provider skill and bias. 
Despite this possibility, there has been considerable resistance to the implementation of AI tools in medical practice. In this viewpoint article, we discuss the impact of AI on medical uncertainty and discuss practical approaches to teaching the use of AI tools in medical schools and residency training programs, including AI ethics, real-world skills, and technological aptitude. %R 10.2196/51446 %U https://mededu.jmir.org/2024/1/e51446 %U https://doi.org/10.2196/51446 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e60291 %T Accuracy of Prospective Assessments of 4 Large Language Model Chatbot Responses to Patient Questions About Emergency Care: Experimental Comparative Study %A Yau,Jonathan Yi-Shin %A Saadat,Soheil %A Hsu,Edmund %A Murphy,Linda Suk-Ling %A Roh,Jennifer S %A Suchard,Jeffrey %A Tapia,Antonio %A Wiechmann,Warren %A Langdorf,Mark I %+ Department of Emergency Medicine, University of California - Irvine, 101 the City Drive, Route 128-01, Orange, CA, 92868, United States, 1 7147452663, milangdo@hs.uci.edu %K artificial intelligence %K AI %K chatbots %K generative AI %K natural language processing %K consumer health information %K patient education %K literacy %K emergency care information %K chatbot %K misinformation %K health care %K medical consultation %D 2024 %7 4.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Recent surveys indicate that 48% of consumers actively use generative artificial intelligence (AI) for health-related inquiries. Despite widespread adoption and the potential to improve health care access, scant research examines the performance of AI chatbot responses regarding emergency care advice. Objective: We assessed the quality of AI chatbot responses to common emergency care questions. We sought to determine qualitative differences in responses from 4 free-access AI chatbots, for 10 different serious and benign emergency conditions. 
Methods: We created 10 emergency care questions that we fed into the free-access versions of ChatGPT 3.5 (OpenAI), Google Bard, Bing AI Chat (Microsoft), and Claude AI (Anthropic) on November 26, 2023. Each response was graded by 5 board-certified emergency medicine (EM) faculty across 8 domains: percentage accuracy, presence of dangerous information, factual accuracy, clarity, completeness, understandability, source reliability, and source relevancy. We determined the correct, complete response to the 10 questions from reputable and scholarly emergency medical references. These were compiled by an EM resident physician. For the readability of the chatbot responses, we used the Flesch-Kincaid Grade Level of each response from readability statistics embedded in Microsoft Word. Differences between chatbots were determined by the chi-square test. Results: Each of the 4 chatbots’ responses to the 10 clinical questions were scored across 8 domains by 5 EM faculty, for 400 assessments for each chatbot. Together, the 4 chatbots had the best performance in clarity and understandability (both 85%), intermediate performance in accuracy and completeness (both 50%), and poor performance (10%) for source relevance and reliability (mostly unreported). Chatbots contained dangerous information in 5% to 35% of responses, with no statistical difference between chatbots on this metric (P=.24). ChatGPT, Google Bard, and Claude AI had similar performances across 6 out of 8 domains. Only Bing AI performed better with more identified or relevant sources (40%; the others had 0%-10%). The Flesch-Kincaid reading level was grade 7.7-8.9 for all chatbots except ChatGPT (grade 10.8); all were too advanced for average emergency patients. Responses included both dangerous (eg, starting cardiopulmonary resuscitation with no pulse check) and generally inappropriate advice (eg, loosening the collar to improve breathing without evidence of airway compromise). 
Conclusions: AI chatbots, though ubiquitous, have significant deficiencies in EM patient advice, despite relatively consistent performance. Information for when to seek urgent or emergent care is frequently incomplete and inaccurate, and patients may be unaware of misinformation. Sources are not generally provided. Patients who use AI to guide health care decisions assume potential risks. AI chatbots for health should be subject to further research, refinement, and regulation. We strongly recommend proper medical consultation to prevent potential adverse outcomes. %R 10.2196/60291 %U https://www.jmir.org/2024/1/e60291 %U https://doi.org/10.2196/60291 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e58149 %T AI-Supported Digital Microscopy Diagnostics in Primary Health Care Laboratories: Protocol for a Scoping Review %A von Bahr,Joar %A Diwan,Vinod %A Mårtensson,Andreas %A Linder,Nina %A Lundin,Johan %+ Department of Global Public Health, Karolinska Institutet, Tomtebodavägen 18A, Stockholm, 17177, Sweden, 46 708561007, joar.von.bahr@ki.se %K AI %K artificial intelligence %K convolutional neural network %K deep learning %K diagnosis %K digital diagnostics %K machine learning %K pathology %K primary health care %K whole slide images %D 2024 %7 1.11.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Digital microscopy combined with artificial intelligence (AI) is increasingly being implemented in health care, predominantly in advanced laboratory settings. However, AI-supported digital microscopy could be especially advantageous in primary health care settings, since such methods could improve access to diagnostics via automation and lead to a decreased need for experts on site. To our knowledge, no scoping or systematic review had been published on the use of AI-supported digital microscopy within primary health care laboratories when this scoping review was initiated. 
A scoping review can guide future research by providing insights to help navigate the challenges of implementing these novel methods in primary health care laboratories. Objective: The objective of this scoping review is to map peer-reviewed studies on AI-supported digital microscopy in primary health care laboratories to generate an overview of the subject. Methods: A systematic search of the databases PubMed, Web of Science, Embase, and IEEE will be conducted. Only peer-reviewed articles in English will be considered, and no limit on publication year will be applied. The concept inclusion criteria in the scoping review include studies that have applied AI-supported digital microscopy with the aim of achieving a diagnosis on the subject level. In addition, the studies must have been performed in the context of primary health care laboratories, as defined by the criteria of not having a pathologist on site and using simple sample preparations. The study selection and data extraction will be performed by 2 independent researchers, and in the case of disagreements, a third researcher will be involved. The results will be presented in a table developed by the researchers, including information on investigated diseases, sample collection, preparation and digitization, AI model used, and results. Furthermore, the results will be described narratively to provide an overview of the studies included. The proposed methodology is in accordance with the JBI methodology for scoping reviews. Results: The scoping review was initiated in January 2023, and a protocol was published in the Open Science Framework in January 2024. The protocol was completed in March 2024, and the systematic search will be performed after the protocol has been peer reviewed. The scoping review is expected to be finalized by the end of 2024. 
Conclusions: This scoping review of studies on AI-supported digital microscopy in primary health care laboratories is anticipated to identify the diseases for which these novel methods could be advantageous, along with the shared challenges encountered and the approaches taken to address them. International Registered Report Identifier (IRRID): PRR1-10.2196/58149 %M 39486020 %R 10.2196/58149 %U https://www.researchprotocols.org/2024/1/e58149 %U https://doi.org/10.2196/58149 %U http://www.ncbi.nlm.nih.gov/pubmed/39486020 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50631 %T Use of Artificial Intelligence in Cobb Angle Measurement for Scoliosis: Retrospective Reliability and Accuracy Study of a Mobile App %A Li,Haodong %A Qian,Chuang %A Yan,Weili %A Fu,Dong %A Zheng,Yiming %A Zhang,Zhiqiang %A Meng,Junrong %A Wang,Dahui %+ Department of Orthopedics, Children’s Hospital of Fudan University, National Children’s Medical Center, Wanyuan Rd, Minhang District, Shanghai, 201102, China, 86 02164931101, wangdahui@fudan.edu.cn %K scoliosis %K photogrammetry %K artificial intelligence %K deep learning %D 2024 %7 1.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Scoliosis is a spinal deformity in which one or more spinal segments bend to the side or show vertebral rotation. Some artificial intelligence (AI) apps have already been developed for measuring the Cobb angle in patients with scoliosis, but these apps still require doctors to perform certain measurements, which can lead to interobserver variability. The AI app in this study (cobbAngle pro) eliminates the need for manual measurements by doctors, achieving complete automation. Objective: We aimed to evaluate the reliability and accuracy of our new AI app that is based on deep learning to automatically measure the Cobb angle in patients with scoliosis. 
Methods: A retrospective analysis was conducted on the clinical data of children with scoliosis who were treated at the Pediatric Orthopedics Department of the Children’s Hospital affiliated with Fudan University from July 2019 to July 2022. Three measurers used the Picture Archiving and Communication System (PACS) to measure the coronal main curve Cobb angle in 802 full-length anteroposterior and lateral spine X-rays of 601 children with scoliosis, and recorded the results of each measurement. After an interval of 2 weeks, the mobile AI app was used to remeasure the Cobb angle once. The Cobb angle measurements from the PACS were used as the reference standard, and the accuracy of the Cobb angle measurements by the app was analyzed through the Bland-Altman test. The intraclass correlation coefficient (ICC) was used to compare the repeatability within measurers and the consistency between measurers. Results: Among the 601 children with scoliosis, 89 were male and 512 were female (age range: 10-17 years), and 802 full-length spinal X-rays were analyzed. Two functionalities of the app (photography and photo upload) were compared with the PACS for measuring the Cobb angle, and the consistency was found to be excellent. The average absolute errors of the Cobb angle measured by the photography and upload methods were 2.00° and 2.08°, respectively. Using a clinical allowance maximum error of 5°, the 95% limits of agreement (LoAs) for Cobb angle measurements by the photography and upload methods were –4.7° to 4.9° and –4.9° to 4.9°, respectively. For the photography and upload methods, the 95% LoAs for measuring Cobb angles were –4.3° to 4.6° and –4.4° to 4.7°, respectively, in mild scoliosis patients; –4.9° to 5.2° and –5.1° to 5.1°, respectively, in moderate scoliosis patients; and –5.2° to 5.0° and –6.0° to 4.8°, respectively, in severe scoliosis patients. 
The Cobb angles measured twice by each of the 3 observers using the photography method showed good repeatability (P<.001), and consistency between the observers was excellent (P<.001). Conclusions: The new AI platform is accurate and repeatable in the automatic measurement of the Cobb angle of the main curvature in patients with scoliosis. %M 39486021 %R 10.2196/50631 %U https://www.jmir.org/2024/1/e50631 %U https://doi.org/10.2196/50631 %U http://www.ncbi.nlm.nih.gov/pubmed/39486021 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58572 %T Automated Speech Analysis for Risk Detection of Depression, Anxiety, Insomnia, and Fatigue: Algorithm Development and Validation Study %A Riad,Rachid %A Denais,Martin %A de Gennes,Marc %A Lesage,Adrien %A Oustric,Vincent %A Cao,Xuan Nga %A Mouchabac,Stéphane %A Bourla,Alexis %+ Callyope, 5 Parvis Alan Turing, Paris, 75013, France, 33 666522141, rachid@callyope.com %K speech analysis %K voice detection %K voice analysis %K speech biomarkers %K speech-based systems %K computer-aided diagnosis %K mental health symptom detection %K machine learning %K mental health %K fatigue %K anxiety %K depression %D 2024 %7 31.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: While speech analysis holds promise for mental health assessment, research often focuses on single symptoms, despite symptom co-occurrences and interactions. In addition, predictive models in mental health do not properly assess the limitations of speech-based systems, such as uncertainty, or fairness for a safe clinical deployment. Objective: We investigated the predictive potential of mobile-collected speech data for detecting and estimating depression, anxiety, fatigue, and insomnia, focusing on other factors than mere accuracy, in the general population. Methods: We included 865 healthy adults and recorded their answers regarding their perceived mental and sleep states. 
We asked how they felt and if they had slept well lately. Clinically validated questionnaires measuring depression, anxiety, insomnia, and fatigue severity were also used. We developed a novel speech and machine learning pipeline involving voice activity detection, feature extraction, and model training. We modeled speech automatically with deep learning models pretrained on a large, open, and free database, and we selected the best one on the validation set. Based on the best speech modeling approach, clinical threshold detection, individual score prediction, model uncertainty estimation, and performance fairness across demographics (age, sex, and education) were evaluated. We used a train-validation-test split for all evaluations: to develop our models, select the best ones, and assess their generalizability on held-out data. Results: The best model was Whisper M with a max pooling and oversampling method. Our methods achieved good detection performance for all symptoms: depression (Patient Health Questionnaire-9: area under the curve [AUC]=0.76; F1-score=0.49 and Beck Depression Inventory: AUC=0.78; F1-score=0.65), anxiety (Generalized Anxiety Disorder 7-item scale: AUC=0.77; F1-score=0.50), insomnia (Athens Insomnia Scale: AUC=0.73; F1-score=0.62), and fatigue (Multidimensional Fatigue Inventory total score: AUC=0.68; F1-score=0.88). The system performed well when it needed to abstain from making predictions, as demonstrated by low abstention rates in depression detection with the Beck Depression Inventory and fatigue, with risk-coverage AUCs below 0.4. Individual symptom scores were accurately predicted (correlations were all significant, with Pearson strengths between 0.31 and 0.49). Fairness analysis revealed that models were consistent for sex (average disparity ratio [DR] 0.86, SD 0.13), to a lesser extent for education level (average DR 0.47, SD 0.30), and worse for age groups (average DR 0.33, SD 0.30). 
Conclusions: This study demonstrates the potential of speech-based systems for multifaceted mental health assessment in the general population, not only for detecting clinical thresholds but also for estimating their severity. Addressing fairness and incorporating uncertainty estimation with selective classification are key contributions that can enhance the clinical utility and responsible implementation of such systems. %M 39324329 %R 10.2196/58572 %U https://www.jmir.org/2024/1/e58572 %U https://doi.org/10.2196/58572 %U http://www.ncbi.nlm.nih.gov/pubmed/39324329 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51095 %T Assessing the Role of the Generative Pretrained Transformer (GPT) in Alzheimer’s Disease Management: Comparative Study of Neurologist- and Artificial Intelligence–Generated Responses %A Zeng,Jiaqi %A Zou,Xiaoyi %A Li,Shirong %A Tang,Yao %A Teng,Sisi %A Li,Huanhuan %A Wang,Changyu %A Wu,Yuxuan %A Zhang,Luyao %A Zhong,Yunheng %A Liu,Jialin %A Liu,Siru %+ Department of Medical Informatics, West China Medical School, No 37 Guoxue Road, Chengdu, 610041, China, 86 28 85422306, Dljl8@163.com %K Alzheimer's disease %K artificial intelligence %K AI %K large language model %K LLM %K Generative Pretrained Transformer %K GPT %K ChatGPT %K patient information %D 2024 %7 31.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Alzheimer’s disease (AD) is a progressive neurodegenerative disorder posing challenges to patients, caregivers, and society. Accessible and accurate information is crucial for effective AD management. Objective: This study aimed to evaluate the accuracy, comprehensibility, clarity, and usefulness of the Generative Pretrained Transformer’s (GPT) answers concerning the management and caregiving of patients with AD. 
Methods: In total, 14 questions related to the prevention, treatment, and care of AD were identified and posed to GPT-3.5 and GPT-4 in both Chinese and English, and 4 respondent neurologists were asked to answer them. We generated 8 sets of responses (112 in total) and randomly coded them in answer sheets. Next, 5 evaluator neurologists and 5 family members of patients were asked to rate the 112 responses using separate 5-point Likert scales. We evaluated the quality of the responses using a set of 8 questions rated on a 5-point Likert scale. To gauge comprehensibility and participant satisfaction, we included 3 questions dedicated to each aspect within the same set of 8 questions. Results: As of April 10, 2023, the 5 evaluator neurologists and 5 family members of patients with AD had rated the 112 responses (GPT-3.5: 28, 25%; GPT-4: 28, 25%; respondent neurologists: 56, 50%). Of the top 5 (4.5%) responses rated by the evaluator neurologists, 4 (80%) were GPT (GPT-3.5+GPT-4) responses and 1 (20%) was a respondent neurologist’s response. Among the top 5 (4.5%) responses rated by patients’ family members, all but the third response were GPT responses. Based on the evaluation by neurologists, the neurologist-generated responses achieved a mean score of 3.9 (SD 0.7), while the GPT-generated responses scored significantly higher (mean 4.4, SD 0.6; P<.001). Language and model analyses revealed no significant differences in response quality between the GPT-3.5 and GPT-4 models (GPT-3.5: mean 4.3, SD 0.7; GPT-4: mean 4.4, SD 0.5; P=.51). However, English responses outperformed Chinese responses in terms of comprehensibility (Chinese responses: mean 4.1, SD 0.7; English responses: mean 4.6, SD 0.5; P=.005) and participant satisfaction (Chinese responses: mean 4.2, SD 0.8; English responses: mean 4.5, SD 0.5; P=.04). 
According to the evaluator neurologists’ review, Chinese responses had a mean score of 4.4 (SD 0.6), whereas English responses had a mean score of 4.5 (SD 0.5; P=.002). As for the family members of patients with AD, no significant differences were observed between GPT and neurologists, GPT-3.5 and GPT-4, or Chinese and English responses. Conclusions: GPT can provide patient education materials on AD for patients, their families and caregivers, nurses, and neurologists. This capability can contribute to the effective health care management of patients with AD, leading to enhanced patient outcomes. %M 39481104 %R 10.2196/51095 %U https://www.jmir.org/2024/1/e51095 %U https://doi.org/10.2196/51095 %U http://www.ncbi.nlm.nih.gov/pubmed/39481104 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e64593 %T Understanding AI’s Role in Endometriosis Patient Education and Evaluating Its Information and Accuracy: Systematic Review %A Oliveira,Juliana Almeida %A Eskandar,Karine %A Kar,Emre %A de Oliveira,Flávia Ribeiro %A Filho,Agnaldo Lopes da Silva %+ Department of Women's Health, Federal University of Minas Gerais, Av Prof Alfredo Balena 190, Belo Horizonte, 30130-100, Brazil, 55 31975806261, julianaoliveira_md@outlook.com %K endometriosis %K gynecology %K machine learning %K artificial intelligence %K large language models %K natural language processing %K patient-generated health data %K health knowledge %K information seeking %K patient education %D 2024 %7 30.10.2024 %9 Review %J JMIR AI %G English %X Background: Endometriosis is a chronic gynecological condition that affects a significant portion of women of reproductive age, leading to debilitating symptoms such as chronic pelvic pain and infertility. Despite advancements in diagnosis and management, patient education remains a critical challenge. With the rapid growth of digital platforms, artificial intelligence (AI) has emerged as a potential tool to enhance patient education and access to information. 
Objective: This systematic review aims to explore the role of AI in facilitating education and improving information accessibility for individuals with endometriosis. Methods: This review followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines to ensure rigorous and transparent reporting. We conducted a comprehensive search of PubMed; Embase; the Regional Online Information System for Scientific Journals of Latin America, the Caribbean, Spain and Portugal (LATINDEX); Latin American and Caribbean Literature in Health Sciences (LILACS); Institute of Electrical and Electronics Engineers (IEEE) Xplore, and the Cochrane Central Register of Controlled Trials using the terms “endometriosis” and “artificial intelligence.” Studies were selected based on their focus on AI applications in patient education or information dissemination regarding endometriosis. We included studies that evaluated AI-driven tools for assessing patient knowledge and addressed frequently asked questions related to endometriosis. Data extraction and quality assessment were conducted independently by 2 authors, with discrepancies resolved through consensus. Results: Out of 400 initial search results, 11 studies met the inclusion criteria and were fully reviewed. We ultimately included 3 studies, 1 of which was an abstract. The studies examined the use of AI models, such as ChatGPT (OpenAI), machine learning, and natural language processing, in providing educational resources and answering common questions about endometriosis. The findings indicated that AI tools, particularly large language models, offer accurate responses to frequently asked questions with varying degrees of sufficiency across different categories. AI’s integration with social media platforms also highlights its potential to identify patients’ needs and enhance information dissemination. 
Conclusions: AI holds promise in advancing patient education and information access for endometriosis, providing accurate and comprehensive answers to common queries, and facilitating a better understanding of the condition. However, challenges remain in ensuring ethical use, equitable access, and maintaining accuracy across diverse patient populations. Future research should focus on developing standardized approaches for evaluating AI’s impact on patient education and exploring its integration into clinical practice to enhance support for individuals with endometriosis. %M 39476855 %R 10.2196/64593 %U https://ai.jmir.org/2024/1/e64593 %U https://doi.org/10.2196/64593 %U http://www.ncbi.nlm.nih.gov/pubmed/39476855 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e57124 %T Leveraging Artificial Intelligence and Data Science for Integration of Social Determinants of Health in Emergency Medicine: Scoping Review %A Abbott,Ethan E %A Apakama,Donald %A Richardson,Lynne D %A Chan,Lili %A Nadkarni,Girish N %K data science %K social determinants of health %K natural language processing %K artificial intelligence %K NLP %K machine learning %K review methods %K review methodology %K scoping review %K emergency medicine %K PRISMA %D 2024 %7 30.10.2024 %9 %J JMIR Med Inform %G English %X Background: Social determinants of health (SDOH) are critical drivers of health disparities and patient outcomes. However, accessing and collecting patient-level SDOH data can be operationally challenging in the emergency department (ED) clinical setting, requiring innovative approaches. Objective: This scoping review examines the potential of AI and data science for modeling, extraction, and incorporation of SDOH data specifically within EDs, further identifying areas for advancement and investigation. Methods: We conducted a standardized search for studies published between 2015 and 2022, across Medline (Ovid), Embase (Ovid), CINAHL, Web of Science, and ERIC databases. 
We focused on identifying studies using AI or data science related to SDOH within emergency care contexts or conditions. Two specialized reviewers in emergency medicine (EM) and clinical informatics independently assessed each article, resolving discrepancies through iterative reviews and discussion. We then extracted data covering study details, methodologies, patient demographics, care settings, and principal outcomes. Results: Of the 1047 studies screened, 26 met the inclusion criteria. Notably, 9 out of 26 (35%) studies were solely concentrated on ED patients. Conditions studied spanned broad EM complaints and included sepsis, acute myocardial infarction, and asthma. The majority of studies (n=16) explored multiple SDOH domains, with homelessness/housing insecurity and neighborhood/built environment predominating. Machine learning (ML) techniques were used in 23 of 26 studies, with natural language processing (NLP) being the most commonly used approach (n=11). Rule-based NLP (n=5), deep learning (n=2), and pattern matching (n=4) were the most commonly used NLP techniques. NLP models in the reviewed studies displayed significant predictive performance, with F1-scores ranging between 0.40 and 0.75 and specificities nearing 95.9%. Conclusions: Although in its infancy, the convergence of AI and data science techniques, especially ML and NLP, with SDOH in EM offers transformative possibilities for better usage and integration of social data into clinical care and research. With a significant focus on the ED and notable NLP model performance, there is an imperative to standardize SDOH data collection, refine algorithms for diverse patient groups, and champion interdisciplinary synergies. These efforts aim to harness SDOH data optimally, enhancing patient care and mitigating health disparities. Our research underscores the vital need for continued investigation in this domain. 
%R 10.2196/57124 %U https://medinform.jmir.org/2024/1/e57124 %U https://doi.org/10.2196/57124 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e60939 %T Ensuring Accuracy and Equity in Vaccination Information From ChatGPT and CDC: Mixed-Methods Cross-Language Evaluation %A Joshi,Saubhagya %A Ha,Eunbin %A Amaya,Andee %A Mendoza,Melissa %A Rivera,Yonaira %A Singh,Vivek K %+ School of Communication & Information, Rutgers University, 4 Huntington Street, New Brunswick, NJ, 08901, United States, 1 848 932 7588, v.singh@rutgers.edu %K vaccination %K health equity %K multilingualism %K language equity %K health literacy %K online health information %K conversational agents %K artificial intelligence %K large language models %K health information %K public health %D 2024 %7 30.10.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: In the digital age, large language models (LLMs) like ChatGPT have emerged as important sources of health care information. Their interactive capabilities offer promise for enhancing health access, particularly for groups facing traditional barriers such as insurance and language constraints. Despite their growing public health use, with millions of medical queries processed weekly, the quality of LLM-provided information remains inconsistent. Previous studies have predominantly assessed ChatGPT’s English responses, overlooking the needs of non–English speakers in the United States. This study addresses this gap by evaluating the quality and linguistic parity of vaccination information from ChatGPT and the Centers for Disease Control and Prevention (CDC), emphasizing health equity. Objective: This study aims to assess the quality and language equity of vaccination information provided by ChatGPT and the CDC in English and Spanish. It highlights the critical need for cross-language evaluation to ensure equitable health information access for all linguistic groups. 
Methods: We conducted a comparative analysis of ChatGPT’s and CDC’s responses to frequently asked vaccination-related questions in both languages. The evaluation encompassed quantitative and qualitative assessments of accuracy, readability, and understandability. Accuracy was gauged by the perceived level of misinformation; readability, by the Flesch-Kincaid grade level and readability score; and understandability, by items from the National Institutes of Health’s Patient Education Materials Assessment Tool (PEMAT) instrument. Results: The study found that both ChatGPT and CDC provided mostly accurate and understandable (eg, scores over 95 out of 100) responses. However, Flesch-Kincaid grade levels often exceeded the American Medical Association’s recommended levels, particularly in English (eg, average grade level in English for ChatGPT=12.84, Spanish=7.93, recommended=6). CDC responses outperformed ChatGPT in readability across both languages. Notably, some Spanish responses appeared to be direct translations from English, leading to unnatural phrasing. The findings underscore the potential and challenges of using ChatGPT for health care access. Conclusions: ChatGPT holds potential as a health information resource but requires improvements in readability and linguistic equity to be truly effective for diverse populations. Crucially, the default user experience with ChatGPT, typically encountered by those without advanced language and prompting skills, can significantly shape health perceptions. This is vital from a public health standpoint, as the majority of users will interact with LLMs in their most accessible form. Ensuring that default responses are accurate, understandable, and equitable is imperative for fostering informed health decisions across diverse communities. 
%M 39476380 %R 10.2196/60939 %U https://formative.jmir.org/2024/1/e60939 %U https://doi.org/10.2196/60939 %U http://www.ncbi.nlm.nih.gov/pubmed/39476380 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55766 %T Health Care Professionals’ Experience of Using AI: Systematic Review With Narrative Synthesis %A Ayorinde,Abimbola %A Mensah,Daniel Opoku %A Walsh,Julia %A Ghosh,Iman %A Ibrahim,Siti Aishah %A Hogg,Jeffry %A Peek,Niels %A Griffiths,Frances %+ Division of Health Sciences, Warwick Medical School, University of Warwick, Medical School Building, Gibbet Hill Road, Coventry, CV4 7AL, United Kingdom, 44 2476151098, a.ayorinde.1@warwick.ac.uk %K artificial intelligence %K clinical decision support systems %K CDSS %K decision-making %K quality assessment %K clinician experience %K health care professionals %K health care delivery %D 2024 %7 30.10.2024 %9 Review %J J Med Internet Res %G English %X Background: There has been a substantial increase in the development of artificial intelligence (AI) tools for clinical decision support. Historically, these were mostly knowledge-based systems, but recent advances include non–knowledge-based systems using some form of machine learning. The ability of health care professionals to trust technology and understand how it benefits patients or improves care delivery is known to be important for their adoption of that technology. For non–knowledge-based AI tools for clinical decision support, these issues are poorly understood. Objective: The aim of this study is to qualitatively synthesize evidence on the experiences of health care professionals in routinely using non–knowledge-based AI tools to support their clinical decision-making. Methods: In June 2023, we searched 4 electronic databases, MEDLINE, Embase, CINAHL, and Web of Science, with no language or date limit. We also contacted relevant experts and searched reference lists of the included studies. 
We included studies of any design that reported the experiences of health care professionals using non–knowledge-based systems for clinical decision support in their work settings. We completed double independent quality assessment for all included studies using the Mixed Methods Appraisal Tool. We used a theoretically informed thematic approach to synthesize the findings. Results: After screening 7552 titles and 182 full-text articles, we included 25 studies conducted in 9 different countries. Most of the included studies were qualitative (n=13), and the remaining were quantitative (n=9) and mixed methods (n=3). Overall, we identified 7 themes: health care professionals’ understanding of AI applications, level of trust and confidence in AI tools, judging the value added by AI, data availability and limitations of AI, time and competing priorities, concern about governance, and collaboration to facilitate the implementation and use of AI. The first 3 themes occurred most frequently. For example, many studies reported that health care professionals were concerned about not understanding the AI outputs or the rationale behind them. There were issues with confidence in the accuracy of the AI applications and their recommendations. Some health care professionals believed that AI provided added value and improved decision-making, and some reported that it only served as a confirmation of their clinical judgment, while others did not find it useful at all. Conclusions: Our review identified several important issues documented in various studies on health care professionals’ use of AI tools in real-world health care settings. Opinions of health care professionals regarding the added value of AI tools for supporting clinical decision-making varied widely, and many professionals had concerns about their understanding of and trust in this technology. 
The findings of this review emphasize the need for concerted efforts to optimize the integration of AI tools in real-world health care settings. Trial Registration: PROSPERO CRD42022336359; https://tinyurl.com/2yunvkmb %M 39476382 %R 10.2196/55766 %U https://www.jmir.org/2024/1/e55766 %U https://doi.org/10.2196/55766 %U http://www.ncbi.nlm.nih.gov/pubmed/39476382 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e53207 %T How Explainable Artificial Intelligence Can Increase or Decrease Clinicians’ Trust in AI Applications in Health Care: Systematic Review %A Rosenbacke,Rikard %A Melhus,Åsa %A McKee,Martin %A Stuckler,David %+ Centre for Corporate Governance, Department of Accounting, Copenhagen Business School, Solbjerg Plads 3, Frederiksberg, DK-2000, Denmark, 45 709990907, rikard@rosenbacke.com %K explainable artificial intelligence %K XAI %K trustworthy AI %K clinician trust %K affect-based measures %K cognitive measures %K clinical use %K clinical decision-making %K clinical informatics %D 2024 %7 30.10.2024 %9 Review %J JMIR AI %G English %X Background: Artificial intelligence (AI) has significant potential in clinical practice. However, its “black box” nature can lead clinicians to question its value. The challenge is to create sufficient trust for clinicians to feel comfortable using AI, but not so much that they defer to it even when it produces results that conflict with their clinical judgment in ways that lead to incorrect decisions. Explainable AI (XAI) aims to address this by providing explanations of how AI algorithms reach their conclusions. However, it remains unclear whether such explanations foster an appropriate degree of trust to ensure the optimal use of AI in clinical practice. Objective: This study aims to systematically review and synthesize empirical evidence on the impact of XAI on clinicians’ trust in AI-driven clinical decision-making. 
Methods: A systematic review was conducted in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, searching PubMed and Web of Science databases. Studies were included if they empirically measured the impact of XAI on clinicians’ trust using cognition- or affect-based measures. Out of 778 articles screened, 10 met the inclusion criteria. We assessed the risk of bias using standard tools appropriate to the methodology of each paper. Results: The risk of bias in all papers was moderate or moderate to high. All included studies operationalized trust primarily through cognitive-based definitions, with 2 also incorporating affect-based measures. Out of these, 5 studies reported that XAI increased clinicians’ trust compared with standard AI, particularly when the explanations were clear, concise, and relevant to clinical practice. In addition, 3 studies found no significant effect of XAI on trust, indicating that the presence of explanations does not automatically improve trust. Notably, 2 studies highlighted that XAI could either enhance or diminish trust, depending on the complexity and coherence of the provided explanations. The majority of studies suggest that XAI has the potential to enhance clinicians’ trust in recommendations generated by AI. However, complex or contradictory explanations can undermine this trust. More critically, trust in AI is not inherently beneficial, as AI recommendations are not infallible. These findings underscore the nuanced role of explanation quality and suggest that trust can be modulated through the careful design of XAI systems. Conclusions: Excessive trust in incorrect advice generated by AI can adversely impact clinical accuracy, just as can happen when correct advice is distrusted. 
Future research should focus on refining both cognitive and affect-based measures of trust and on developing strategies to achieve an appropriate balance in terms of trust, preventing both blind trust and undue skepticism. Optimizing trust in AI systems is essential for their effective integration into clinical practice. %M 39476365 %R 10.2196/53207 %U https://ai.jmir.org/2024/1/e53207 %U https://doi.org/10.2196/53207 %U http://www.ncbi.nlm.nih.gov/pubmed/39476365 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51711 %T Evolution of the “Internet Plus Health Care” Mode Enabled by Artificial Intelligence: Development and Application of an Outpatient Triage System %A Yang,Lingrui %A Pang,Jiali %A Zuo,Song %A Xu,Jian %A Jin,Wei %A Zuo,Feng %A Xue,Kui %A Xiao,Zhongzhou %A Peng,Xinwei %A Xu,Jie %A Zhang,Xiaofan %A Chen,Ruiyao %A Luo,Shuqing %A Zhang,Shaoting %A Sun,Xin %+ Clinical Research and Innovation Unit, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, 1665 Kongjiang Road, Shanghai, China, 86 02125077480, sunxin@xinhuamed.com.cn %K artificial intelligence %K triage system %K all department recommendation %K subspecialty department recommendation %K “internet plus healthcare” %K “internet plus health care” %D 2024 %7 30.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Although new technologies have increased the efficiency and convenience of medical care, patients still struggle to identify specialized outpatient departments in Chinese tertiary hospitals due to a lack of medical knowledge. Objective: The objective of our study was to develop a precise and subdividable outpatient triage system to improve the experiences and convenience of patient care. Methods: We collected 395,790 electronic medical records (EMRs) and 500 medical dialogue groups. The EMRs were divided into 3 data sets to design and train the triage model (n=387,876, 98%) and test (n=3957, 1%) and validate (n=3957, 1%) it. 
The triage system was adapted from the current BERT (Bidirectional Encoder Representations from Transformers) framework and evaluated by recommendation accuracy at Xinhua Hospital using cancellation rates from October 29 to December 5 in both 2021 and 2022. Finally, a prospective observational study containing 306 samples was conducted to compare the system’s performance with that of triage nurses, which was evaluated by calculating precision, accuracy, recall of the top 3 recommended departments (recall@3), and time consumption. Results: With 3957 (1%) records each, the testing and validation data sets achieved an accuracy of 0.8945 and 0.8941, respectively. Implemented in Xinhua Hospital, our triage system could accurately recommend 79 subspecialty departments and reduce the number of registration cancellations from 16,037 (3.83%) of the total 418,714 to 15,338 (3.53%) of the total 434,200 (P<.05). In comparison to the triage system, the performance of the triage nurses was more accurate (0.9803 vs 0.9153) and precise (0.9213 vs 0.9049) since the system could identify subspecialty departments, whereas triage nurses or even general physicians can only recommend main departments. In addition, our triage system significantly outperformed triage nurses in recall@3 (0.6230 vs 0.5266; P<.001) and time consumption (10.11 vs 14.33 seconds; P<.001). Conclusions: The triage system demonstrates high accuracy in outpatient triage of all departments and excels in subspecialty department recommendations, which could decrease the cancellation rate and time consumption. It also improves the efficiency and convenience of clinical care to make better use of medical resources, expand hospital effectiveness, and improve patient satisfaction in Chinese tertiary hospitals. 
%M 39476375 %R 10.2196/51711 %U https://www.jmir.org/2024/1/e51711 %U https://doi.org/10.2196/51711 %U http://www.ncbi.nlm.nih.gov/pubmed/39476375 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e52897 %T Multifaceted Natural Language Processing Task–Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation %A Kim,Kyungmo %A Park,Seongkeun %A Min,Jeongwon %A Park,Sumin %A Kim,Ju Yeon %A Eun,Jinsu %A Jung,Kyuha %A Park,Yoobin Elyson %A Kim,Esther %A Lee,Eun Young %A Lee,Joonhwan %A Choi,Jinwook %K natural language processing %K NLP %K natural language inference %K reading comprehension %K large language models %K transformer %D 2024 %7 30.10.2024 %9 %J JMIR Med Inform %G English %X Background: The bidirectional encoder representations from transformers (BERT) model has attracted considerable attention in clinical applications, such as patient classification and disease prediction. However, current studies have typically progressed to application development without a thorough assessment of the model’s comprehension of clinical context. Furthermore, limited comparative studies have been conducted on BERT models using medical documents from non–English-speaking countries. Therefore, the applicability of BERT models trained on English clinical notes to non-English contexts is yet to be confirmed. To address these gaps in literature, this study focused on identifying the most effective BERT model for non-English clinical notes. Objective: In this study, we evaluated the contextual understanding abilities of various BERT models applied to mixed Korean and English clinical notes. The objective of this study was to identify the BERT model that excels in understanding the context of such documents. 
Methods: Using data from 164,460 patients in a South Korean tertiary hospital, we pretrained BERT-base, BERT for Biomedical Text Mining (BioBERT), Korean BERT (KoBERT), and Multilingual BERT (M-BERT) to improve their contextual comprehension capabilities and subsequently compared their performances in 7 fine-tuning tasks. Results: The model performance varied based on the task and token usage. First, BERT-base and BioBERT excelled in tasks using classification ([CLS]) token embeddings, such as document classification. BioBERT achieved the highest F1-score of 89.32. Both BERT-base and BioBERT demonstrated their effectiveness in document pattern recognition, even with limited Korean tokens in the dictionary. Second, M-BERT exhibited a superior performance in reading comprehension tasks, achieving an F1-score of 93.77. Better results were obtained when fewer words were replaced with unknown ([UNK]) tokens. Third, M-BERT excelled in the knowledge inference task in which correct disease names were inferred from 63 candidate disease names in a document with disease names replaced with [MASK] tokens. M-BERT achieved the highest hit@10 score of 95.41. Conclusions: This study highlighted the effectiveness of various BERT models in a multilingual clinical domain. The findings can be used as a reference in clinical and language-based applications. 
%R 10.2196/52897 %U https://medinform.jmir.org/2024/1/e52897 %U https://doi.org/10.2196/52897 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e59811 %T Perceptions Toward Using Artificial Intelligence and Technology for Asthma Attack Risk Prediction: Qualitative Exploration of Māori Views %A Jayamini,Widana Kankanamge Darsha %A Mirza,Farhaan %A Bidois-Putt,Marie-Claire %A Naeem,M Asif %A Chan,Amy Hai Yan %+ Department of Computer Science, School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Building WZ, Level 8, 6th St Paul Street, Auckland, 1010, New Zealand, 64 210504680, darsha.jayamini@autuni.ac.nz %K asthma risk prediction %K artificial intelligence %K machine learning %K māori perceptions %K health system development %K mobile phone %D 2024 %7 30.10.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Asthma is a significant global health issue, impacting over 500,000 individuals in New Zealand and disproportionately affecting Māori communities in New Zealand, who experience worse asthma symptoms and attacks. Digital technologies, including artificial intelligence (AI) and machine learning (ML) models, are increasingly popular for asthma risk prediction. However, these AI models may underrepresent minority ethnic groups and introduce bias, potentially exacerbating disparities. Objective: This study aimed to explore the views and perceptions that Māori have toward using AI and ML technologies for asthma self-management, identify key considerations for developing asthma attack risk prediction models, and ensure Māori are represented in ML models without worsening existing health inequities. Methods: Semistructured interviews were conducted with 20 Māori participants with asthma, 3 male and 17 female, aged 18-76 years. All the interviews were conducted one-on-one, except for 1 interview, which was conducted with 2 participants. 
Altogether, 10 web-based interviews were conducted, while the rest were kanohi ki te kanohi (face-to-face). A thematic analysis was conducted to identify the themes. Further, sentiment analysis was carried out using a pretrained Bidirectional Encoder Representations from Transformers model. Results: We identified four key themes: (1) concerns about AI use, (2) interest in using technology to support asthma, (3) desired characteristics of AI-based systems, and (4) experience with asthma management and opportunities for technology to improve care. AI was relatively unfamiliar to many participants, and some expressed concerns about whether AI technology could be trusted, about kanohi ki te kanohi interaction, and about inadequate knowledge of AI and technology. These concerns are exacerbated by the Māori experience of colonization. Most of the participants were interested in using technology to support their asthma management, and we gained insights into user preferences regarding computer-based health care applications. Participants discussed their experiences, highlighting problems with health care quality and limited access to resources. They also mentioned the factors that affect their asthma control level. Conclusions: The exploration revealed that there is a need for greater information about AI and technology for Māori communities and a need to address trust issues relating to the use of technology. Expectations in relation to computer-based applications for health purposes were expressed. The research outcomes will inform future investigations on AI and technology to enhance the health of people with asthma, in particular those designed for Indigenous populations in New Zealand. 
%M 39475765 %R 10.2196/59811 %U https://formative.jmir.org/2024/1/e59811 %U https://doi.org/10.2196/59811 %U http://www.ncbi.nlm.nih.gov/pubmed/39475765 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54710 %T Implications of Big Data Analytics, AI, Machine Learning, and Deep Learning in the Health Care System of Bangladesh: Scoping Review %A Alam,Md Ashraful %A Sajib,Md Refat Uz Zaman %A Rahman,Fariya %A Ether,Saraban %A Hanson,Molly %A Sayeed,Abu %A Akter,Ema %A Nusrat,Nowrin %A Islam,Tanjeena Tahrin %A Raza,Sahar %A Tanvir,K M %A Chisti,Mohammod Jobayer %A Rahman,Qazi Sadeq-ur %A Hossain,Akm %A Layek,MA %A Zaman,Asaduz %A Rana,Juwel %A Rahman,Syed Moshfiqur %A Arifeen,Shams El %A Rahman,Ahmed Ehsanur %A Ahmed,Anisuddin %+ Department of Women's and Children's Health, Uppsala University, Akademiska sjukhuset 751 85, Uppsala, 751 85, Sweden, 46 73 041 98 48, anisuddin.ahmed@kbh.uu.se %K machine learning %K deep learning %K artificial intelligence %K big data analytics %K public health %K health care %K mobile phone %K Bangladesh %D 2024 %7 28.10.2024 %9 Review %J J Med Internet Res %G English %X Background: The rapid advancement of digital technologies, particularly in big data analytics (BDA), artificial intelligence (AI), machine learning (ML), and deep learning (DL), is reshaping the global health care system, including in Bangladesh. The increased adoption of these technologies in health care delivery within Bangladesh has sparked their integration into health care and public health research, resulting in a noticeable surge in related studies. However, a critical gap exists, as there is a lack of comprehensive evidence regarding the research landscape; regulatory challenges; use cases; and the application and adoption of BDA, AI, ML, and DL in the health care system of Bangladesh. This gap impedes the attainment of optimal results. 
As Bangladesh is a leading implementer of digital technologies, bridging this gap is urgent for the effective use of these advancing technologies. Objective: This scoping review aims to collate (1) the existing research in Bangladesh’s health care system, using the aforementioned technologies and synthesizing their findings, and (2) the limitations faced by researchers in integrating the aforementioned technologies into health care research. Methods: MEDLINE (via PubMed), IEEE Xplore, Scopus, and Embase databases were searched to identify published research articles between January 1, 2000, and September 10, 2023, meeting the following inclusion criteria: (1) any study using any of the BDA, AI, ML, and DL technologies and health care and public health datasets for predicting health issues and forecasting any kind of outbreak; (2) studies primarily focusing on health care and public health issues in Bangladesh; and (3) original research articles published in peer-reviewed journals and conference proceedings written in English. Results: With the initial search, we identified 1653 studies. Following the inclusion and exclusion criteria and full-text review, 4.66% (77/1653) of the articles were finally included in this review. There was a substantial increase in studies over the last 5 years (2017-2023). Among the 77 studies, the majority (n=65, 84%) used ML models. A smaller proportion of studies incorporated AI (4/77, 5%), DL (7/77, 9%), and BDA (1/77, 1%) technologies. Among the reviewed articles, 52% (40/77) relied on primary data, while the remaining 48% (37/77) used secondary data. The primary research areas of focus were infectious diseases (15/77, 19%), noncommunicable diseases (23/77, 30%), child health (11/77, 14%), and mental health (9/77, 12%). Conclusions: This scoping review highlights remarkable progress in leveraging BDA, AI, ML, and DL within Bangladesh’s health care system. 
The observed surge in studies over the last 5 years underscores the increasing significance of AI and related technologies in health care research. Notably, most (65/77, 84%) studies focused on ML models, unveiling opportunities for advancements in predictive modeling. This review encapsulates the current state of technological integration and propels us into a promising era for the future of digital Bangladesh. %M 39466315 %R 10.2196/54710 %U https://www.jmir.org/2024/1/e54710 %U https://doi.org/10.2196/54710 %U http://www.ncbi.nlm.nih.gov/pubmed/39466315 %0 Journal Article %@ 1947-2579 %I JMIR Publications %V 16 %N %P e59906 %T Data Analytics to Support Policy Making for Noncommunicable Diseases: Scoping Review %A Dritsakis,Giorgos %A Gallos,Ioannis %A Psomiadi,Maria-Elisavet %A Amditis,Angelos %A Dionysiou,Dimitra %+ Institute of Communication and Computer Systems, National Technical University of Athens, 9, Iroon Politechniou Str. Zografou, Athens, 15773, Greece, 30 210772246, giorgos.dritsakis@iccs.gr %K policy making %K public health %K noncommunicable diseases %K data analytics %K digital tools %K descriptive %K predictive %K decision support %K implementation %D 2024 %7 25.10.2024 %9 Review %J Online J Public Health Inform %G English %X Background: There is an emerging need for evidence-based approaches harnessing large amounts of health care data and novel technologies (such as artificial intelligence) to optimize public health policy making. Objective: The aim of this review was to explore the data analytics tools designed specifically for policy making in noncommunicable diseases (NCDs) and their implementation. Methods: A scoping review was conducted after searching the PubMed and IEEE databases for articles published in the last 10 years. Results: Nine articles that presented 7 data analytics tools designed to inform policy making for NCDs were reviewed. The tools incorporated descriptive and predictive analytics. 
Some tools were designed to include recommendations for decision support, but no pilot studies applying prescriptive analytics have been published. The tools were piloted with various conditions, with cancer being the least studied condition. Implementation of the tools included use cases, pilots, or evaluation workshops that involved policy makers. However, our findings demonstrate very limited real-world use of analytics by policy makers, which is in line with previous studies. Conclusions: Despite the availability of tools designed for different purposes and conditions, data analytics is not widely used to support policy making for NCDs. However, the review demonstrates the value and potential use of data analytics to support policy making. Based on the findings, we make suggestions for researchers developing digital tools to support public health policy making. The findings will also serve as input for the European Union–funded research project ONCODIR developing a policy analytics dashboard for the prevention of colorectal cancer as part of an integrated platform. %M 39454197 %R 10.2196/59906 %U https://ojphi.jmir.org/2024/1/e59906 %U https://doi.org/10.2196/59906 %U http://www.ncbi.nlm.nih.gov/pubmed/39454197 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 7 %N %P e62678 %T Advancing AI Data Ethics in Nursing: Future Directions for Nursing Practice, Research, and Education %A Ball Dunlap,Patricia A %A Michalowski,Martin %K artificial intelligence %K AI data ethics %K data-centric AI %K nurses %K nursing informatics %K machine learning %K data literacy %K health care AI %K responsible AI %D 2024 %7 25.10.2024 %9 %J JMIR Nursing %G English %X The ethics of artificial intelligence (AI) are increasingly recognized due to concerns such as algorithmic bias, opacity, trust issues, data security, and fairness. 
Specifically, machine learning algorithms, central to AI technologies, are essential in striving for ethically sound systems that mimic human intelligence. These technologies rely heavily on data, which often remain obscured within complex systems and must be prioritized for ethical collection, processing, and usage. The significance of data ethics in achieving responsible AI was first highlighted in the broader context of health care and subsequently in nursing. This viewpoint explores the principles of data ethics, drawing on relevant frameworks and strategies identified through a formal literature review. These principles apply to real-world and synthetic data in AI and machine-learning contexts. Additionally, the data-centric AI paradigm is briefly examined, emphasizing its focus on data quality and the ethical development of AI solutions that integrate human-centered domain expertise. The ethical considerations specific to nursing are addressed, including 4 recommendations for future directions in nursing practice, research, and education and 2 hypothetical nurse-focused ethical case studies. The primary objectives are to position nurses to actively participate in AI and data ethics, thereby contributing to creating high-quality and relevant data for machine learning applications. 
%R 10.2196/62678 %U https://nursing.jmir.org/2024/1/e62678 %U https://doi.org/10.2196/62678 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e57750 %T Studies of Artificial Intelligence/Machine Learning Registered on ClinicalTrials.gov: Cross-Sectional Study With Temporal Trends, 2010-2023 %A Maru,Shoko %A Matthias,Michael D %A Kuwatsuru,Ryohei %A Simpson Jr,Ross J %+ Graduate School of Medicine, Juntendo University, 2-1-1 Hongo, Bunkyo-ku, Tokyo, 113‑8421, Japan, 81 338133111, shoko.maru@alumni.griffithuni.edu.au %K artificial intelligence %K machine learning %K deep learning %K trends %K health care %K cross-sectional study %K health disparities %K data-source disparities %K publication bias %K registry %K ClinicalTrials.gov %D 2024 %7 25.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The rapid growth of research in artificial intelligence (AI) and machine learning (ML) continues. However, it is unclear whether this growth reflects an increase in desirable study attributes or merely perpetuates the same issues previously raised in the literature. Objective: This study aims to evaluate temporal trends in AI/ML studies over time and identify variations that are not apparent from aggregated totals at a single point in time. Methods: We identified AI/ML studies registered on ClinicalTrials.gov with start dates between January 1, 2010, and December 31, 2023. Studies were included if AI/ML-specific terms appeared in the official title, detailed description, brief summary, intervention, primary outcome, or sponsors’ keywords. Studies registered as systematic reviews and meta-analyses were excluded. We reported trends in AI/ML studies over time, along with study characteristics that were fast-growing and those that remained unchanged during 2010-2023. Results: Of 3106 AI/ML studies, only 7.6% (n=235) were regulated by the US Food and Drug Administration. 
The most common study characteristics were randomized (56.2%; 670/1193; interventional) and prospective (58.9%; 1126/1913; observational) designs; a focus on diagnosis (28.2%; 335/1190) and treatment (24.4%; 290/1190); hospital/clinic (44.2%; 1373/3106) or academic (28%; 869/3106) sponsorship; and neoplasm (12.9%; 420/3245), nervous system (12.2%; 395/3245), cardiovascular (11.1%; 356/3245) or pathological conditions (10%; 325/3245; multiple counts per study possible). Enrollment data were skewed to the right: maximum 13,977,257; mean 16,962 (SD 288,155); median 255 (IQR 80-1000). The most common size category was 101-1000 (44.8%; 1372/3061; excluding withdrawn or missing), but large studies (n>1000) represented 24.1% (738/3061) of all studies: 29% (551/1898) of observational studies and 16.1% (187/1163) of trials. Study locations were predominantly in high-income countries (75.3%; 2340/3106), followed by upper-middle-income (21.7%; 675/3106), lower-middle-income (2.8%; 88/3106), and low-income countries (0.1%; 3/3106). The fastest-growing characteristics over time were high-income countries (location); Europe, Asia, and North America (location); diagnosis and treatment (primary purpose); hospital/clinic and academia (lead sponsor); randomized and prospective designs; and the 1-100 and 101-1000 size categories. Only 5.6% (47/842) of completed studies had results available on ClinicalTrials.gov, and this pattern persisted. Over time, there was an increase in not only the number of newly initiated studies, but also the number of completed studies without posted results. Conclusions: Much of the rapid growth in AI/ML studies comes from high-income countries in high-resource settings, albeit with a modest increase in upper-middle-income countries (mostly China). Lower-middle-income or low-income countries remain poorly represented. 
The increase in randomized or prospective designs, along with 738 large studies (n>1000), mostly ongoing, may indicate that enough studies are shifting from an in silico evaluation stage toward a prospective comparative evaluation stage. However, the ongoing limited availability of basic results on ClinicalTrials.gov contrasts with this field’s rapid advancements and the public registry’s role in reducing publication and outcome reporting biases. %M 39454187 %R 10.2196/57750 %U https://www.jmir.org/2024/1/e57750 %U https://doi.org/10.2196/57750 %U http://www.ncbi.nlm.nih.gov/pubmed/39454187 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e59501 %T Role of Synchronous, Moderated, and Anonymous Peer Support Chats on Reducing Momentary Loneliness in Older Adults: Retrospective Observational Study %A Dana,Zara %A Nagra,Harpreet %A Kilby,Kimberly %+ Supportiv, 2222 Harold Way, Berkeley, CA, 94704, United States, 1 800 845 0015, harpreet@supportiv.com %K digital peer support %K social loneliness %K chat-based interactions %K older adults %D 2024 %7 25.10.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Older adults have a high rate of loneliness, which contributes to increased psychosocial risk, medical morbidity, and mortality. Digital emotional support interventions provide a convenient and rapid avenue for additional support. Digital peer support interventions for emotional struggles contrast the usual provider-based clinical care models because they offer more accessible, direct support for empowerment, highlighting the users’ autonomy, competence, and relatedness. Objective: This study aims to examine a novel anonymous and synchronous peer-to-peer digital chat service facilitated by trained human moderators. 
The experience of a cohort of 699 adults aged ≥65 years was analyzed to determine (1) whether participation alone led to measurable aggregate change in momentary loneliness and optimism and (2) the impact of peers on momentary loneliness and optimism. Methods: Participants were each prompted with a single question: “What’s your struggle?” Using a proprietary artificial intelligence model, the free-text response was used to automatically match the respondent to peers and a chat moderator based on their self-expressed emotional struggle. Exchanged messages were analyzed to quantitatively measure the change in momentary loneliness and optimism using a third-party, public, natural language processing model (GPT-4 [OpenAI]). The sentiment change analysis was initially performed at the individual level and then averaged across all users with similar emotion types to produce a statistically significant (P<.05) collective trend per emotion. To evaluate the peer impact on momentary loneliness and optimism, we performed propensity matching to align the moderator+single user and moderator+small group chat cohorts and then compared the emotion trends between the matched cohorts. Results: Loneliness and optimism trends significantly improved after 8 (P=.02) to 9 minutes (P=.03) into the chat. We observed a significant improvement in the momentary loneliness and optimism trends in the moderator+small group cohort compared to the moderator+single user chat cohort after 19 (P=.049) and 21 minutes (P=.04) for optimism and loneliness, respectively. Conclusions: Chat-based peer support may be a viable intervention to help address momentary loneliness in older adults and present an alternative to traditional care. The promising results support the need for further study to expand the evidence for such cost-effective options. 
%M 39453688 %R 10.2196/59501 %U https://formative.jmir.org/2024/1/e59501 %U https://doi.org/10.2196/59501 %U http://www.ncbi.nlm.nih.gov/pubmed/39453688 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e58418 %T Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study %A So,Jae-hee %A Chang,Joonhwan %A Kim,Eunji %A Na,Junho %A Choi,JiYeon %A Sohn,Jy-yong %A Kim,Byung-Hoon %A Chu,Sang Hui %+ Department of Applied Statistics, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea, 82 2 2123 2472, jysohn1108@gmail.com %K large language model %K psychiatric interview %K interview summarization %K symptom delineation %D 2024 %7 24.10.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Recent advancements in large language models (LLMs) have accelerated their use across various domains. Psychiatric interviews, which are goal-oriented and structured, represent a significantly underexplored area where LLMs can provide substantial value. In this study, we explore the application of LLMs to enhance psychiatric interviews by analyzing counseling data from North Korean defectors who have experienced traumatic events and mental health issues. Objective: This study aims to investigate whether LLMs can (1) delineate parts of the conversation that suggest psychiatric symptoms and identify those symptoms, and (2) summarize stressors and symptoms based on the interview dialogue transcript. Methods: Given the interview transcripts, we align the LLMs to perform 3 tasks: (1) extracting stressors from the transcripts, (2) delineating symptoms and their indicative sections, and (3) summarizing the patients based on the extracted stressors and symptoms. These 3 tasks address the 2 objectives, where delineating symptoms is based on the output from the second task, and generating the summary of the interview incorporates the outputs from all 3 tasks. 
In this context, the transcript data were labeled by mental health experts for the training and evaluation of the LLMs. Results: First, we present the performance of LLMs in estimating (1) the transcript sections related to psychiatric symptoms and (2) the names of the corresponding symptoms. In the zero-shot inference setting using the GPT-4 Turbo model, 73 out of 102 transcript segments demonstrated a recall mid-token distance d<20 for estimating the sections associated with the symptoms. For evaluating the names of the corresponding symptoms, the fine-tuning method demonstrates a performance advantage over the zero-shot inference setting of the GPT-4 Turbo model. On average, the fine-tuning method achieves an accuracy of 0.82, a precision of 0.83, a recall of 0.82, and an F1-score of 0.82. Second, the transcripts are used to generate summaries for each interviewee using LLMs. This generative task was evaluated using metrics such as Generative Evaluation (G-Eval) and Bidirectional Encoder Representations from Transformers Score (BERTScore). The summaries generated by the GPT-4 Turbo model, utilizing both symptom and stressor information, achieve high average G-Eval scores: coherence of 4.66, consistency of 4.73, fluency of 2.16, and relevance of 4.67. Furthermore, it is noted that the use of retrieval-augmented generation did not lead to a significant improvement in performance. Conclusions: LLMs, using either (1) appropriate prompting techniques or (2) fine-tuning methods with data labeled by mental health experts, achieved an accuracy of over 0.8 for the symptom delineation task when measured across all segments in the transcript. Additionally, they attained a G-Eval score of over 4.6 for coherence in the summarization task. This research contributes to the emerging field of applying LLMs in psychiatric interviews and demonstrates their potential effectiveness in assisting mental health practitioners. 
%M 39447159 %R 10.2196/58418 %U https://formative.jmir.org/2024/1/e58418 %U https://doi.org/10.2196/58418 %U http://www.ncbi.nlm.nih.gov/pubmed/39447159 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54242 %T Gender Bias in AI's Perception of Cardiovascular Risk %A Achtari,Margaux %A Salihu,Adil %A Muller,Olivier %A Abbé,Emmanuel %A Clair,Carole %A Schwarz,Joëlle %A Fournier,Stephane %+ Department of Cardiology, Lausanne University Hospital and University of Lausanne, 21 Rue Du Bugnon, Lausanne, CH-1011, Switzerland, 41 21 314 00 12, stephane.fournier@chuv.ch %K artificial intelligence %K gender equity %K coronary artery disease %K AI %K cardiovascular %K risk %K CAD %K artery %K coronary %K chatbot: health care %K men: women %K gender bias %K gender %D 2024 %7 22.10.2024 %9 Research Letter %J J Med Internet Res %G English %X The study investigated gender bias in GPT-4’s assessment of coronary artery disease risk by presenting identical clinical vignettes of men and women with and without psychiatric comorbidities. Results suggest that psychiatric conditions may influence GPT-4’s coronary artery disease risk assessment among men and women. 
%M 39437384 %R 10.2196/54242 %U https://www.jmir.org/2024/1/e54242 %U https://doi.org/10.2196/54242 %U http://www.ncbi.nlm.nih.gov/pubmed/39437384 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e60164 %T Health Care Language Models and Their Fine-Tuning for Information Extraction: Scoping Review %A Nunes,Miguel %A Bone,Joao %A Ferreira,Joao C %A Elvas,Luis B %+ Department of Logistics, Molde, University College, Britvegen 2, Noruega, Molde, 6410, Norway, 47 969152334, luis.m.elvas@himolde.no %K language model %K information extraction %K healthcare %K PRISMA-ScR %K scoping literature review %K transformers %K natural language processing %K European Portuguese %D 2024 %7 21.10.2024 %9 Review %J JMIR Med Inform %G English %X Background: In response to the intricate language, specialized terminology outside everyday life, and the frequent presence of abbreviations and acronyms inherent in health care text data, domain adaptation techniques have emerged as crucial to transformer-based models. This refinement in the knowledge of the language models (LMs) allows for a better understanding of the medical textual data, which results in an improvement in medical downstream tasks, such as information extraction (IE). We have identified a gap in the literature regarding health care LMs. Therefore, this study presents a scoping literature review investigating domain adaptation methods for transformers in health care, differentiating between English and non-English languages, focusing on Portuguese. Most specifically, we investigated the development of health care LMs, with the aim of comparing Portuguese with other more developed languages to guide the path of a non–English-language with fewer resources. Objective: This study aimed to research health care IE models, regardless of language, to understand the efficacy of transformers and what are the medical entities most commonly extracted. 
Methods: This scoping review was conducted using the PRISMA-ScR (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) methodology on Scopus and Web of Science Core Collection databases. Only studies that mentioned the creation of health care LMs or health care IE models were included, while large language models (LLMs) were excluded. The latter were not included because we aimed to study LMs rather than LLMs, which are architecturally different and have distinct purposes. Results: Our search query retrieved 137 studies, 60 of which met the inclusion criteria, and none of them were systematic literature reviews. English and Chinese are the languages with the most health care LMs developed. These languages already have disease-specific LMs, while others only have general health care LMs. European Portuguese does not have any public health care LM and should take examples from other languages to develop, first, general health care LMs and then, in an advanced phase, disease-specific LMs. Regarding IE models, transformers were the most commonly used method, and named entity recognition was the most popular topic, with only a few studies mentioning Assertion Status or addressing medical lexical problems. The most extracted entities were diagnosis, posology, and symptoms. Conclusions: The findings indicate that domain adaptation is beneficial, achieving better results in downstream tasks. Our analysis allowed us to understand that the use of transformers is more developed for the English and Chinese languages. European Portuguese lacks relevant studies and should draw examples from other non-English languages to develop these models and drive progress in AI. Health care professionals could benefit from highlighting medically relevant information and optimizing the reading of the textual data, or this information could be used to create patient medical timelines, allowing for profiling. 
%M 39432345 %R 10.2196/60164 %U https://medinform.jmir.org/2024/1/e60164 %U https://doi.org/10.2196/60164 %U http://www.ncbi.nlm.nih.gov/pubmed/39432345 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e57400 %T Large Language Models for Mental Health Applications: Systematic Review %A Guo,Zhijun %A Lai,Alvina %A Thygesen,Johan H %A Farrington,Joseph %A Keen,Thomas %A Li,Kezhi %+ Institute of Health Informatics University College, London, 222 Euston Road, London, NW1 2DA, United Kingdom, 44 7859 995590, ken.li@ucl.ac.uk %K large language models %K mental health %K digital health care %K ChatGPT %K Bidirectional Encoder Representations from Transformers %K BERT %D 2024 %7 18.10.2024 %9 Review %J JMIR Ment Health %G English %X Background: Large language models (LLMs) are advanced artificial neural networks trained on extensive datasets to accurately understand and generate natural language. While they have received much attention and demonstrated potential in digital health, their application in mental health, particularly in clinical settings, has generated considerable debate. Objective: This systematic review aims to critically assess the use of LLMs in mental health, specifically focusing on their applicability and efficacy in early screening, digital interventions, and clinical settings. By systematically collating and assessing the evidence from current studies, our work analyzes models, methodologies, data sources, and outcomes, thereby highlighting the potential of LLMs in mental health, the challenges they present, and the prospects for their clinical use. Methods: Adhering to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, this review searched 5 open-access databases: MEDLINE (accessed by PubMed), IEEE Xplore, Scopus, JMIR, and ACM Digital Library. Keywords used were (mental health OR mental illness OR mental disorder OR psychiatry) AND (large language models). 
This study included articles published between January 1, 2017, and April 30, 2024, and excluded articles published in languages other than English. Results: In total, 40 articles were evaluated, including 15 (38%) articles on mental health conditions and suicidal ideation detection through text analysis, 7 (18%) on the use of LLMs as mental health conversational agents, and 18 (45%) on other applications and evaluations of LLMs in mental health. LLMs show good effectiveness in detecting mental health issues and providing accessible, destigmatized eHealth services. However, assessments also indicate that the current risks associated with clinical use might surpass their benefits. These risks include inconsistencies in generated text; the production of hallucinations; and the absence of a comprehensive, benchmarked ethical framework. Conclusions: This systematic review examines the clinical applications of LLMs in mental health, highlighting their potential and inherent risks. The study identifies several issues: the lack of multilingual datasets annotated by experts, concerns regarding the accuracy and reliability of generated content, challenges in interpretability due to the “black box” nature of LLMs, and ongoing ethical dilemmas. These ethical concerns include the absence of a clear, benchmarked ethical framework; data privacy issues; and the potential for overreliance on LLMs by both physicians and patients, which could compromise traditional medical practices. As a result, LLMs should not be considered substitutes for professional mental health services. However, the rapid development of LLMs underscores their potential as valuable clinical aids, emphasizing the need for continued research and development in this area. 
Trial Registration: PROSPERO CRD42024508617; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=508617 %M 39423368 %R 10.2196/57400 %U https://mental.jmir.org/2024/1/e57400 %U https://doi.org/10.2196/57400 %U http://www.ncbi.nlm.nih.gov/pubmed/39423368 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e62963 %T Describing the Framework for AI Tool Assessment in Mental Health and Applying It to a Generative AI Obsessive-Compulsive Disorder Platform: Tutorial %A Golden,Ashleigh %A Aboujaoude,Elias %+ Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, 401 Quarry Rd, Stanford, CA, 94304, United States, 1 650 498 9111, eaboujaoude@stanford.edu %K artificial intelligence %K ChatGPT %K generative artificial intelligence %K generative AI %K large language model %K chatbots %K machine learning %K digital health %K telemedicine %K psychotherapy %K obsessive-compulsive disorder %D 2024 %7 18.10.2024 %9 Tutorial %J JMIR Form Res %G English %X As artificial intelligence (AI) technologies occupy a bigger role in psychiatric and psychological care and become the object of increased research attention, industry investment, and public scrutiny, tools for evaluating their clinical, ethical, and user-centricity standards have become essential. In this paper, we first review the history of rating systems used to evaluate AI mental health interventions. We then describe the recently introduced Framework for AI Tool Assessment in Mental Health (FAITA-Mental Health), whose scoring system allows users to grade AI mental health platforms on key domains, including credibility, user experience, crisis management, user agency, health equity, and transparency. Finally, we demonstrate the use of FAITA-Mental Health scale by systematically applying it to OCD Coach, a generative AI tool readily available on the ChatGPT store and designed to help manage the symptoms of obsessive-compulsive disorder. 
The results offer insights into the utility and limitations of FAITA-Mental Health when applied to “real-world” generative AI platforms in the mental health space, suggesting that the framework effectively identifies key strengths and gaps in AI-driven mental health tools, particularly in areas such as credibility, user experience, and acute crisis management. The results also highlight the need for stringent standards to guide AI integration into mental health care in a manner that is not only effective but also safe and protective of the users’ rights and welfare. %M 39423001 %R 10.2196/62963 %U https://formative.jmir.org/2024/1/e62963 %U https://doi.org/10.2196/62963 %U http://www.ncbi.nlm.nih.gov/pubmed/39423001 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e47814 %T Fine-Tuned Bidirectional Encoder Representations From Transformers Versus ChatGPT for Text-Based Outpatient Department Recommendation: Comparative Study %A Jo,Eunbeen %A Yoo,Hakje %A Kim,Jong-Ho %A Kim,Young-Min %A Song,Sanghoun %A Joo,Hyung Joon %+ Department of Medical Informatics, Korea University College of Medicine, 73, Inchon-ro, Seoul, 02841, Republic of Korea, 82 2 920 5445, drjoohj@gmail.com %K natural language processing %K bidirectional encoder representations from transformers %K large language model %K generative pretrained transformer %K medical specialty prediction %K quality of care %K health care application %K ChatGPT %K BERT %K AI technology %K conversational agent %K AI %K artificial intelligence %K chatbot %K application %K health care %D 2024 %7 18.10.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Patients often struggle with determining which outpatient specialist to consult based on their symptoms. Natural language processing models in health care offer the potential to assist patients in making these decisions before visiting a hospital. 
Objective: This study aimed to evaluate the performance of ChatGPT in recommending medical specialties for medical questions. Methods: We used a dataset of 31,482 medical questions, each answered by doctors and labeled with the appropriate medical specialty from the health consultation board of NAVER (NAVER Corp), a major Korean portal. This dataset includes 27 distinct medical specialty labels. We compared the performance of the fine-tuned Korean Medical bidirectional encoder representations from transformers (KM-BERT) and ChatGPT models by analyzing their ability to accurately recommend medical specialties. We categorized responses from ChatGPT into those matching the 27 predefined specialties and those that did not. Both models were evaluated using performance metrics of accuracy, precision, recall, and F1-score. Results: ChatGPT demonstrated an answer avoidance rate of 6.2% but provided accurate medical specialty recommendations with explanations that elucidated the underlying pathophysiology of the patient’s symptoms. It achieved an accuracy of 0.939, precision of 0.219, recall of 0.168, and an F1-score of 0.134. In contrast, the KM-BERT model, fine-tuned for the same task, outperformed ChatGPT with an accuracy of 0.977, precision of 0.570, recall of 0.652, and an F1-score of 0.587. Conclusions: Although ChatGPT did not surpass the fine-tuned KM-BERT model in recommending the correct medical specialties, it showcased notable advantages as a conversational artificial intelligence model. By providing detailed, contextually appropriate explanations, ChatGPT has the potential to significantly enhance patient comprehension of medical information, thereby improving the medical referral process. 
%M 39423004 %R 10.2196/47814 %U https://formative.jmir.org/2024/1/e47814 %U https://doi.org/10.2196/47814 %U http://www.ncbi.nlm.nih.gov/pubmed/39423004 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e57727 %T Semiology Extraction and Machine Learning–Based Classification of Electronic Health Records for Patients With Epilepsy: Retrospective Analysis %A Xia,Yilin %A He,Mengqiao %A Basang,Sijia %A Sha,Leihao %A Huang,Zijie %A Jin,Ling %A Duan,Yifei %A Tang,Yusha %A Li,Hua %A Lai,Wanlin %A Chen,Lei %K epilepsy %K natural language processing %K machine learning %K electronic health record %K unstructured text %K semiology %K health records %K retrospective analysis %K diagnosis %K treatment %K decision support tools %K symptom %K ontology %K China %K Chinese %K seizure %D 2024 %7 17.10.2024 %9 %J JMIR Med Inform %G English %X Background: Obtaining and describing semiology efficiently and classifying seizure types correctly are crucial for the diagnosis and treatment of epilepsy. Nevertheless, there exists an inadequacy in related informatics resources and decision support tools. Objective: We developed a symptom entity extraction tool and an epilepsy semiology ontology (ESO) and used machine learning to achieve an automated binary classification of epilepsy in this study. Methods: Using present history data of electronic health records from the Southwest Epilepsy Center in China, we constructed an ESO and a symptom-entity extraction tool to extract seizure duration, seizure symptoms, and seizure frequency from the unstructured text by combining manual annotation with natural language processing techniques. In addition, we achieved automatic classification of patients in the study cohort with high accuracy based on the extracted seizure feature data using multiple machine learning methods. Results: Data included present history from 10,925 cases between 2010 and 2020. 
Six annotators labeled a total of 2500 texts to obtain 5844 words of semiology and construct an ESO with 702 terms. Based on the ontology, the extraction tool achieved an accuracy rate of 85% in symptom extraction. Furthermore, we trained a stacking ensemble learning model combining XGBoost and random forest with an F1-score of 75.03%. The random forest model had the highest area under the curve (0.985). Conclusions: This work demonstrated the feasibility of natural language processing–assisted structural extraction of epilepsy medical record texts and downstream tasks, providing open ontology resources for subsequent related work. %R 10.2196/57727 %U https://medinform.jmir.org/2024/1/e57727 %U https://doi.org/10.2196/57727 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e59782 %T Evaluating Medical Entity Recognition in Health Care: Entity Model Quantitative Study %A Liu,Shengyu %A Wang,Anran %A Xiu,Xiaolei %A Zhong,Ming %A Wu,Sizhu %+ Department of Medical Data Sharing, Institute of Medical Information & Library, Chinese Academy of Medical Sciences & Peking Union Medical College, 3 Yabao Road, Chaoyang District, Beijing, 100020, China, 86 10 5232 8760, wu.sizhu@imicams.ac.cn %K natural language processing %K NLP %K model evaluation %K macrofactors %K medical named entity recognition models %D 2024 %7 17.10.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Named entity recognition (NER) models are essential for extracting structured information from unstructured medical texts by identifying entities such as diseases, treatments, and conditions, enhancing clinical decision-making and research. Innovations in machine learning, particularly those involving Bidirectional Encoder Representations From Transformers (BERT)–based deep learning and large language models, have significantly advanced NER capabilities. However, their performance varies across medical datasets due to the complexity and diversity of medical terminology. 
Previous studies have often focused on overall performance, neglecting specific challenges in medical contexts and the impact of macrofactors like lexical composition on prediction accuracy. These gaps hinder the development of optimized NER models for medical applications. Objective: This study aims to meticulously evaluate the performance of various NER models in the context of medical text analysis, focusing on how complex medical terminology affects entity recognition accuracy. Additionally, we explored the influence of macrofactors on model performance, seeking to provide insights for refining NER models and enhancing their reliability for medical applications. Methods: This study comprehensively evaluated 7 NER models—hidden Markov models, conditional random fields, BERT for Biomedical Text Mining, Big Transformer Models for Efficient Long-Sequence Attention, Decoding-enhanced BERT with Disentangled Attention, Robustly Optimized BERT Pretraining Approach, and Gemma—across 3 medical datasets: Revised Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA), BioCreative V CDR, and Anatomical Entity Mention (AnatEM). The evaluation focused on prediction accuracy, resource use (eg, central processing unit and graphics processing unit use), and the impact of fine-tuning hyperparameters. The macrofactors affecting model performance were also screened using the multilevel factor elimination algorithm. Results: The fine-tuned BERT for Biomedical Text Mining, with balanced resource use, generally achieved the highest prediction accuracy across the Revised JNLPBA and AnatEM datasets, with microaverage (AVG_MICRO) scores of 0.932 and 0.8494, respectively, highlighting its superior proficiency in identifying medical entities. 
Gemma, fine-tuned using the low-rank adaptation technique, achieved the highest accuracy on the BioCreative V CDR dataset with an AVG_MICRO score of 0.9962 but exhibited variability across the other datasets (AVG_MICRO scores of 0.9088 on the Revised JNLPBA and 0.8029 on AnatEM), indicating a need for further optimization. In addition, our analysis revealed that 2 macrofactors, entity phrase length and the number of entity words in each entity phrase, significantly influenced model performance. Conclusions: This study highlights the essential role of NER models in medical informatics, emphasizing the imperative for model optimization via precise data targeting and fine-tuning. The insights from this study will notably improve clinical decision-making and facilitate the creation of more sophisticated and effective medical NER models. %M 39419501 %R 10.2196/59782 %U https://medinform.jmir.org/2024/1/e59782 %U https://doi.org/10.2196/59782 %U http://www.ncbi.nlm.nih.gov/pubmed/39419501 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58987 %T Mapping the Evolution of Digital Health Research: Bibliometric Overview of Research Hotspots, Trends, and Collaboration of Publications in JMIR (1999-2024) %A Hu,Jing %A Li,Chong %A Ge,Yanlei %A Yang,Jingyi %A Zhu,Siyi %A He,Chengqi %+ Rehabilitation Medicine Center, West China Hospital, Sichuan University, #37 Guoxue Alley, Wuhou District, Chengdu, 610041, China, 86 28 8542 2847, hxkfhcq2015@126.com %K JMIR %K bibliometric analysis %K ehealth %K digital health %K medical informatics %K health informatics %K open science %K publishing %D 2024 %7 17.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: While bibliometric studies of individual journals have been conducted, to the best of our knowledge, bibliometric mapping has not yet been utilized to analyze the literature published by the Journal of Medical Internet Research (JMIR). 
Objective: In celebration of the journal’s 25th anniversary, this study aimed to review the entire collection of JMIR publications from 1999 to 2024 and provide a comprehensive overview of the main publication characteristics. Methods: This study included papers published in JMIR during the 25-year period from 1999 to 2024. The data were analyzed using CiteSpace, VOSviewer, and the “Bibliometrix” package in R. Through descriptive bibliometrics, we examined the dynamics and trend patterns of JMIR literature production and identified the most prolific authors, papers, institutions, and countries. Bibliometric maps were used to visualize the content of published articles and to identify the most prominent research terms and topics, along with their evolution. A bibliometric network map was constructed to determine the hot research topics over the past 25 years. Results: This study revealed positive trends in literature production, with both the total number of publications and the average number of citations increasing over the years. The global COVID-19 pandemic induced an explosive rise in the number of publications in JMIR. The most productive institutions were predominantly from the United States, which ranked highest in successful publications within the journal. The editor-in-chief of JMIR was identified as a pioneer in this field. The thematic analysis indicated that the most prolific topics aligned with the primary aims and scope of the journal. Currently and in the foreseeable future, the main themes of JMIR include “artificial intelligence,” “patient empowerment,” and “victimization.” Conclusions: This bibliometric study highlighted significant contributions to digital health by identifying key research trends, themes, influential authors, and collaborations. The findings underscore the necessity to enhance publications from developing countries, improve gender diversity among authors, and expand the range of research topics explored in the journal. 
%M 39419496 %R 10.2196/58987 %U https://www.jmir.org/2024/1/e58987 %U https://doi.org/10.2196/58987 %U http://www.ncbi.nlm.nih.gov/pubmed/39419496 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e58011 %T An Ethical Perspective on the Democratization of Mental Health With Generative AI %A Elyoseph,Zohar %A Gur,Tamar %A Haber,Yuval %A Simon,Tomer %A Angert,Tal %A Navon,Yuval %A Tal,Amir %A Asman,Oren %K ethics %K generative artificial intelligence %K generative AI %K mental health %K ChatGPT %K large language model %K LLM %K digital mental health %K machine learning %K AI %K technology %K accessibility %K knowledge %K GenAI %D 2024 %7 17.10.2024 %9 %J JMIR Ment Health %G English %X Knowledge has become more open and accessible to a large audience with the “democratization of information” facilitated by technology. This paper provides a sociohistorical perspective for the theme issue “Responsible Design, Integration, and Use of Generative AI in Mental Health.” It evaluates ethical considerations in using generative artificial intelligence (GenAI) for the democratization of mental health knowledge and practice. It explores the historical context of democratizing information, transitioning from restricted access to widespread availability due to the internet, open-source movements, and most recently, GenAI technologies such as large language models. The paper highlights why GenAI technologies represent a new phase in the democratization movement, offering unparalleled access to highly advanced technology as well as information. In the realm of mental health, this requires delicate and nuanced ethical deliberation. Including GenAI in mental health may allow, among other things, improved accessibility to mental health care, personalized responses, and conceptual flexibility, and could facilitate a flattening of traditional hierarchies between health care providers and patients. 
At the same time, it also entails significant risks and challenges that must be carefully addressed. To navigate these complexities, the paper proposes a strategic questionnaire for assessing artificial intelligence–based mental health applications. This tool evaluates both the benefits and the risks, emphasizing the need for a balanced and ethical approach to GenAI integration in mental health. The paper calls for a cautious yet positive approach to GenAI in mental health, advocating for the active engagement of mental health professionals in guiding GenAI development. It emphasizes the importance of ensuring that GenAI advancements are not only technologically sound but also ethically grounded and patient-centered. %R 10.2196/58011 %U https://mental.jmir.org/2024/1/e58011 %U https://doi.org/10.2196/58011 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e44494 %T Reinforcement Learning to Optimize Ventilator Settings for Patients on Invasive Mechanical Ventilation: Retrospective Study %A Liu,Siqi %A Xu,Qianyi %A Xu,Zhuoyang %A Liu,Zhuo %A Sun,Xingzhi %A Xie,Guotong %A Feng,Mengling %A See,Kay Choong %+ Saw Swee Hock School of Public Health, National University of Singapore, 12 Science Drive 2, Singapore, 117549, Singapore, 65 65164984, ephfm@nus.edu.sg %K mechanical ventilation %K reinforcement learning %K artificial intelligence %K validation study %K critical care %K treatment %K intensive care unit %K critically ill %K patient %K monitoring %K database %K mortality rate %K decision support %K support tool %K survival %K prognosis %K respiratory support %D 2024 %7 16.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: One of the significant changes in intensive care medicine over the past 2 decades is the acknowledgment that improper mechanical ventilation settings substantially contribute to pulmonary injury in critically ill patients. 
Artificial intelligence (AI) solutions can optimize mechanical ventilation settings in intensive care units (ICUs) and improve patient outcomes. Specifically, machine learning algorithms can be trained on large datasets of patient information and mechanical ventilation settings. These algorithms can then predict patient responses to different ventilation strategies and suggest personalized ventilation settings for individual patients. Objective: In this study, we aimed to design and evaluate an AI solution that could tailor an optimal ventilator strategy for each critically ill patient who requires mechanical ventilation. Methods: We proposed a reinforcement learning–based AI solution using observational data from multiple ICUs in the United States. The primary outcome was hospital mortality. Secondary outcomes were the proportion of optimal oxygen saturation and the proportion of optimal mean arterial blood pressure. We trained our AI agent to recommend low, medium, and high levels of 3 ventilator settings—positive end-expiratory pressure, fraction of inspired oxygen, and ideal body weight–adjusted tidal volume—according to patients’ health conditions. We defined a policy as rules guiding ventilator setting changes given specific clinical scenarios. Off-policy evaluation metrics were applied to evaluate the AI policy. Results: We studied 21,595 and 5105 patients’ ICU stays from the e-Intensive Care Unit Collaborative Research (eICU) and Medical Information Mart for Intensive Care IV (MIMIC-IV) databases, respectively. Using the learned AI policy, we estimated the hospital mortality rate (eICU 12.1%, SD 3.1%; MIMIC-IV 29.1%, SD 0.9%), the proportion of optimal oxygen saturation (eICU 58.7%, SD 4.7%; MIMIC-IV 49%, SD 1%), and the proportion of optimal mean arterial blood pressure (eICU 31.1%, SD 4.5%; MIMIC-IV 41.2%, SD 1%). Based on multiple quantitative and qualitative evaluation metrics, our proposed AI solution outperformed observed clinical practice. 
Conclusions: Our study found that customizing ventilation settings for individual patients led to lower estimated hospital mortality rates compared to actual rates. This highlights the potential effectiveness of using reinforcement learning methodology to develop AI models that analyze complex clinical data for optimizing treatment parameters. Additionally, our findings suggest the integration of this model into a clinical decision support system for refining ventilation settings, supporting the need for prospective validation trials. %M 39219230 %R 10.2196/44494 %U https://www.jmir.org/2024/1/e44494 %U https://doi.org/10.2196/44494 %U http://www.ncbi.nlm.nih.gov/pubmed/39219230 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e53505 %T The Dual Nature of AI in Information Dissemination: Ethical Considerations %A Germani,Federico %A Spitale,Giovanni %A Biller-Andorno,Nikola %+ Institute of Biomedical Ethics and History of Medicine, University of Zurich, Switzerland, Winterthurerstrasse 30, Zurich, 8006, Switzerland, 41 44 634 40 81, biller-andorno@ibme.uzh.ch %K AI %K bioethics %K infodemic management %K disinformation %K artificial intelligence %K ethics %K ethical %K infodemic %K infodemics %K public health %K misinformation %K information dissemination %K information literacy %D 2024 %7 15.10.2024 %9 Viewpoint %J JMIR AI %G English %X Infodemics pose significant dangers to public health and to the societal fabric, as the spread of misinformation can have far-reaching consequences. While artificial intelligence (AI) systems have the potential to craft compelling and valuable information campaigns with positive repercussions for public health and democracy, concerns have arisen regarding the potential use of AI systems to generate convincing disinformation. The consequences of this dual nature of AI, capable of both illuminating and obscuring the information landscape, are complex and multifaceted. 
We contend that the rapid integration of AI into society demands a comprehensive understanding of its ethical implications and the development of strategies to harness its potential for the greater good while mitigating harm. Thus, in this paper we explore the ethical dimensions of AI’s role in information dissemination and impact on public health, arguing that potential strategies to deal with AI and disinformation encompass generating regulated and transparent data sets used to train AI models, regulating content outputs, and promoting information literacy. %M 39405099 %R 10.2196/53505 %U https://ai.jmir.org/2024/1/e53505 %U https://doi.org/10.2196/53505 %U http://www.ncbi.nlm.nih.gov/pubmed/39405099 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e60589 %T Use of AI in Mental Health Care: Community and Mental Health Professionals Survey %A Cross,Shane %A Bell,Imogen %A Nicholas,Jennifer %A Valentine,Lee %A Mangelsdorf,Shaminka %A Baker,Simon %A Titov,Nick %A Alvarez-Jimenez,Mario %K mental health %K health care %K AI %K community members %K mental health professional %K web-based survey %K Australia %K descriptive statistic %K thematic analysis %K cost reduction %K data security %K digital health %K digital intervention %K artificial intelligence %D 2024 %7 11.10.2024 %9 %J JMIR Ment Health %G English %X Background: Artificial intelligence (AI) has been increasingly recognized as a potential solution to address mental health service challenges by automating tasks and providing new forms of support. Objective: This study is the first in a series which aims to estimate the current rates of AI technology use as well as perceived benefits, harms, and risks experienced by community members (CMs) and mental health professionals (MHPs). Methods: This study involved 2 web-based surveys conducted in Australia. 
The surveys collected data on demographics, technology comfort, attitudes toward AI, specific AI use cases, and experiences of benefits and harms from AI use. Descriptive statistics were calculated, and thematic analysis of open-ended responses was conducted. Results: The final sample consisted of 107 CMs and 86 MHPs. General attitudes toward AI varied, with CMs reporting neutral and MHPs reporting more positive attitudes. Regarding AI usage, 28% (30/107) of CMs used AI, primarily for quick support (18/30, 60%) and as a personal therapist (14/30, 47%). Among MHPs, 43% (37/86) used AI, mostly for research (24/37, 65%) and report writing (20/37, 54%). While the majority found AI to be generally beneficial (23/30, 77% of CMs and 34/37, 92% of MHPs), specific harms and concerns were experienced by 47% (14/30) of CMs and 51% (19/37) of MHPs. There was an equal mix of positive and negative sentiment toward the future of AI in mental health care in open feedback. Conclusions: Commercial AI tools are increasingly being used by CMs and MHPs. Respondents believe AI will offer future advantages for mental health care in terms of accessibility, cost reduction, personalization, and work efficiency. However, they were equally concerned about reducing human connection, ethics, privacy and regulation, medical errors, potential for misuse, and data security. Despite the immense potential, integration into mental health systems must be approached with caution, addressing legal and ethical concerns while developing safeguards to mitigate potential harms. Future surveys are planned to track use and acceptability of AI and associated issues over time. 
%R 10.2196/60589 %U https://mental.jmir.org/2024/1/e60589 %U https://doi.org/10.2196/60589 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e60712 %T Leveraging Chatbots to Combat Health Misinformation for Older Adults: Participatory Design Study %A Peng,Wei %A Lee,Hee Rin %A Lim,Sue %+ Department of Media and Information, Michigan State University, 404 Wilson Room 409, East Lansing, MI, 48824, United States, 1 5174328235, pengwei@msu.edu %K chatbot %K conversational agent %K older adults %K health misinformation %K participatory design %D 2024 %7 11.10.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Older adults, a population particularly susceptible to misinformation, may experience attempts at health-related scams or defrauding, and they may unknowingly spread misinformation. Previous research has investigated managing misinformation through media literacy education or supporting users by fact-checking information and cautioning for potential misinformation content, yet studies focusing on older adults are limited. Chatbots have the potential to educate and support older adults in misinformation management. However, many studies focusing on designing technology for older adults use the needs-based approach and consider aging as a deficit, leading to issues in technology adoption. Instead, we adopted the asset-based approach, inviting older adults to be active collaborators in envisioning how intelligent technologies can enhance their misinformation management practices. Objective: This study aims to understand how older adults may use chatbots’ capabilities for misinformation management. Methods: We conducted 5 participatory design workshops with a total of 17 older adult participants to ideate ways in which chatbots can help them manage misinformation. 
The workshops included 3 stages: developing scenarios reflecting older adults’ encounters with misinformation in their lives, understanding existing chatbot platforms, and envisioning how chatbots can help intervene in the scenarios from stage 1. Results: We found that issues with older adults’ misinformation management arose more from interpersonal relationships than individuals’ ability to detect misinformation in pieces of information. This finding underscored the importance of chatbots acting as mediators that facilitate communication and help resolve conflict. In addition, participants emphasized the importance of autonomy. They desired chatbots to teach them to navigate the information landscape and come to conclusions about misinformation on their own. Finally, we found that older adults’ distrust in IT companies and governments’ ability to regulate the IT industry affected their trust in chatbots. Thus, chatbot designers should consider using well-trusted sources and practicing transparency to increase older adults’ trust in chatbot-based tools. Overall, our results highlight the need for chatbot-based misinformation tools to go beyond fact checking. Conclusions: This study provides insights into how chatbots can be designed as part of technological systems for misinformation management among older adults. Our study underscores the importance of inviting older adults to be active co-designers of chatbot-based interventions. 
%M 39393065 %R 10.2196/60712 %U https://formative.jmir.org/2024/1/e60712 %U https://doi.org/10.2196/60712 %U http://www.ncbi.nlm.nih.gov/pubmed/39393065 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e53465 %T Automated Detection of Neurodevelopmental Disorders Using Face-to-Face Mobile Technology Among Typically Developing Greek Children: Randomized Controlled Trial %A Toki,Eugenia I %A Zakopoulou,Victoria %A Tatsis,Giorgos %A Pange,Jenny %+ Department of Speech and Language Therapy, School of Health Sciences, University of Ioannina, Panepistimioupoli B, Rm 148, Ioannina, 45500, Greece, 30 2651050720, toki@uoi.gr %K main principles %K automated detection %K neurodevelopmental disorders %K principal component analysis %K early screening %K early intervention %K detection %K screening %K assessment %K digital tool %K serious game %K child %K Greece %K speech %K psychomotor %K cognitive %K psychoemotional %K hearing %K machine learning %K apps %K predictions %K screening %K prognosis %D 2024 %7 11.10.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Neurodevelopmental disorders (NDs) are characterized by heterogeneity, complexity, and interactions among multiple domains with long-lasting effects in adulthood. Early and accurate identification of children at risk for NDs is crucial for timely intervention, yet many cases remain undiagnosed, leading to missed opportunities for effective interventions. Digital tools can help clinicians assist and identify NDs. The concept of using serious games to enhance health care has gained attention among a growing group of scientists, entrepreneurs, and clinicians. Objective: This study aims to explore the core principles of automated mobile detection of NDs in typically developing Greek children, using a serious game developed within the SmartSpeech project, designed to evaluate multiple developmental domains through principal component analysis (PCA). 
Methods: A total of 229 typically developing children aged 4 to 12 years participated in the study. The recruitment process involved open calls through public and private health and educational institutions across Greece. Parents were thoroughly informed about the study’s objectives and procedures, and written consent was obtained. Children engaged under the clinician’s face-to-face supervision with the serious game “Apsou,” which assesses 18 developmental domains, including speech, language, psychomotor, cognitive, psychoemotional, and hearing abilities. Data from the children’s interactions were analyzed using PCA to identify key components and underlying principles of ND detection. Results: A sample of 229 typically developing preschoolers and early school-aged children played the Apsou mobile serious game for automated detection of NDs. The PCA identified 5 main components accounting for about 80% of the data variability that potentially have significant prognostic implications for a safe diagnosis of NDs. Varimax rotation explained 61.44% of the total variance. The results underscore key theoretical principles crucial for the automated detection of NDs. These principles encompass communication skills, speech and language development, vocal processing, cognitive skills and sensory functions, and visual-spatial skills. These components align with the theoretical principles of child development and provide a robust framework for automated ND detection. Conclusions: The study highlights the feasibility and effectiveness of using serious games for early ND detection in children. The identified principal components offer valuable insights into critical developmental domains, paving the way for the development of advanced machine learning applications to support highly accurate predictions and classifications for automated screening, diagnosis, prognosis, or intervention planning in ND clinical decision-making. 
Future research should focus on validating these findings across diverse populations and integrating additional features such as biometric data and longitudinal tracking to enhance the accuracy and reliability of automated detection systems. Trial Registration: ClinicalTrials.gov NCT06633874; https://clinicaltrials.gov/study/NCT06633874 International Registered Report Identifier (IRRID): RR2-https://doi.org/10.3390/signals4020021 %M 39393054 %R 10.2196/53465 %U https://formative.jmir.org/2024/1/e53465 %U https://doi.org/10.2196/53465 %U http://www.ncbi.nlm.nih.gov/pubmed/39393054 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 10 %N %P e52639 %T Artificial Intelligence for Optimizing Cancer Imaging: User Experience Study %A Hesso,Iman %A Zacharias,Lithin %A Kayyali,Reem %A Charalambous,Andreas %A Lavdaniti,Maria %A Stalika,Evangelia %A Ajami,Tarek %A Acampa,Wanda %A Boban,Jasmina %A Nabhani-Gebara,Shereen %+ Pharmacy Department, Faculty of Health, Science, Social Care and Education, Kingston University London, Penrhyn Road, Kingston Upon Thames, KT1 2EE, United Kingdom, 44 2084177413, S.Nabhani@kingston.ac.uk %K artificial intelligence %K cancer %K cancer imaging %K UX design workshops %K Delphi method %K INCISIVE AI toolbox %K user experience %D 2024 %7 10.10.2024 %9 Original Paper %J JMIR Cancer %G English %X Background: The need for increased clinical efficacy and efficiency has been the main force in developing artificial intelligence (AI) tools in medical imaging. The INCISIVE project is a European Union–funded initiative aiming to revolutionize cancer imaging methods using AI technology. It seeks to address limitations in imaging techniques by developing an AI-based toolbox that improves accuracy, specificity, sensitivity, interpretability, and cost-effectiveness. 
Objective: To ensure the successful implementation of the INCISIVE AI service, a study was conducted to understand the needs, challenges, and expectations of health care professionals (HCPs) regarding the proposed toolbox and any potential implementation barriers. Methods: A mixed methods study consisting of 2 phases was conducted. Phase 1 involved user experience (UX) design workshops with users of the INCISIVE AI toolbox. Phase 2 involved a Delphi study conducted through a series of sequential questionnaires. Participants were recruited using a purposive sampling strategy based on the project’s consortium network. In total, 16 HCPs from Serbia, Italy, Greece, Cyprus, Spain, and the United Kingdom participated in the UX design workshops and 12 completed the Delphi study. Descriptive statistics were performed using SPSS (IBM Corp), enabling the calculation of mean rank scores of the Delphi study’s lists. The qualitative data collected via the UX design workshops were analyzed using NVivo (version 12; Lumivero) software. Results: The workshops facilitated brainstorming and identification of the INCISIVE AI toolbox’s desired features and implementation barriers. Subsequently, the Delphi study was instrumental in ranking these features, showing a strong consensus among HCPs (W=0.741, P<.001). This study also identified implementation barriers, revealing a strong consensus among HCPs (W=0.705, P<.001). Key findings indicated that the INCISIVE AI toolbox could assist in areas such as misdiagnosis, overdiagnosis, delays in diagnosis, detection of minor lesions, decision-making in disagreement, treatment allocation, disease prognosis, prediction, treatment response prediction, and care integration throughout the patient journey. Limited resources, lack of organizational and managerial support, and data entry variability were some of the identified barriers. 
HCPs also had an explicit interest in AI explainability, desiring feature relevance explanations or a combination of feature relevance and visual explanations within the toolbox. Conclusions: The results provide a thorough examination of the INCISIVE AI toolbox’s design elements as required by the end users and potential barriers to its implementation, thus guiding the design and implementation of the INCISIVE technology. The outcome offers information about the degree of AI explainability required of the INCISIVE AI toolbox across the three services: (1) initial diagnosis; (2) disease staging, differentiation, and characterization; and (3) treatment and follow-up indicated for the toolbox. By considering the perspective of end users, INCISIVE aims to develop a solution that effectively meets their needs and drives adoption. %M 39388693 %R 10.2196/52639 %U https://cancer.jmir.org/2024/1/e52639 %U https://doi.org/10.2196/52639 %U http://www.ncbi.nlm.nih.gov/pubmed/39388693 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e58195 %T A Novel Cognitive Behavioral Therapy–Based Generative AI Tool (Socrates 2.0) to Facilitate Socratic Dialogue: Protocol for a Mixed Methods Feasibility Study %A Held,Philip %A Pridgen,Sarah A %A Chen,Yaozhong %A Akhtar,Zuhaib %A Amin,Darpan %A Pohorence,Sean %+ Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, 1645 W. Jackson Blvd, Suite 602, Chicago, IL, 60612, United States, 1 3129421423, philip_held@rush.edu %K generative artificial intelligence %K mental health %K feasibility %K cognitive restructuring %K Socratic dialogue %K mobile phone %D 2024 %7 10.10.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Digital mental health tools, designed to augment traditional mental health treatments, are becoming increasingly important due to a wide range of barriers to accessing mental health care, including a growing shortage of clinicians. 
Most existing tools use rule-based algorithms, often leading to interactions that feel unnatural compared with human therapists. Large language models (LLMs) offer a solution for the development of more natural, engaging digital tools. In this paper, we detail the development of Socrates 2.0, which was designed to engage users in Socratic dialogue surrounding unrealistic or unhelpful beliefs, a core technique in cognitive behavioral therapies. The multiagent LLM-based tool features an artificial intelligence (AI) therapist, Socrates, which receives automated feedback from an AI supervisor and an AI rater. The combination of multiple agents appeared to help address common LLM issues such as looping, and it improved the overall dialogue experience. Initial user feedback from individuals with lived experiences of mental health problems as well as cognitive behavioral therapists has been positive. Moreover, tests in approximately 500 scenarios showed that Socrates 2.0 engaged in harmful responses in under 1% of cases, with the AI supervisor promptly correcting the dialogue each time. However, formal feasibility studies with potential end users are needed. Objective: This mixed methods study examines the feasibility of Socrates 2.0. Methods: On the basis of the initial data, we devised a formal feasibility study of Socrates 2.0 to gather qualitative and quantitative data about users’ and clinicians’ experience of interacting with the tool. Using a mixed method approach, the goal is to gather feasibility and acceptability data from 100 users and 50 clinicians to inform the eventual implementation of generative AI tools, such as Socrates 2.0, in mental health treatment. 
We designed this study to better understand how users and clinicians interact with the tool, including the frequency, length, and time of interactions, users’ satisfaction with the tool overall, quality of each dialogue and individual responses, as well as ways in which the tool should be improved before it is used in efficacy trials. Descriptive and inferential analyses will be performed on data from validated usability measures. Thematic analysis will be performed on the qualitative data. Results: Recruitment will begin in February 2024 and is expected to conclude by February 2025. As of September 25, 2024, overall, 55 participants have been recruited. Conclusions: The development of Socrates 2.0 and the outlined feasibility study are important first steps in applying generative AI to mental health treatment delivery and lay the foundation for formal feasibility studies. International Registered Report Identifier (IRRID): DERR1-10.2196/58195 %M 39388255 %R 10.2196/58195 %U https://www.researchprotocols.org/2024/1/e58195 %U https://doi.org/10.2196/58195 %U http://www.ncbi.nlm.nih.gov/pubmed/39388255 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e56353 %T Investigating Data Diversity and Model Robustness of AI Applications in Palliative Care and Hospice: Protocol for Scoping Review %A Bozkurt,Selen %A Fereydooni,Soraya %A Kar,Irem %A Diop Chalmers,Catherine %A Leslie,Sharon L %A Pathak,Ravi %A Walling,Anne %A Lindvall,Charlotta %A Lorenz,Karl %A Quest,Tammie %A Giannitrapani,Karleen %A Kavalieratos,Dio %+ Department of Biomedical Informatics, Emory University, Woodruff Memorial Research Building, 101 Woodruff Cir, Atlanta, GA, 30322, United States, 1 404 727 0229, selen.bozkurt@emory.edu %K palliative care %K artificial intelligence %K ethical frameworks %K AI %K data diversity %K model robustness %K decision support %K clinical settings %K end-of-life care %K hospice environments %K hospice %K methodology %K thematic analysis %K dissemination %D 
2024 %7 8.10.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Artificial intelligence (AI) has become a pivotal element in health care, leading to significant advancements across various medical domains, including palliative care and hospice services. These services focus on improving the quality of life for patients with life-limiting illnesses, and AI’s ability to process complex datasets can enhance decision-making and personalize care in these sensitive settings. However, incorporating AI into palliative and hospice care requires careful examination to ensure it reflects the multifaceted nature of these settings. Objective: This scoping review aims to systematically map the landscape of AI in palliative care and hospice settings, focusing on the data diversity and model robustness. The goal is to understand AI’s role, its clinical integration, and the transparency of its development, ultimately providing a foundation for developing AI applications that adhere to established ethical guidelines and principles. Methods: Our scoping review involves six stages: (1) identifying the research question; (2) identifying relevant studies; (3) study selection; (4) charting the data; (5) collating, summarizing, and reporting the results; and (6) consulting with stakeholders. Searches were conducted across databases including MEDLINE through PubMed, Embase.com, IEEE Xplore, ClinicalTrials.gov, and Web of Science Core Collection, covering studies from the inception of each database up to November 1, 2023. We used a comprehensive set of search terms to capture relevant studies, and non-English records were excluded if their abstracts were not in English. Data extraction will follow a systematic approach, and stakeholder consultations will refine the findings. Results: The electronic database searches conducted in November 2023 resulted in 4614 studies. 
After removing duplicates, 330 studies were selected for full-text review to determine their eligibility based on predefined criteria. The extracted data will be organized into a table to aid in crafting a narrative summary. The review is expected to be completed by May 2025. Conclusions: This scoping review will advance the understanding of AI in palliative care and hospice, focusing on data diversity and model robustness. It will identify gaps and guide future research, contributing to the development of ethically responsible and effective AI applications in these settings. International Registered Report Identifier (IRRID): DERR1-10.2196/56353 %M 39378420 %R 10.2196/56353 %U https://www.researchprotocols.org/2024/1/e56353 %U https://doi.org/10.2196/56353 %U http://www.ncbi.nlm.nih.gov/pubmed/39378420 %0 Journal Article %@ 2561-3278 %I JMIR Publications %V 9 %N %P e56980 %T Classifying Residual Stroke Severity Using Robotics-Assisted Stroke Rehabilitation: Machine Learning Approach %A Jeter,Russell %A Greenfield,Raymond %A Housley,Stephen N %A Belykh,Igor %+ Department of Mathematics and Statistics, Georgia State University, PO Box 4110, Atlanta, GA, 30302 410, United States, 1 404 413 6411, ibelykh@gsu.edu %K stroke %K rehabilitation robotics %K machine learning %K artificial intelligence %K physical therapy %K neuroplasticity %D 2024 %7 7.10.2024 %9 Original Paper %J JMIR Biomed Eng %G English %X Background: Stroke therapy is essential to reduce impairments and improve motor movements by engaging autogenous neuroplasticity. Traditionally, stroke rehabilitation occurs in inpatient and outpatient rehabilitation facilities. However, recent literature increasingly explores moving the recovery process into the home and integrating technology-based interventions. 
This study advances this goal by promoting in-home, autonomous recovery for patients who experienced a stroke through robotics-assisted rehabilitation and classifying stroke residual severity using machine learning methods. Objective: Our main objective is to use kinematics data collected during in-home, self-guided therapy sessions to develop supervised machine learning methods, to address a clinician’s autonomous classification of stroke residual severity–labeled data toward improving in-home, robotics-assisted stroke rehabilitation. Methods: In total, 33 patients who experienced a stroke participated in in-home therapy sessions using Motus Nova robotics rehabilitation technology to capture upper and lower body motion. During each therapy session, the Motus Hand and Motus Foot devices collected movement data, assistance data, and activity-specific data. We then synthesized, processed, and summarized these data. Next, the therapy session data were paired with clinician-informed, discrete stroke residual severity labels: “no range of motion (ROM),” “low ROM,” and “high ROM.” Afterward, an 80%:20% split was performed to divide the dataset into a training set and a holdout test set. We used 4 machine learning algorithms to classify stroke residual severity: light gradient boosting (LGB), extra trees classifier, deep feed-forward neural network, and classical logistic regression. We selected models based on 10-fold cross-validation and measured their performance on a holdout test dataset using F1-score to identify which model maximizes stroke residual severity classification accuracy. Results: We demonstrated that the LGB method provides the most reliable autonomous detection of stroke severity. The trained model is a consensus model that consists of 139 decision trees with up to 115 leaves each. This LGB model boasts a 96.70% F1-score compared to logistic regression (55.82%), extra trees classifier (94.81%), and deep feed-forward neural network (70.11%). 
Conclusions: We showed how objectively measured rehabilitation training paired with machine learning methods can be used to identify the residual stroke severity class, with efforts to enhance in-home self-guided, individualized stroke rehabilitation. The model we trained relies only on session summary statistics, meaning it can potentially be integrated into similar settings for real-time classification, such as outpatient rehabilitation facilities. %M 39374054 %R 10.2196/56980 %U https://biomedeng.jmir.org/2024/1/e56980 %U https://doi.org/10.2196/56980 %U http://www.ncbi.nlm.nih.gov/pubmed/39374054 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e57673 %T The Utility and Implications of Ambient Scribes in Primary Care %A Seth,Puneet %A Carretas,Romina %A Rudzicz,Frank %+ Department of Family Medicine, McMaster University, 100 Main Street West, Hamilton, ON, L8P 1H6, Canada, 1 416 671 5114, sethp1@mcmaster.ca %K artificial intelligence %K AI %K large language model %K LLM %K digital scribe %K ambient scribe %K organizational efficiency %K electronic health record %K documentation burden %K administrative burden %D 2024 %7 4.10.2024 %9 Viewpoint %J JMIR AI %G English %X Ambient scribe technology, utilizing large language models, represents an opportunity for addressing several current pain points in the delivery of primary care. We explore the evolution of ambient scribes and their current use in primary care. We discuss the suitability of primary care for ambient scribe integration, considering the varied nature of patient presentations and the emphasis on comprehensive care. We also propose the stages of maturation in the use of ambient scribes in primary care and their impact on care delivery. Finally, we call for focused research on safety, bias, patient impact, and privacy in ambient scribe technology, emphasizing the need for early training and education of health care providers in artificial intelligence and digital health tools. 
%M 39365655 %R 10.2196/57673 %U https://ai.jmir.org/2024/1/e57673 %U https://doi.org/10.2196/57673 %U http://www.ncbi.nlm.nih.gov/pubmed/39365655 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51635 %T Engine of Innovation in Hospital Pharmacy: Applications and Reflections of ChatGPT %A Li,Xingang %A Guo,Heng %A Li,Dandan %A Zheng,Yingming %+ Department of Pharmacy, Beijing Friendship Hospital, Capital Medical University, No 95, Yongan Road, Xicheng District, Beijing, 100050, China, 86 1081608511, lxg198320022003@163.com %K ChatGPT %K hospital pharmacy %K natural language processing %K drug information %K drug therapy %K drug interaction %K scientific research %K innovation %K pharmacy %K quality %K safety %K pharmaceutical care %K tool %K medical care quality %D 2024 %7 4.10.2024 %9 Viewpoint %J J Med Internet Res %G English %X Hospital pharmacy plays an important role in ensuring medical care quality and safety, especially in the area of drug information retrieval, therapy guidance, and drug-drug interaction management. ChatGPT is a powerful artificial intelligence language model that can generate natural-language texts. Here, we explored the applications and reflections of ChatGPT in hospital pharmacy, where it may enhance the quality and efficiency of pharmaceutical care. We also explored ChatGPT’s prospects in hospital pharmacy and discussed its working principle, diverse applications, and practical cases in daily operations and scientific research. Meanwhile, the challenges and limitations of ChatGPT, such as data privacy, ethical issues, bias and discrimination, and human oversight, are discussed. ChatGPT is a promising tool for hospital pharmacy, but it requires careful evaluation and validation before it can be integrated into clinical practice. Some suggestions for future research and development of ChatGPT in hospital pharmacy are provided. 
%M 39365643 %R 10.2196/51635 %U https://www.jmir.org/2024/1/e51635 %U https://doi.org/10.2196/51635 %U http://www.ncbi.nlm.nih.gov/pubmed/39365643 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56922 %T Machine Learning–Based Prediction of Neurodegenerative Disease in Patients With Type 2 Diabetes by Derivation and Validation in 2 Independent Korean Cohorts: Model Development and Validation Study %A Sang,Hyunji %A Lee,Hojae %A Park,Jaeyu %A Kim,Sunyoung %A Woo,Ho Geol %A Koyanagi,Ai %A Smith,Lee %A Lee,Sihoon %A Hwang,You-Cheol %A Park,Tae Sun %A Lim,Hyunjung %A Yon,Dong Keon %A Rhee,Sang Youl %+ Department of Endocrinology and Metabolism, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, 23 Kyungheedae-ro, Dongdaemun-gu, Seoul, 02447, Republic of Korea, 82 2 958 8200, bard95@hanmail.net %K machine learning %K neurodegenerative disease %K diabetes mellitus %K prediction %K AdaBoost %D 2024 %7 3.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Several machine learning (ML) prediction models for neurodegenerative diseases (NDs) in type 2 diabetes mellitus (T2DM) have recently been developed. However, the predictive power of these models is limited by the lack of multiple risk factors. Objective: This study aimed to assess the validity and use of an ML model for predicting the 3-year incidence of ND in patients with T2DM. Methods: We used data from 2 independent cohorts—the discovery cohort (1 hospital; n=22,311) and the validation cohort (2 hospitals; n=2915)—to predict ND. The outcome of interest was the presence or absence of ND at 3 years. We selected different ML-based models with hyperparameter tuning in the discovery cohort and conducted an area under the receiver operating characteristic curve (AUROC) analysis in the validation cohort. Results: The study dataset included 22,311 (discovery) and 2915 (validation) patients with T2DM recruited between 2008 and 2022. 
ND was observed in 133 (0.6%) and 15 patients (0.5%) in the discovery and validation cohorts, respectively. The AdaBoost model had a mean AUROC of 0.82 (95% CI 0.79-0.85) in the discovery dataset. When this result was applied to the validation dataset, the AdaBoost model exhibited the best performance among the models, with an AUROC of 0.83 (accuracy of 78.6%, sensitivity of 78.6%, specificity of 78.6%, and balanced accuracy of 78.6%). The most influential factors in the AdaBoost model were age and cardiovascular disease. Conclusions: This study shows the use and feasibility of ML for assessing the incidence of ND in patients with T2DM and suggests its potential for use in screening patients. Further international studies are required to validate these findings. %M 39361401 %R 10.2196/56922 %U https://www.jmir.org/2024/1/e56922 %U https://doi.org/10.2196/56922 %U http://www.ncbi.nlm.nih.gov/pubmed/39361401 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e63010 %T Comparative Study to Evaluate the Accuracy of Differential Diagnosis Lists Generated by Gemini Advanced, Gemini, and Bard for a Case Report Series Analysis: Cross-Sectional Study %A Hirosawa,Takanobu %A Harada,Yukinori %A Tokumasu,Kazuki %A Ito,Takahiro %A Suzuki,Tomoharu %A Shimizu,Taro %+ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Mibu-cho, Shimotsuga, 321-0293, Japan, 81 282861111, hirosawa@dokkyomed.ac.jp %K artificial intelligence %K clinical decision support %K diagnostic excellence %K generative artificial intelligence %K large language models %K natural language processing %D 2024 %7 2.10.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Generative artificial intelligence (GAI) systems by Google have recently been updated from Bard to Gemini and Gemini Advanced as of December 2023. 
Gemini is a basic, free-to-use model after a user’s login, while Gemini Advanced operates on a more advanced model requiring a fee-based subscription. These systems have the potential to enhance medical diagnostics. However, the impact of these updates on comprehensive diagnostic accuracy remains unknown. Objective: This study aimed to compare the accuracy of the differential diagnosis lists generated by Gemini Advanced, Gemini, and Bard across comprehensive medical fields using case report series. Methods: We identified a case report series with relevant final diagnoses published in the American Journal Case Reports from January 2022 to March 2023. After excluding nondiagnostic cases and patients aged 10 years and younger, we included the remaining case reports. After refining the case parts as case descriptions, we input the same case descriptions into Gemini Advanced, Gemini, and Bard to generate the top 10 differential diagnosis lists. In total, 2 expert physicians independently evaluated whether the final diagnosis was included in the lists and its ranking. Any discrepancies were resolved by another expert physician. Bonferroni correction was applied to adjust the P values for the number of comparisons among 3 GAI systems, setting the corrected significance level at P value <.02. Results: In total, 392 case reports were included. The inclusion rates of the final diagnosis within the top 10 differential diagnosis lists were 73% (286/392) for Gemini Advanced, 76.5% (300/392) for Gemini, and 68.6% (269/392) for Bard. The top diagnoses matched the final diagnoses in 31.6% (124/392) for Gemini Advanced, 42.6% (167/392) for Gemini, and 31.4% (123/392) for Bard. Gemini demonstrated higher diagnostic accuracy than Bard both within the top 10 differential diagnosis lists (P=.02) and as the top diagnosis (P=.001). In addition, Gemini Advanced achieved significantly lower accuracy than Gemini in identifying the most probable diagnosis (P=.002). 
Conclusions: The results of this study suggest that Gemini outperformed Bard in diagnostic accuracy following the model update. However, Gemini Advanced requires further refinement to optimize its performance for future artificial intelligence–enhanced diagnostics. These findings should be interpreted cautiously and considered primarily for research purposes, as these GAI systems have not been adjusted for medical diagnostics nor approved for clinical use. %M 39357052 %R 10.2196/63010 %U https://medinform.jmir.org/2024/1/e63010 %U https://doi.org/10.2196/63010 %U http://www.ncbi.nlm.nih.gov/pubmed/39357052 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e49546 %T Leveraging Temporal Trends for Training Contextual Word Embeddings to Address Bias in Biomedical Applications: Development Study %A Agmon,Shunit %A Singer,Uriel %A Radinsky,Kira %+ Department of Computer Science, Technion—Israel Institute of Technology, CS Taub Building, Haifa, 3200003, Israel, 972 73 378 3897, shunit.agmon@gmail.com %K natural language processing %K NLP %K BERT %K word embeddings %K statistical models %K bias %K algorithms %K gender %D 2024 %7 2.10.2024 %9 Original Paper %J JMIR AI %G English %X Background: Women have been underrepresented in clinical trials for many years. Machine-learning models trained on clinical trial abstracts may capture and amplify biases in the data. Specifically, word embeddings are models that enable representing words as vectors and are the building block of most natural language processing systems. If word embeddings are trained on clinical trial abstracts, predictive models that use the embeddings will exhibit gender performance gaps. Objective: We aim to capture temporal trends in clinical trials through temporal distribution matching on contextual word embeddings (specifically, BERT) and explore its effect on the bias manifested in downstream tasks. 
Methods: We present TeDi-BERT, a method to harness the temporal trend of increasing women’s inclusion in clinical trials to train contextual word embeddings. We implement temporal distribution matching through an adversarial classifier, trying to distinguish old from new clinical trial abstracts based on their embeddings. The temporal distribution matching acts as a form of domain adaptation from older to more recent clinical trials. We evaluate our model on 2 clinical tasks: prediction of unplanned readmission to the intensive care unit and hospital length of stay prediction. We also conduct an algorithmic analysis of the proposed method. Results: In readmission prediction, TeDi-BERT achieved area under the receiver operating characteristic curve of 0.64 for female patients versus the baseline of 0.62 (P<.001), and 0.66 for male patients versus the baseline of 0.64 (P<.001). In the length of stay regression, TeDi-BERT achieved a mean absolute error of 4.56 (95% CI 4.44-4.68) for female patients versus 4.62 (95% CI 4.50-4.74, P<.001) and 4.54 (95% CI 4.44-4.65) for male patients versus 4.6 (95% CI 4.50-4.71, P<.001). Conclusions: In both clinical tasks, TeDi-BERT improved performance for female patients, as expected; but it also improved performance for male patients. Our results show that accuracy for one gender does not need to be exchanged for bias reduction, but rather that good science improves clinical results for all. Contextual word embedding models trained to capture temporal trends can help mitigate the effects of bias that changes over time in the training data. 
%M 39357045 %R 10.2196/49546 %U https://ai.jmir.org/2024/1/e49546 %U https://doi.org/10.2196/49546 %U http://www.ncbi.nlm.nih.gov/pubmed/39357045 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58831 %T “Doctor ChatGPT, Can You Help Me?” The Patient’s Perspective: Cross-Sectional Study %A Armbruster,Jonas %A Bussmann,Florian %A Rothhaas,Catharina %A Titze,Nadine %A Grützner,Paul Alfred %A Freischmidt,Holger %+ Department of Trauma and Orthopedic Surgery, BG Klinik Ludwigshafen, Ludwig-Guttmann-Strasse 13, Ludwigshafen am Rhein, 67071, Germany, 49 6216810, Holger.Freischmidt@bgu-ludwigshafen.de %K artificial intelligence %K AI %K large language models %K LLM %K ChatGPT %K patient education %K patient information %K patient perceptions %K chatbot %K chatbots %K empathy %D 2024 %7 1.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence and the language models derived from it, such as ChatGPT, offer immense possibilities, particularly in the field of medicine. It is already evident that ChatGPT can provide adequate and, in some cases, expert-level responses to health-related queries and advice for patients. However, it is currently unknown how patients perceive these capabilities, whether they can derive benefit from them, and whether potential risks, such as harmful suggestions, are detected by patients. Objective: This study aims to clarify whether patients can get useful and safe health care advice from an artificial intelligence chatbot assistant. Methods: This cross-sectional study was conducted using 100 publicly available health-related questions from 5 medical specialties (trauma, general surgery, otolaryngology, pediatrics, and internal medicine) from a web-based platform for patients. Responses generated by ChatGPT-4.0 and by an expert panel (EP) of experienced physicians from the aforementioned web-based platform were packed into 10 sets consisting of 10 questions each. 
The blinded evaluation was carried out by patients regarding empathy and usefulness (assessed through the question: “Would this answer have helped you?”) on a scale from 1 to 5. As a control, evaluation was also performed by 3 physicians in each respective medical specialty, who were additionally asked about the potential harm of the response and its correctness. Results: In total, 200 sets of questions were submitted by 64 patients (mean 45.7, SD 15.9 years; 29/64, 45.3% male), resulting in 2000 evaluated answers of ChatGPT and the EP each. ChatGPT scored higher in terms of empathy (4.18 vs 2.7; P<.001) and usefulness (4.04 vs 2.98; P<.001). Subanalysis revealed a small bias in terms of levels of empathy given by women in comparison with men (4.46 vs 4.14; P=.049). Ratings of ChatGPT were high regardless of the participant’s age. The same highly significant results were observed in the evaluation of the respective specialist physicians. ChatGPT outperformed significantly in correctness (4.51 vs 3.55; P<.001). Specialists rated the usefulness (3.93 vs 4.59) and correctness (4.62 vs 3.84) significantly lower in potentially harmful responses from ChatGPT (P<.001). This was not the case among patients. Conclusions: The results indicate that ChatGPT is capable of supporting patients in health-related queries better than physicians, at least in terms of written advice through a web-based platform. In this study, ChatGPT’s responses had a lower percentage of potentially harmful advice than the web-based EP. However, it is crucial to note that this finding is based on a specific study design and may not generalize to all health care settings. Alarmingly, patients are not able to independently recognize these potential dangers. 
%M 39352738 %R 10.2196/58831 %U https://www.jmir.org/2024/1/e58831 %U https://doi.org/10.2196/58831 %U http://www.ncbi.nlm.nih.gov/pubmed/39352738 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55315 %T Clinical Decision Support and Natural Language Processing in Medicine: Systematic Literature Review %A Eguia,Hans %A Sánchez-Bocanegra,Carlos Luis %A Vinciarelli,Franco %A Alvarez-Lopez,Fernando %A Saigí-Rubió,Francesc %+ Faculty of Health Sciences, Universitat Oberta de Catalunya (UOC), Rambla del Poblenou, 156, Barcelona, 08018, Spain, 34 933263622, fsaigi@uoc.edu %K artificial intelligence %K AI %K natural language processing %K clinical decision support system %K CDSS %K health recommender system %K clinical information extraction %K electronic health record %K systematic literature review %K patient %K treatment %K diagnosis %K health workers %D 2024 %7 30.9.2024 %9 Review %J J Med Internet Res %G English %X Background: Ensuring access to accurate and verified information is essential for effective patient treatment and diagnosis. Although health workers rely on the internet for clinical data, there is a need for a more streamlined approach. Objective: This systematic review aims to assess the current state of artificial intelligence (AI) and natural language processing (NLP) techniques in health care to identify their potential use in electronic health records and automated information searches. Methods: A search was conducted in the PubMed, Embase, ScienceDirect, Scopus, and Web of Science online databases for articles published between January 2000 and April 2023. The only inclusion criteria were (1) original research articles and studies on the application of AI-based medical clinical decision support using NLP techniques and (2) publications in English. A Critical Appraisal Skills Programme tool was used to assess the quality of the studies. 
Results: The search yielded 707 articles, from which 26 studies were included (24 original articles and 2 systematic reviews). Of the evaluated articles, 21 (81%) explained the use of NLP as a source of data collection, 18 (69%) used electronic health records as a data source, and a further 8 (31%) were based on clinical data. Only 5 (19%) of the articles showed the use of combined strategies for NLP to obtain clinical data. In total, 16 (62%) articles presented stand-alone data review algorithms. Other studies (n=9, 35%) showed that the clinical decision support system alternative was also a way of displaying the information obtained for immediate clinical use. Conclusions: The use of NLP engines can effectively improve clinical decision systems’ accuracy, while biphasic tools combining AI algorithms and human criteria may optimize clinical diagnosis and treatment flows. Trial Registration: PROSPERO CRD42022373386; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=373386 %M 39348889 %R 10.2196/55315 %U https://www.jmir.org/2024/1/e55315 %U https://doi.org/10.2196/55315 %U http://www.ncbi.nlm.nih.gov/pubmed/39348889 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58740 %T Enhancing Performance of the National Field Triage Guidelines Using Machine Learning: Development of a Prehospital Triage Model to Predict Severe Trauma %A Chen,Qi %A Qin,Yuchen %A Jin,Zhichao %A Zhao,Xinxin %A He,Jia %A Wu,Cheng %A Tang,Bihan %+ Department of Health Management, Naval Medical University, No 800 Xiangyin Road, Shanghai, 200433, China, 86 02181871425, mangotangbihan@126.com %K severe trauma %K field triage %K machine learning %K prediction model %D 2024 %7 30.9.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Prehospital trauma triage is essential to get the right patient to the right hospital. 
However, the national field triage guidelines proposed by the American College of Surgeons have proven to be relatively insensitive when identifying severe traumas. Objective: This study aimed to build a prehospital triage model to predict severe trauma and enhance the performance of the national field triage guidelines. Methods: This was a multisite prediction study, and the data were extracted from the National Trauma Data Bank between 2017 and 2019. All patients with injury, aged 16 years of age or older, and transported by ambulance from the injury scene to any trauma center were potentially eligible. The data were divided into training, internal, and external validation sets of 672,309; 288,134; and 508,703 patients, respectively. As the national field triage guidelines recommended, age, 7 vital signs, and 8 injury patterns at the prehospital stage were included as candidate variables for model development. Outcomes were severe trauma with an Injured Severity Score ≥16 (primary) and critical resource use within 24 hours of emergency department arrival (secondary). The triage model was developed using an extreme gradient boosting model and Shapley additive explanation analysis. The model’s accuracy regarding discrimination, calibration, and clinical benefit was assessed. Results: At a fixed specificity of 0.5, the model showed a sensitivity of 0.799 (95% CI 0.797-0.801), an undertriage rate of 0.080 (95% CI 0.079-0.081), and an overtriage rate of 0.743 (95% CI 0.742-0.743) for predicting severe trauma. The model showed a sensitivity of 0.774 (95% CI 0.772-0.776), an undertriage rate of 0.158 (95% CI 0.157-0.159), and an overtriage rate of 0.609 (95% CI 0.608-0.609) when predicting critical resource use, fixed at 0.5 specificity. The triage model’s areas under the curve were 0.755 (95% CI 0.753-0.757) for severe trauma prediction and 0.736 (95% CI 0.734-0.737) for critical resource use prediction. 
The triage model’s performance was better than those of the Glasgow Coma Score, Prehospital Index, revised trauma score, and the 2011 national field triage guidelines RED criteria. The model’s performance was consistent in the 2 validation sets. Conclusions: The prehospital triage model is promising for predicting severe trauma and achieving an undertriage rate of <10%. Moreover, machine learning enhances the performance of field triage guidelines. %M 39348683 %R 10.2196/58740 %U https://www.jmir.org/2024/1/e58740 %U https://doi.org/10.2196/58740 %U http://www.ncbi.nlm.nih.gov/pubmed/39348683 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e64143 %T Practical Aspects of Using Large Language Models to Screen Abstracts for Cardiovascular Drug Development: Cross-Sectional Study %A Ronquillo,Jay G %A Ye,Jamie %A Gorman,Donal %A Lemeshow,Adina R %A Watt,Stephen J %K biomedical informatics %K drug development %K cardiology %K cardio %K LLM %K biomedical %K drug %K cross-sectional study %K biomarker %K cardiovascular %K screening optimization %K GPT %K large language model %K AI %K artificial intelligence %D 2024 %7 30.9.2024 %9 %J JMIR Med Inform %G English %X Cardiovascular drug development requires synthesizing relevant literature about indications, mechanisms, biomarkers, and outcomes. This short study investigates the performance, cost, and prompt engineering trade-offs of 3 large language models accelerating the literature screening process for cardiovascular drug development applications. 
%R 10.2196/64143 %U https://medinform.jmir.org/2024/1/e64143 %U https://doi.org/10.2196/64143 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e58358 %T AI Governance: A Challenge for Public Health %A Wagner,Jennifer K %A Doerr,Megan %A Schmit,Cason D %K artificial intelligence %K legislation and jurisprudence %K harm reduction %K social determinants of health %K one health %K AI %K invisible algorithms %K modern life %K public health %K engagement %K AI governance %K traditional regulation %K soft law %D 2024 %7 30.9.2024 %9 %J JMIR Public Health Surveill %G English %X The rapid evolution of artificial intelligence (AI) is structuralizing social, political, and economic determinants of health into the invisible algorithms that shape all facets of modern life. Nevertheless, AI holds immense potential as a public health tool, enabling beneficial objectives such as precision public health and medicine. Developing an AI governance framework that can maximize the benefits and minimize the risks of AI is a significant challenge. The benefits of public health engagement in AI governance could be extensive. Here, we describe how several public health concepts can enhance AI governance. Specifically, we explain how (1) harm reduction can provide a framework for navigating the governance debate between traditional regulation and “soft law” approaches; (2) a public health understanding of social determinants of health is crucial to optimally weigh the potential risks and benefits of AI; (3) public health ethics provides a toolset for guiding governance decisions where individual interests intersect with collective interests; and (4) a One Health approach can improve AI governance effectiveness while advancing public health outcomes. Public health theories, perspectives, and innovations could substantially enrich and improve AI governance, creating a more equitable and socially beneficial path for AI development. 
%R 10.2196/58358 %U https://publichealth.jmir.org/2024/1/e58358 %U https://doi.org/10.2196/58358 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55648 %T Accuracy of a Commercial Large Language Model (ChatGPT) to Perform Disaster Triage of Simulated Patients Using the Simple Triage and Rapid Treatment (START) Protocol: Gage Repeatability and Reproducibility Study %A Franc,Jeffrey Micheal %A Hertelendy,Attila Julius %A Cheng,Lenard %A Hata,Ryan %A Verde,Manuela %+ Department of Emergency Medicine, University of Alberta, 736c University Terrace, 8203-112 Street NW, Edmonton, AB, T6R2Z6, Canada, 1 7807006730, jeffrey.franc@ualberta.ca %K disaster medicine %K large language models %K triage %K disaster %K emergency %K disasters %K emergencies %K LLM %K LLMs %K GPT %K ChatGPT %K language model %K language models %K NLP %K natural language processing %K artificial intelligence %K repeatability %K reproducibility %K accuracy %K accurate %K reproducible %K repeatable %D 2024 %7 30.9.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The release of ChatGPT (OpenAI) in November 2022 drastically reduced the barrier to using artificial intelligence by allowing a simple web-based text interface to a large language model (LLM). One use case where ChatGPT could be useful is in triaging patients at the site of a disaster using the Simple Triage and Rapid Treatment (START) protocol. However, LLMs experience several common errors including hallucinations (also called confabulations) and prompt dependency. Objective: This study addresses the research problem: “Can ChatGPT adequately triage simulated disaster patients using the START protocol?” by measuring three outcomes: repeatability, reproducibility, and accuracy. Methods: Nine prompts were developed by 5 disaster medicine physicians. A Python script queried ChatGPT Version 4 for each prompt combined with 391 validated simulated patient vignettes. 
Ten repetitions of each combination were performed for a total of 35,190 simulated triages. A reference standard START triage code for each simulated case was assigned by 2 disaster medicine specialists (JMF and MV), with a third specialist (LC) added if the first two did not agree. Results were evaluated using a gage repeatability and reproducibility study (gage R and R). Repeatability was defined as variation due to repeated use of the same prompt. Reproducibility was defined as variation due to the use of different prompts on the same patient vignette. Accuracy was defined as agreement with the reference standard. Results: Although 35,102 (99.7%) queries returned a valid START score, there was considerable variability. Repeatability (use of the same prompt repeatedly) was 14% of the overall variation. Reproducibility (use of different prompts) was 4.1% of the overall variation. The accuracy of ChatGPT for START was 63.9% with a 32.9% overtriage rate and a 3.1% undertriage rate. Accuracy varied by prompt with a maximum of 71.8% and a minimum of 46.7%. Conclusions: This study indicates that ChatGPT version 4 is insufficient to triage simulated disaster patients via the START protocol. It demonstrated suboptimal repeatability and reproducibility. The overall accuracy of triage was only 63.9%. Health care professionals are advised to exercise caution while using commercial LLMs for vital medical determinations, given that these tools may commonly produce inaccurate data, colloquially referred to as hallucinations or confabulations. Artificial intelligence–guided tools should undergo rigorous statistical evaluation—using methods such as gage R and R—before implementation into clinical settings. 
%M 39348189 %R 10.2196/55648 %U https://www.jmir.org/2024/1/e55648 %U https://doi.org/10.2196/55648 %U http://www.ncbi.nlm.nih.gov/pubmed/39348189 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 12 %N %P e59587 %T Data Preprocessing Techniques for AI and Machine Learning Readiness: Scoping Review of Wearable Sensor Data in Cancer Care %A Ortiz,Bengie L %A Gupta,Vibhuti %A Kumar,Rajnish %A Jalin,Aditya %A Cao,Xiao %A Ziegenbein,Charles %A Singhal,Ashutosh %A Tewari,Muneesh %A Choi,Sung Won %+ School of Applied Computational Sciences, Meharry Medical College, 3401 West End Ave #260, Nashville, TN, 37208, United States, 1 (615) 327 567, vgupta@mmc.edu %K machine learning %K artificial intelligence %K preprocessing %K wearables %K mobile phone %K cancer care %D 2024 %7 27.9.2024 %9 Review %J JMIR Mhealth Uhealth %G English %X Background: Wearable sensors are increasingly being explored in health care, including in cancer care, for their potential in continuously monitoring patients. Despite their growing adoption, significant challenges remain in the quality and consistency of data collected from wearable sensors. Moreover, preprocessing pipelines to clean, transform, normalize, and standardize raw data have not yet been fully optimized. Objective: This study aims to conduct a scoping review of preprocessing techniques used on raw wearable sensor data in cancer care, specifically focusing on methods implemented to ensure their readiness for artificial intelligence and machine learning (AI/ML) applications. We sought to understand the current landscape of approaches for handling issues, such as noise, missing values, normalization or standardization, and transformation, as well as techniques for extracting meaningful features from raw sensor outputs and converting them into usable formats for subsequent AI/ML analysis. Methods: We systematically searched IEEE Xplore, PubMed, Embase, and Scopus to identify potentially relevant studies for this review. 
The eligibility criteria included (1) mobile health and wearable sensor studies in cancer, (2) written and published in English, (3) published between January 2018 and December 2023, (4) full text available rather than abstracts, and (5) original studies published in peer-reviewed journals or conferences. Results: The initial search yielded 2147 articles, of which 20 (0.93%) met the inclusion criteria. Three major categories of preprocessing techniques were identified: data transformation (used in 12/20, 60% of selected studies), data normalization and standardization (used in 8/20, 40% of the selected studies), and data cleaning (used in 8/20, 40% of the selected studies). Transformation methods aimed to convert raw data into more informative formats for analysis, such as by segmenting sensor streams or extracting statistical features. Normalization and standardization techniques usually normalize the range of features to improve comparability and model convergence. Cleaning methods focused on enhancing data reliability by handling artifacts like missing values, outliers, and inconsistencies. Conclusions: While wearable sensors are gaining traction in cancer care, realizing their full potential hinges on the ability to reliably translate raw outputs into high-quality data suitable for AI/ML applications. This review found that researchers are using various preprocessing techniques to address this challenge, but there remains a lack of standardized best practices. Our findings suggest a pressing need to develop and adopt uniform data quality and preprocessing workflows of wearable sensor data that can support the breadth of cancer research and varied patient populations. Given the diverse preprocessing techniques identified in the literature, there is an urgency for a framework that can guide researchers and clinicians in preparing wearable sensor data for AI/ML applications. 
Building on this scoping review and our own research, we propose a general framework for preprocessing wearable sensor data, designed to be adaptable across different disease settings, moving beyond cancer care. %M 38626290 %R 10.2196/59587 %U https://mhealth.jmir.org/2024/1/e59587 %U https://doi.org/10.2196/59587 %U http://www.ncbi.nlm.nih.gov/pubmed/38626290 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e57362 %T The Most Effective Interventions for Classification Model Development to Predict Chat Outcomes Based on the Conversation Content in Online Suicide Prevention Chats: Machine Learning Approach %A Salmi,Salim %A Mérelle,Saskia %A Gilissen,Renske %A van der Mei,Rob %A Bhulai,Sandjai %+ Research Department, 113 Suicide Prevention, Paasheuvelweg 25, Amsterdam, 1105 BP, Netherlands, 31 640673474, s.salmi@113.nl %K suicide %K suicidality %K suicide prevention %K helpline %K suicide helpline %K classification %K interpretable AI %K explainable AI %K conversations %K BERT %K bidirectional encoder representations from transformers %K machine learning %K artificial intelligence %K large language models %K LLM %K natural language processing %D 2024 %7 26.9.2024 %9 Original Paper %J JMIR Ment Health %G English %X Background: For the provision of optimal care in a suicide prevention helpline, it is important to know what contributes to positive or negative effects on help seekers. Helplines can often be contacted through text-based chat services, which produce large amounts of text data for use in large-scale analysis. Objective: We trained a machine learning classification model to predict chat outcomes based on the content of the chat conversations in suicide helplines and identified the counsellor utterances that had the most impact on its outputs. 
Methods: From August 2021 until January 2023, help seekers (N=6903) scored themselves on factors known to be associated with suicidality (eg, hopelessness, feeling entrapped, will to live) before and after a chat conversation with the suicide prevention helpline in the Netherlands (113 Suicide Prevention). Machine learning text analysis was used to predict help seeker scores on these factors. Using 2 approaches for interpreting machine learning models, we identified text messages from helpers in a chat that contributed the most to the prediction of the model. Results: According to the machine learning model, helpers’ positive affirmations and expressing involvement contributed to improved scores of the help seekers. Use of macros and ending the chat prematurely due to the help seeker being in an unsafe situation had negative effects on help seekers. Conclusions: This study reveals insights for improving helpline chats, emphasizing the value of an evocative style with questions, positive affirmations, and practical advice. It also underscores the potential of machine learning in helpline chat analysis. 
%M 39326039 %R 10.2196/57362 %U https://mental.jmir.org/2024/1/e57362 %U https://doi.org/10.2196/57362 %U http://www.ncbi.nlm.nih.gov/pubmed/39326039 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58741 %T Using Natural Language Processing (GPT-4) for Computed Tomography Image Analysis of Cerebral Hemorrhages in Radiology: Retrospective Analysis %A Zhang,Daiwen %A Ma,Zixuan %A Gong,Ru %A Lian,Liangliang %A Li,Yanzhuo %A He,Zhenghui %A Han,Yuhan %A Hui,Jiyuan %A Huang,Jialin %A Jiang,Jiyao %A Weng,Weiji %A Feng,Junfeng %+ Brain Injury Centre, Ren Ji Hospital, Shanghai Jiao Tong University School of Medicine, 160 Pujian Road, Pudong New District, Shanghai, 200127, China, 86 136 1186 0825, fengjfmail@163.com %K GPT-4 %K natural language processing %K NLP %K artificial intelligence %K AI %K cerebral hemorrhage %K computed tomography %K CT %D 2024 %7 26.9.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Cerebral hemorrhage is a critical medical condition that necessitates a rapid and precise diagnosis for timely medical intervention, including emergency operation. Computed tomography (CT) is essential for identifying cerebral hemorrhage, but its effectiveness is limited by the availability of experienced radiologists, especially in resource-constrained regions or when shorthanded during holidays or at night. Despite advancements in artificial intelligence–driven diagnostic tools, most require technical expertise. This poses a challenge for widespread adoption in radiological imaging. The introduction of advanced natural language processing (NLP) models such as GPT-4, which can annotate and analyze images without extensive algorithmic training, offers a potential solution. Objective: This study investigates GPT-4’s capability to identify and annotate cerebral hemorrhages in cranial CT scans. It represents a novel application of NLP models in radiological imaging. 
Methods: In this retrospective analysis, we collected 208 CT scans with 6 types of cerebral hemorrhages at Ren Ji Hospital, Shanghai Jiao Tong University School of Medicine, between January and September 2023. All CT images were mixed together and sequentially numbered, so each CT image had its own corresponding number. A random sequence from 1 to 208 was generated, and all CT images were inputted into GPT-4 for analysis in the order of the random sequence. The outputs were subsequently examined using Photoshop and evaluated by experienced radiologists on a 4-point scale to assess identification completeness, accuracy, and success. Results: The overall identification completeness percentage for the 6 types of cerebral hemorrhages was 72.6% (SD 18.6%). Specifically, GPT-4 achieved higher identification completeness in epidural and intraparenchymal hemorrhages (89.0%, SD 19.1% and 86.9%, SD 17.7%, respectively), yet its identification completeness percentage in chronic subdural hemorrhages was very low (37.3%, SD 37.5%). The misidentification percentages for complex hemorrhages (54.0%, SD 28.0%), epidural hemorrhages (50.2%, SD 22.7%), and subarachnoid hemorrhages (50.5%, SD 29.2%) were relatively high, whereas they were relatively low for acute subdural hemorrhages (32.6%, SD 26.3%), chronic subdural hemorrhages (40.3%, SD 27.2%), and intraparenchymal hemorrhages (26.2%, SD 23.8%). The identification completeness percentages in both massive and minor bleeding showed no significant difference (P=.06). However, the misidentification percentage in recognizing massive bleeding was significantly lower than that for minor bleeding (P=.04). The identification completeness percentages and misidentification percentages for cerebral hemorrhages at different locations showed no significant differences (all P>.05). Lastly, radiologists showed relative acceptance regarding identification completeness (3.60, SD 0.54), accuracy (3.30, SD 0.65), and success (3.38, SD 0.64). 
Conclusions: GPT-4, a standout among NLP models, exhibits both promising capabilities and certain limitations in the realm of radiological imaging, particularly when it comes to identifying cerebral hemorrhages in CT scans. This opens up new directions and insights for the future development of NLP models in radiology. Trial Registration: ClinicalTrials.gov NCT06230419; https://clinicaltrials.gov/study/NCT06230419 %M 39326037 %R 10.2196/58741 %U https://www.jmir.org/2024/1/e58741 %U https://doi.org/10.2196/58741 %U http://www.ncbi.nlm.nih.gov/pubmed/39326037 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e53778 %T Generation of Backward-Looking Complex Reflections for a Motivational Interviewing–Based Smoking Cessation Chatbot Using GPT-4: Algorithm Development and Validation %A Kumar,Ash Tanuj %A Wang,Cindy %A Dong,Alec %A Rose,Jonathan %K motivational interviewing %K smoking cessation %K therapy %K automated therapy %K natural language processing %K large language models %K GPT-4 %K chatbot %K dialogue agent %K reflections %K reflection generation %K smoking %K cessation %K ChatGPT %K smokers %K smoker %K effectiveness %K messages %D 2024 %7 26.9.2024 %9 %J JMIR Ment Health %G English %X Background: Motivational interviewing (MI) is a therapeutic technique that has been successful in helping smokers reduce smoking but has limited accessibility due to the high cost and low availability of clinicians. To address this, the MIBot project has sought to develop a chatbot that emulates an MI session with a client with the specific goal of moving an ambivalent smoker toward the direction of quitting. One key element of an MI conversation is reflective listening, where a therapist expresses their understanding of what the client has said by uttering a reflection that encourages the client to continue their thought process. Complex reflections link the client’s responses to relevant ideas and facts to enhance this contemplation. 
Backward-looking complex reflections (BLCRs) link the client’s most recent response to a relevant selection of the client’s previous statements. Our current chatbot can generate complex reflections—but not BLCRs—using large language models (LLMs) such as GPT-2, which allows the generation of unique, human-like messages customized to client responses. Recent advancements in these models, such as the introduction of GPT-4, provide a novel way to generate complex text by feeding the models instructions and conversational history directly, making this a promising approach to generate BLCRs. Objective: This study aims to develop a method to generate BLCRs for an MI-based smoking cessation chatbot and to measure the method’s effectiveness. Methods: LLMs such as GPT-4 can be stimulated to produce specific types of responses to their inputs by “asking” them with an English-based description of the desired output. These descriptions are called prompts, and the goal of writing a description that causes an LLM to generate the required output is termed prompt engineering. We evolved an instruction to prompt GPT-4 to generate a BLCR, given the portions of the transcript of the conversation up to the point where the reflection was needed. The approach was tested on 50 previously collected MIBot transcripts of conversations with smokers and was used to generate a total of 150 reflections. The quality of the reflections was rated on a 4-point scale by 3 independent raters to determine whether they met specific criteria for acceptability. Results: Of the 150 generated reflections, 132 (88%) met the level of acceptability. The remaining 18 (12%) had one or more flaws that made them inappropriate as BLCRs. The 3 raters had pairwise agreement on 80% to 88% of these scores. Conclusions: The method presented to generate BLCRs is good enough to be used as one source of reflections in an MI-style conversation but would need an automatic checker to eliminate the unacceptable ones. 
This work illustrates the power of the new LLMs to generate therapeutic client-specific responses under the command of a language-based specification. %R 10.2196/53778 %U https://mental.jmir.org/2024/1/e53778 %U https://doi.org/10.2196/53778 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e62679 %T Empathy Toward Artificial Intelligence Versus Human Experiences and the Role of Transparency in Mental Health and Social Support Chatbot Design: Comparative Study %A Shen,Jocelyn %A DiPaola,Daniella %A Ali,Safinah %A Sap,Maarten %A Park,Hae Won %A Breazeal,Cynthia %+ MIT Media Lab, 75 Amherst Street, Cambridge, MA, 02139, United States, 1 3109802254, joceshen@mit.edu %K empathy %K large language models %K ethics %K transparency %K crowdsourcing %K human-computer interaction %D 2024 %7 25.9.2024 %9 Original Paper %J JMIR Ment Health %G English %X Background: Empathy is a driving force in our connection to others, our mental well-being, and resilience to challenges. With the rise of generative artificial intelligence (AI) systems, mental health chatbots, and AI social support companions, it is important to understand how empathy unfolds toward stories from human versus AI narrators and how transparency plays a role in user emotions. Objective: We aim to understand how empathy shifts across human-written versus AI-written stories, and how these findings inform ethical implications and human-centered design of using mental health chatbots as objects of empathy. Methods: We conducted crowd-sourced studies with 985 participants who each wrote a personal story and then rated empathy toward 2 retrieved stories, where one was written by a language model, and another was written by a human. Our studies varied disclosing whether a story was written by a human or an AI system to see how transparent author information affects empathy toward the narrator. 
We conducted mixed methods analyses: through statistical tests, we compared users’ self-reported state empathy toward the stories across different conditions. In addition, we qualitatively coded open-ended feedback about reactions to the stories to understand how and why transparency affects empathy toward human versus AI storytellers. Results: We found that participants significantly empathized with human-written over AI-written stories in almost all conditions, regardless of whether they were aware (t196=7.07, P<.001, Cohen d=0.60) or not aware (t298=3.46, P<.001, Cohen d=0.24) that an AI system wrote the story. We also found that participants reported greater willingness to empathize with AI-written stories when there was transparency about the story author (t494=–5.49, P<.001, Cohen d=0.36). Conclusions: Our work sheds light on how empathy toward AI or human narrators is tied to the way the text is presented, thus informing ethical considerations of empathetic artificial social support or mental health chatbots. 
%M 39321450 %R 10.2196/62679 %U https://mental.jmir.org/2024/1/e62679 %U https://doi.org/10.2196/62679 %U http://www.ncbi.nlm.nih.gov/pubmed/39321450 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e59505 %T Multimodal Large Language Models in Health Care: Applications, Challenges, and Future Outlook %A AlSaad,Rawan %A Abd-alrazaq,Alaa %A Boughorbel,Sabri %A Ahmed,Arfan %A Renault,Max-Antoine %A Damseh,Rafat %A Sheikh,Javaid %+ Weill Cornell Medicine-Qatar, Education City, Street 2700, Doha, Qatar, 974 44928830, rta4003@qatar-med.cornell.edu %K artificial intelligence %K large language models %K multimodal large language models %K multimodality %K multimodal generative artificial intelligence %K multimodal generative AI %K generative artificial intelligence %K generative AI %K health care %D 2024 %7 25.9.2024 %9 Viewpoint %J J Med Internet Res %G English %X In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types, including medical images (eg, MRI and CT scans), time-series data (eg, sensor data from wearable devices and electronic health records), audio recordings (eg, heart and respiratory sounds and patient interviews), text (eg, clinical notes and research articles), videos (eg, surgical procedures), and omics data (eg, genomics and proteomics). While advancements in large language models (LLMs) have enabled new applications for knowledge retrieval and processing in the medical field, most LLMs remain limited to processing unimodal data, typically text-based content, and often overlook the importance of integrating the diverse data modalities encountered in clinical practice. This paper aims to present a detailed, practical, and solution-oriented perspective on the use of multimodal LLMs (M-LLMs) in the medical field. 
Our investigation spanned M-LLM foundational principles, current and potential applications, technical and ethical challenges, and future research directions. By connecting these elements, we aimed to provide a comprehensive framework that links diverse aspects of M-LLMs, offering a unified vision for their future in health care. This approach aims to guide both future research and practical implementations of M-LLMs in health care, positioning them as a paradigm shift toward integrated, multimodal data–driven medical practice. We anticipate that this work will spark further discussion and inspire the development of innovative approaches in the next generation of medical M-LLM systems. %M 39321458 %R 10.2196/59505 %U https://www.jmir.org/2024/1/e59505 %U https://doi.org/10.2196/59505 %U http://www.ncbi.nlm.nih.gov/pubmed/39321458 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e60020 %T Impact of a Digital Scribe System on Clinical Documentation Time and Quality: Usability Study %A van Buchem,Marieke Meija %A Kant,Ilse M J %A King,Liza %A Kazmaier,Jacqueline %A Steyerberg,Ewout W %A Bauer,Martijn P %+ CAIRELab (Clinical AI Implementation and Research Lab), Leiden University Medical Center, Albinusdreef 2, Leiden, 2333 ZN, Netherlands, 31 615609183, m.m.van_buchem@lumc.nl %K large language model %K large language models %K LLM %K LLMs %K natural language processing %K NLP %K deep learning %K pilot study %K pilot studies %K implementation %K machine learning %K ML %K artificial intelligence %K AI %K algorithm %K algorithms %K model %K models %K analytics %K practical model %K practical models %K automation %K automate %K documentation %K documentation time %K documentation quality %K clinical documentation %D 2024 %7 23.9.2024 %9 Original Paper %J JMIR AI %G English %X Background: Physicians spend approximately half of their time on administrative tasks, which is one of the leading causes of physician burnout and decreased work satisfaction. 
The implementation of natural language processing–assisted clinical documentation tools may provide a solution. Objective: This study investigates the impact of a commercially available Dutch digital scribe system on clinical documentation efficiency and quality. Methods: Medical students with experience in clinical practice and documentation (n=22) created a total of 430 summaries of mock consultations and recorded the time they spent on this task. The consultations were summarized using 3 methods: manual summaries, fully automated summaries, and automated summaries with manual editing. We then randomly reassigned the summaries and evaluated their quality using a modified version of the Physician Documentation Quality Instrument (PDQI-9). We compared the differences between the 3 methods in descriptive statistics, quantitative text metrics (word count and lexical diversity), the PDQI-9, Recall-Oriented Understudy for Gisting Evaluation scores, and BERTScore. Results: The median time for manual summarization was 202 seconds against 186 seconds for editing an automatic summary. Without editing, the automatic summaries attained a poorer PDQI-9 score than manual summaries (median PDQI-9 score 25 vs 31, P<.001, ANOVA test). Automatic summaries were found to have higher word counts but lower lexical diversity than manual summaries (P<.001, independent t test). The study revealed variable impacts on PDQI-9 scores and summarization time across individuals. Generally, students viewed the digital scribe system as a potentially useful tool, noting its ease of use and time-saving potential, though some criticized the summaries for their greater length and rigid structure. Conclusions: This study highlights the potential of digital scribes in improving clinical documentation processes by offering a first summary draft for physicians to edit, thereby reducing documentation time without compromising the quality of patient records. 
Furthermore, digital scribes may be more beneficial to some physicians than to others and could play a role in improving the reusability of clinical documentation. Future studies should focus on the impact and quality of such a system when used by physicians in clinical practice. %M 39312397 %R 10.2196/60020 %U https://ai.jmir.org/2024/1/e60020 %U https://doi.org/10.2196/60020 %U http://www.ncbi.nlm.nih.gov/pubmed/39312397 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58578 %T Trial Factors Associated With Completion of Clinical Trials Evaluating AI: Retrospective Case-Control Study %A Chen,David %A Cao,Christian %A Kloosterman,Robert %A Parsa,Rod %A Raman,Srinivas %+ Department of Radiation Oncology, University of Toronto, 610 University Avenue, Toronto, ON, M5G 2M9, Canada, 1 416 946 4501 ext 2320, srinivas.raman@uhn.ca %K artificial intelligence %K clinical trial %K completion %K AI %K cross-sectional study %K application %K intervention %K trial design %K logistic regression %K Europe %K clinical %K trials testing %K health care %K informatics %K health information %D 2024 %7 23.9.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Evaluation of artificial intelligence (AI) tools in clinical trials remains the gold standard for translation into clinical settings. However, design factors associated with successful trial completion and the common reasons for trial failure are unknown. Objective: This study aims to compare trial design factors of complete and incomplete clinical trials testing AI tools. We conducted a case-control study of complete (n=485) and incomplete (n=51) clinical trials that evaluated AI as an intervention of ClinicalTrials.gov. Methods: Trial design factors, including area of clinical application, intended use population, and intended role of AI, were extracted. Trials that did not evaluate AI as an intervention and active trials were excluded. 
The assessed trial design factors related to AI interventions included the domain of clinical application related to organ systems; intended use population for patients or health care providers; and the role of AI for different applications in patient-facing clinical workflows, such as diagnosis, screening, and treatment. In addition, we also assessed general trial design factors including study type, allocation, intervention model, masking, age, sex, funder, continent, length of time, sample size, number of enrollment sites, and study start year. The main outcome was the completion of the clinical trial. Odds ratio (OR) and 95% CI values were calculated for all trial design factors using propensity-matched, multivariable logistic regression. Results: We queried ClinicalTrials.gov on December 23, 2023, using AI keywords to identify complete and incomplete trials testing AI technologies as a primary intervention, yielding 485 complete and 51 incomplete trials for inclusion in this study. Our nested propensity-matched, case-control results suggest that trials conducted in Europe were significantly associated with trial completion when compared with North American trials (OR 2.85, 95% CI 1.14-7.10; P=.03), and the trial sample size was positively associated with trial completion (OR 1.00, 95% CI 1.00-1.00; P=.02). Conclusions: Our case-control study is one of the first to identify trial design factors associated with completion of AI trials and catalog study-reported reasons for AI trial failure. We observed that trial design factors positively associated with trial completion include trials conducted in Europe and sample size. Given the promising clinical use of AI tools in health care, our results suggest that future translational research should prioritize addressing the design factors of AI clinical trials associated with trial incompletion and common reasons for study failure. 
%M 39312296 %R 10.2196/58578 %U https://www.jmir.org/2024/1/e58578 %U https://doi.org/10.2196/58578 %U http://www.ncbi.nlm.nih.gov/pubmed/39312296 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e52837 %T Personalized Prediction of Long-Term Renal Function Prognosis Following Nephrectomy Using Interpretable Machine Learning Algorithms: Case-Control Study %A Xu,Lingyu %A Li,Chenyu %A Gao,Shuang %A Zhao,Long %A Guan,Chen %A Shen,Xuefei %A Zhu,Zhihui %A Guo,Cheng %A Zhang,Liwei %A Yang,Chengyu %A Bu,Quandong %A Zhou,Bin %A Xu,Yan %+ Department of Nephrology, The Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao, 266003, China, 86 0532 82911668, xuyan@qdu.edu.cn %K nephrectomy %K acute kidney injury %K acute kidney disease %K chronic kidney disease %K machine learning %D 2024 %7 20.9.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Acute kidney injury (AKI) is a common adverse outcome following nephrectomy. The progression from AKI to acute kidney disease (AKD) and subsequently to chronic kidney disease (CKD) remains a concern; yet, the predictive mechanisms for these transitions are not fully understood. Interpretable machine learning (ML) models offer insights into how clinical features influence long-term renal function outcomes after nephrectomy, providing a more precise framework for identifying patients at risk and supporting improved clinical decision-making processes. Objective: This study aimed to (1) evaluate postnephrectomy rates of AKI, AKD, and CKD, analyzing long-term renal outcomes along different trajectories; (2) interpret AKD and CKD models using Shapley Additive Explanations values and Local Interpretable Model-Agnostic Explanations algorithm; and (3) develop a web-based tool for estimating AKD or CKD risk after nephrectomy. Methods: We conducted a retrospective cohort study involving patients who underwent nephrectomy between July 2012 and June 2019. 
Patient data were randomly split into training, validation, and test sets, maintaining a ratio of 76.5:8.5:15. Eight ML algorithms were used to construct predictive models for postoperative AKD and CKD. The performance of the best-performing models was assessed using various metrics. We used various Shapley Additive Explanations plots and Local Interpretable Model-Agnostic Explanations bar plots to interpret the model and generated directed acyclic graphs to explore the potential causal relationships between features. Additionally, we developed a web-based prediction tool using the top 10 features for AKD prediction and the top 5 features for CKD prediction. Results: The study cohort comprised 1559 patients. Incidence rates for AKI, AKD, and CKD were 21.7% (n=330), 15.3% (n=238), and 10.6% (n=165), respectively. Among the evaluated ML models, the Light Gradient-Boosting Machine (LightGBM) model demonstrated superior performance, with an area under the receiver operating characteristic curve of 0.97 for AKD prediction and 0.96 for CKD prediction. Performance metrics and plots highlighted the model’s competence in discrimination, calibration, and clinical applicability. Operative duration, hemoglobin, blood loss, urine protein, and hematocrit were identified as the top 5 features associated with predicted AKD. Baseline estimated glomerular filtration rate, pathology, trajectories of renal function, age, and total bilirubin were the top 5 features associated with predicted CKD. Additionally, we developed a web application using the LightGBM model to estimate AKD and CKD risks. Conclusions: An interpretable ML model effectively elucidated its decision-making process in identifying patients at risk of AKD and CKD following nephrectomy by enumerating critical features. The web-based calculator, based on the LightGBM model, can assist in formulating more personalized and evidence-based clinical strategies. 
%M 39303280 %R 10.2196/52837 %U https://medinform.jmir.org/2024/1/e52837 %U https://doi.org/10.2196/52837 %U http://www.ncbi.nlm.nih.gov/pubmed/39303280 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 10 %N %P e60323 %T A Machine Learning Approach for Predicting Biochemical Outcome After PSMA-PET–Guided Salvage Radiotherapy in Recurrent Prostate Cancer After Radical Prostatectomy: Retrospective Study %A Janbain,Ali %A Farolfi,Andrea %A Guenegou-Arnoux,Armelle %A Romengas,Louis %A Scharl,Sophia %A Fanti,Stefano %A Serani,Francesca %A Peeken,Jan C %A Katsahian,Sandrine %A Strouthos,Iosif %A Ferentinos,Konstantinos %A Koerber,Stefan A %A Vogel,Marco E %A Combs,Stephanie E %A Vrachimis,Alexis %A Morganti,Alessio Giuseppe %A Spohn,Simon KB %A Grosu,Anca-Ligia %A Ceci,Francesco %A Henkenberens,Christoph %A Kroeze,Stephanie GC %A Guckenberger,Matthias %A Belka,Claus %A Bartenstein,Peter %A Hruby,George %A Emmett,Louise %A Omerieh,Ali Afshar %A Schmidt-Hegemann,Nina-Sophie %A Mose,Lucas %A Aebersold,Daniel M %A Zamboglou,Constantinos %A Wiegel,Thomas %A Shelan,Mohamed %+ Department of Radiation Oncology, Inselspital, Bern University Hospital, University of Bern, Freiburgstrasse 10, Bern, 3010, Switzerland, 41 41316322632, mohamed.shelan@insel.ch %K cancer %K oncologist %K oncologist %K metastases %K prostate %K prostate cancer %K prostatectomy %K salvage radiotherapy %K PSMA-PET %K prostate-specific membrane antigen–positron emission tomography %K prostate-specific membrane antigen %K PET %K positron emission tomography %K radiotherapy %K radiology %K radiography %K machine learning %K ML %K artificial intelligence %K AI %K algorithm %K algorithms %K predictive model %K predictive models %K predictive analytics %K predictive system %K practical model %K practical models %K deep learning %D 2024 %7 20.9.2024 %9 Original Paper %J JMIR Cancer %G English %X Background: Salvage radiation therapy (sRT) is often the sole curative option in patients with biochemical 
recurrence after radical prostatectomy. We previously developed and validated a nomogram to predict freedom from biochemical failure after sRT. Objective: This study aims to evaluate prostate-specific membrane antigen–positron emission tomography (PSMA-PET)–based sRT efficacy for postprostatectomy prostate-specific antigen (PSA) persistence or recurrence. Objectives include developing a random survival forest (RSF) model for predicting biochemical failure, comparing it with a Cox model, and assessing predictive accuracy over time. Multinational cohort data will validate the model’s performance, aiming to improve clinical management of recurrent prostate cancer. Methods: This multicenter retrospective study collected data from 13 medical facilities across 5 countries: Germany, Cyprus, Australia, Italy, and Switzerland. A total of 1029 patients who underwent sRT following PSMA-PET–based assessment for PSA persistence or recurrence were included. Patients were treated between July 2013 and June 2020, with clinical decisions guided by PSMA-PET results and contemporary standards. The primary end point was freedom from biochemical failure, defined as 2 consecutive PSA rises >0.2 ng/mL after treatment. Data were divided into training (708 patients), testing (271 patients), and external validation (50 patients) sets for machine learning algorithm development and validation. RSF models were used, with 1000 trees per model, optimizing predictive performance using the Harrell concordance index and Brier score. Statistical analysis used R Statistical Software (R Foundation for Statistical Computing), and ethical approval was obtained from participating institutions. Results: Baseline characteristics of 1029 patients undergoing sRT following PSMA-PET–based assessment were analyzed. The median age at sRT was 70 (IQR 64-74) years. PSMA-PET scans revealed local recurrences in 43.9% (430/979) and nodal recurrences in 27.2% (266/979) of patients. 
Treatment included dose-escalated sRT to pelvic lymphatics in 35.6% (349/979) of cases. The external outlier validation set showed distinct features, including higher rates of positive lymph nodes (47/50, 94% vs 266/979, 27.2% in the learning cohort) and lower delivered sRT doses (<66 Gy in 57/979, 5.8% vs 46/50, 92% of patients; P<.001). The RSF model, validated internally and externally, demonstrated robust predictive performance (Harrell C-index range: 0.54-0.91) across training and validation datasets, outperforming a previously published nomogram. Conclusions: The developed RSF model demonstrates enhanced predictive accuracy, potentially improving patient outcomes and assisting clinicians in making treatment decisions. %M 39303279 %R 10.2196/60323 %U https://cancer.jmir.org/2024/1/e60323 %U https://doi.org/10.2196/60323 %U http://www.ncbi.nlm.nih.gov/pubmed/39303279 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58278 %T Evaluating a Natural Language Processing–Driven, AI-Assisted International Classification of Diseases, 10th Revision, Clinical Modification, Coding System for Diagnosis Related Groups in a Real Hospital Environment: Algorithm Development and Validation Study %A Dai,Hong-Jie %A Wang,Chen-Kai %A Chen,Chien-Chang %A Liou,Chong-Sin %A Lu,An-Tai %A Lai,Chia-Hsin %A Shain,Bo-Tsz %A Ke,Cheng-Rong %A Wang,William Yu Chung %A Mir,Tatheer Hussain %A Simanjuntak,Mutiara %A Kao,Hao-Yun %A Tsai,Ming-Ju %A Tseng,Vincent S %+ Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, No 100, Tzyou 1st Road, Sanmin District, Kaohsiung, 80756, Taiwan, 886 73121101 ext 4660035, mjt@kmu.edu.tw %K natural language processing %K International Classification of Diseases %K deep learning %K electronic medical record %K Taiwan diagnosis related groups %D 2024 %7 20.9.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: International 
Classification of Diseases codes are widely used to describe diagnosis information, but manual coding relies heavily on human interpretation, which can be expensive, time consuming, and prone to errors. With the transition from the International Classification of Diseases, Ninth Revision, to the International Classification of Diseases, Tenth Revision (ICD-10), the coding process has become more complex, highlighting the need for automated approaches to enhance coding efficiency and accuracy. Inaccurate coding can result in substantial financial losses for hospitals, and a precise assessment of outcomes generated by a natural language processing (NLP)–driven autocoding system thus assumes a critical role in safeguarding the accuracy of the Taiwan diagnosis related groups (Tw-DRGs). Objective: This study aims to evaluate the feasibility of applying an International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM), autocoding system that can automatically determine diagnoses and codes based on free-text discharge summaries to facilitate the assessment of Tw-DRGs, specifically principal diagnosis and major diagnostic categories (MDCs). Methods: By using the patient discharge summaries from Kaohsiung Medical University Chung-Ho Memorial Hospital (KMUCHH) from April 2019 to December 2020 as a reference data set, we developed artificial intelligence (AI)–assisted ICD-10-CM coding systems based on deep learning models. We constructed a web-based user interface for the AI-assisted coding system and deployed the system to the workflow of the certified coding specialists (CCSs) of KMUCHH. The data used for the assessment of Tw-DRGs were manually curated by a CCS, with the principal diagnosis and MDC determined from discharge summaries collected at KMUCHH from February 2023 to April 2023. 
Results: Both the reference data set and real hospital data were used to assess performance in determining ICD-10-CM coding, principal diagnosis, and MDC for Tw-DRGs. Among all methods, the GPT-2 (OpenAI)-based model achieved the highest F1-score, 0.667 (F1-score 0.851 for the top 50 codes), on the KMUCHH test set and a slightly lower F1-score, 0.621, in real hospital data. Cohen κ evaluation for the agreement of MDC between the models and the CCS revealed that the overall average κ value for GPT-2 (κ=0.714) was approximately 12.2 percentage points higher than that of the hierarchy attention network (κ=0.592). GPT-2 demonstrated superior agreement with the CCS across 6 categories of MDC, with an average κ value of approximately 0.869 (SD 0.033), underscoring the effectiveness of the developed AI-assisted coding system in supporting the work of CCSs. Conclusions: An NLP-driven AI-assisted coding system can assist CCSs in ICD-10-CM coding by offering coding references via a user interface, demonstrating the potential to reduce the manual workload and expedite Tw-DRG assessment. Consistency in performance affirmed the effectiveness of the system in supporting CCSs in ICD-10-CM coding and the judgment of Tw-DRGs. 
%M 39302714 %R 10.2196/58278 %U https://www.jmir.org/2024/1/e58278 %U https://doi.org/10.2196/58278 %U http://www.ncbi.nlm.nih.gov/pubmed/39302714 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e58493 %T Regulating AI in Mental Health: Ethics of Care Perspective %A Tavory,Tamar %+ Faculty of Law, Bar Ilan University, Ramat Gan, 5290002, Israel, 972 35318816, ttavory@gmail.com %K artificial intelligence %K ethics of care %K regulation %K legal %K relationship %K mental health %K mental healthcare %K AI %K ethic %K ethics %K ethical %K regulations %K law %K framework %K frameworks %K regulatory %K relationships %K chatbot %K chatbots %K conversational agent %K conversational agents %K European Artificial Intelligence Act %D 2024 %7 19.9.2024 %9 Viewpoint %J JMIR Ment Health %G English %X This article contends that the responsible artificial intelligence (AI) approach—which is the dominant ethics approach ruling most regulatory and ethical guidance—falls short because it overlooks the impact of AI on human relationships. Focusing only on responsible AI principles reinforces a narrow concept of accountability and responsibility of companies developing AI. This article proposes that applying the ethics of care approach to AI regulation can offer a more comprehensive regulatory and ethical framework that addresses AI’s impact on human relationships. This dual approach is essential for the effective regulation of AI in the domain of mental health care. The article delves into the emergence of the new “therapeutic” area facilitated by AI-based bots, which operate without a therapist. The article highlights the difficulties involved, mainly the absence of a defined duty of care toward users, and shows how implementing ethics of care can establish clear responsibilities for developers. It also sheds light on the potential for emotional manipulation and the risks involved. 
In conclusion, the article proposes a series of considerations grounded in the ethics of care for the developmental process of AI-powered therapeutic tools. %M 39298759 %R 10.2196/58493 %U https://mental.jmir.org/2024/1/e58493 %U https://doi.org/10.2196/58493 %U http://www.ncbi.nlm.nih.gov/pubmed/39298759 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e58462 %T Patient Perspectives on AI for Mental Health Care: Cross-Sectional Survey Study %A Benda,Natalie %A Desai,Pooja %A Reza,Zayan %A Zheng,Anna %A Kumar,Shiveen %A Harkins,Sarah %A Hermann,Alison %A Zhang,Yiye %A Joly,Rochelle %A Kim,Jessica %A Pathak,Jyotishman %A Reading Turchioe,Meghan %+ School of Nursing, Columbia University, 560 168th Street, New York, NY, 10032, United States, 1 917 426 3069, nb3115@cumc.columbia.edu %K artificial intelligence %K AI %K mental health %K patient perspectives %K patients %K public survey %K application %K applications %K health care %K health professionals %K somatic issues %K radiology %K perinatal health %K Black %K professional relationship %K patient-health %K autonomy %K risk %K confidentiality %K machine learning %K digital mental health %K computing %K coding %K mobile phone %D 2024 %7 18.9.2024 %9 Original Paper %J JMIR Ment Health %G English %X Background: The application of artificial intelligence (AI) to health and health care is rapidly increasing. Several studies have assessed the attitudes of health professionals, but far fewer studies have explored the perspectives of patients or the general public. Studies investigating patient perspectives have focused on somatic issues, including those related to radiology, perinatal health, and general applications. Patient feedback has been elicited in the development of specific mental health care solutions, but broader perspectives toward AI for mental health care have been underexplored. 
Objective: This study aims to understand public perceptions regarding potential benefits of AI, concerns about AI, comfort with AI accomplishing various tasks, and values related to AI, all pertaining to mental health care. Methods: We conducted a 1-time cross-sectional survey with a nationally representative sample of 500 US-based adults. Participants provided structured responses on their perceived benefits, concerns, comfort, and values regarding AI for mental health care. They could also add free-text responses to elaborate on their concerns and values. Results: A plurality of participants (245/497, 49.3%) believed AI may be beneficial for mental health care, but this perspective differed based on sociodemographic variables (all P<.05). Specifically, Black participants (odds ratio [OR] 1.76, 95% CI 1.03-3.05) and those with lower health literacy (OR 2.16, 95% CI 1.29-3.78) perceived AI to be more beneficial, and women (OR 0.68, 95% CI 0.46-0.99) perceived AI to be less beneficial. Participants endorsed concerns about accuracy, possible unintended consequences such as misdiagnosis, the confidentiality of their information, and the loss of connection with their health professional when AI is used for mental health care. A majority of participants (80.4%, 402/500) valued being able to understand individual factors driving their risk, confidentiality, and autonomy as it pertained to the use of AI for their mental health. When asked who was responsible for the misdiagnosis of mental health conditions using AI, 81.6% (408/500) of participants found the health professional to be responsible. Qualitative results revealed similar concerns related to the accuracy of AI and how its use may impact the confidentiality of patients’ information. 
Conclusions: Future work involving the use of AI for mental health care should investigate strategies for conveying the level of AI’s accuracy, factors that drive patients’ mental health risks, and how data are used confidentially so that patients can determine with their health professionals when AI may be beneficial. It will also be important in a mental health care context to ensure the patient–health professional relationship is preserved when AI is used. %M 39293056 %R 10.2196/58462 %U https://mental.jmir.org/2024/1/e58462 %U https://doi.org/10.2196/58462 %U http://www.ncbi.nlm.nih.gov/pubmed/39293056 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e59914 %T Assessment of Clinical Metadata on the Accuracy of Retinal Fundus Image Labels in Diabetic Retinopathy in Uganda: Case-Crossover Study Using the Multimodal Database of Retinal Images in Africa %A Arunga,Simon %A Morley,Katharine Elise %A Kwaga,Teddy %A Morley,Michael Gerard %A Nakayama,Luis Filipe %A Mwavu,Rogers %A Kaggwa,Fred %A Ssempiira,Julius %A Celi,Leo Anthony %A Haberer,Jessica E %A Obua,Celestino %+ Massachusetts General Hospital Center for Global Health, Department of Medicine, Harvard Medical School, 125 Nashua St., Boston, MA, 02114, United States, 1 617 726 2000, kemorley@mgh.harvard.edu %K image labeling %K metadata %K diabetic retinopathy %K assessment %K bias %K multimodal database %K retinal images %K Africa %K African %K artificial intelligence %K AI %K screening algorithms %K screening %K algorithms %K diabetic %K diabetes %K treatment %K sensitivity %K clinical images %D 2024 %7 18.9.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Labeling color fundus photos (CFP) is an important step in the development of artificial intelligence screening algorithms for the detection of diabetic retinopathy (DR). Most studies use the International Classification of Diabetic Retinopathy (ICDR) to assign labels to CFP, plus the presence or absence of macular edema (ME). 
Images can be grouped as referrable or nonreferrable according to these classifications. There is little guidance in the literature about how to collect and use metadata as a part of the CFP labeling process. Objective: This study aimed to improve the quality of the Multimodal Database of Retinal Images in Africa (MoDRIA) by determining whether the availability of metadata during the image labeling process influences the accuracy, sensitivity, and specificity of image labels. MoDRIA was developed as one of the inaugural research projects of the Mbarara University Data Science Research Hub, part of the Data Science for Health Discovery and Innovation in Africa (DS-I Africa) initiative. Methods: This is a crossover assessment with 2 groups and 2 phases. Each group had 10 randomly assigned labelers who provided an ICDR score and the presence or absence of ME for each of the 50 CFP in a test image with and without metadata including blood pressure, visual acuity, glucose, and medical history. Specificity and sensitivity of referable retinopathy were based on ICDR scores, and ME was calculated using a 2-sided t test. Comparison of sensitivity and specificity for ICDR scores and ME with and without metadata for each participant was calculated using the Wilcoxon signed rank test. Statistical significance was set at P<.05. Results: The sensitivity for identifying referrable DR with metadata was 92.8% (95% CI 87.6-98.0) compared with 93.3% (95% CI 87.6-98.9) without metadata, and the specificity was 84.9% (95% CI 75.1-94.6) with metadata compared with 88.2% (95% CI 79.5-96.8) without metadata. The sensitivity for identifying the presence of ME was 64.3% (95% CI 57.6-71.0) with metadata, compared with 63.1% (95% CI 53.4-73.0) without metadata, and the specificity was 86.5% (95% CI 81.4-91.5) with metadata compared with 87.7% (95% CI 83.9-91.5) without metadata. 
The sensitivity and specificity of the ICDR score and the presence or absence of ME were calculated for each labeler with and without metadata. No findings were statistically significant. Conclusions: The sensitivity and specificity scores for the detection of referrable DR were slightly better without metadata, but the difference was not statistically significant. We cannot make definitive conclusions about the impact of metadata on the sensitivity and specificity of image labels in our study. Given the importance of metadata in clinical situations, we believe that metadata may benefit labeling quality. A more rigorous study to determine the sensitivity and specificity of CFP labels with and without metadata is recommended. %R 10.2196/59914 %U https://formative.jmir.org/2024/1/e59914 %U https://doi.org/10.2196/59914 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e58202 %T Revised Tool for the Quality Assessment of Diagnostic Accuracy Studies Using AI (QUADAS-AI): Protocol for a Qualitative Study %A Guni,Ahmad %A Sounderajah,Viknesh %A Whiting,Penny %A Bossuyt,Patrick %A Darzi,Ara %A Ashrafian,Hutan %+ Institute of Global Health Innovation, Imperial College London, 10th Floor QEQM Building, St Mary’s Hospital, Praed St, London, W2 1NY, United Kingdom, 44 2075895111, h.ashrafian@imperial.ac.uk %K artificial intelligence %K AI %K AI-specific quality assessment of diagnostic accuracy studies %K QUADAS-AI %K AI-driven %K diagnostics %K evidence synthesis %K quality assessment %K evaluation %K diagnostic %K accuracy %K bias %K translation %K clinical practice %K assessment tool %K diagnostic service %D 2024 %7 18.9.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Quality assessment of diagnostic accuracy studies (QUADAS), and more recently QUADAS-2, were developed to aid the evaluation of methodological quality within primary diagnostic accuracy studies. 
However, in its current form, QUADAS-2 does not address the unique considerations raised by artificial intelligence (AI)–centered diagnostic systems. The rapid progression of the AI diagnostics field mandates suitable quality assessment tools to determine the risk of bias and applicability, and subsequently evaluate translational potential for clinical practice. Objective: We aim to develop an AI-specific QUADAS (QUADAS-AI) tool that addresses the specific challenges associated with the appraisal of AI diagnostic accuracy studies. This paper describes the processes and methods that will be used to develop QUADAS-AI. Methods: The development of QUADAS-AI can be distilled into 3 broad stages. Stage 1—a project organization phase has been undertaken, during which a project team and a steering committee were established. The steering committee consists of a panel of international experts representing diverse stakeholder groups. Following this, the scope of the project was finalized. Stage 2—an item generation process will be completed following (1) a mapping review, (2) a meta-research study, (3) a scoping survey of international experts, and (4) a patient and public involvement and engagement exercise. Candidate items will then be put forward to the international Delphi panel to achieve consensus for inclusion in the revised tool. A modified Delphi consensus methodology involving multiple online rounds and a final consensus meeting will be carried out to refine the tool, following which the initial QUADAS-AI tool will be drafted. A piloting phase will be carried out to identify components that are considered to be either ambiguous or missing. Stage 3—once the steering committee has finalized the QUADAS-AI tool, specific dissemination strategies will be aimed toward academic, policy, regulatory, industry, and public stakeholders, respectively. 
Results: As of July 2024, the project organization phase, as well as the mapping review and meta-research study, have been completed. We aim to complete the item generation, including the Delphi consensus, and finalize the tool by the end of 2024. Therefore, QUADAS-AI will be able to provide a consensus-derived platform upon which stakeholders may systematically appraise the methodological quality associated with AI diagnostic accuracy studies by the beginning of 2025. Conclusions: AI-driven systems comprise an increasingly significant proportion of research in clinical diagnostics. Through this process, QUADAS-AI will aid the evaluation of studies in this domain in order to identify bias and applicability concerns. As such, QUADAS-AI may form a key part of clinical, governmental, and regulatory evaluation frameworks for AI diagnostic systems globally. International Registered Report Identifier (IRRID): DERR1-10.2196/58202 %M 39293047 %R 10.2196/58202 %U https://www.researchprotocols.org/2024/1/e58202 %U https://doi.org/10.2196/58202 %U http://www.ncbi.nlm.nih.gov/pubmed/39293047 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e62890 %T Early Prediction of Cardiac Arrest in the Intensive Care Unit Using Explainable Machine Learning: Retrospective Study %A Kim,Yun Kwan %A Seo,Won-Doo %A Lee,Sun Jung %A Koo,Ja Hyung %A Kim,Gyung Chul %A Song,Hee Seok %A Lee,Minji %+ Department of Biomedical Software Engineering, The Catholic University of Korea, 43, Jibong-ro, Bucheon-si, Gyeonggi-do, 14662, Republic of Korea, 82 2 2164 4364, minjilee@catholic.ac.kr %K early cardiac arrest warning system %K electric medical record %K explainable clinical decision support system %K pseudo-real-time evaluation %K ensemble learning %K cost-sensitive learning %D 2024 %7 17.9.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Cardiac arrest (CA) is one of the leading causes of death among patients in the intensive care unit (ICU). 
Although many CA prediction models with high sensitivity have been developed to anticipate CA, their practical application has been challenging due to a lack of generalization and validation. Additionally, the heterogeneity among patients in different ICU subtypes has not been adequately addressed. Objective: This study aims to propose a clinically interpretable ensemble approach for the timely and accurate prediction of CA within 24 hours, regardless of patient heterogeneity, including variations across different populations and ICU subtypes. Additionally, we conducted patient-independent evaluations to emphasize the model’s generalization performance and analyzed interpretable results that can be readily adopted by clinicians in real-time. Methods: Patients were retrospectively analyzed using data from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) and the eICU-Collaborative Research Database (eICU-CRD). To address the problem of underperformance, we constructed our framework using feature sets based on vital signs, multiresolution statistical analysis, and the Gini index, with a 12-hour window to capture the unique characteristics of CA. We extracted 3 types of features from each database to compare the performance of CA prediction between high-risk patient groups from MIMIC-IV and patients without CA from eICU-CRD. After feature extraction, we developed a tabular network (TabNet) model using feature screening with cost-sensitive learning. To assess real-time CA prediction performance, we used 10-fold leave-one-patient-out cross-validation and a cross–data set method. We evaluated MIMIC-IV and eICU-CRD across different cohort populations and subtypes of ICU within each database. Finally, external validation using the eICU-CRD and MIMIC-IV databases was conducted to assess the model’s generalization ability. The decision mask of the proposed method was used to capture the interpretability of the model. 
Results: The proposed method outperformed conventional approaches across different cohort populations in both MIMIC-IV and eICU-CRD. Additionally, it achieved higher accuracy than baseline models for various ICU subtypes within both databases. The interpretable prediction results can enhance clinicians’ understanding of CA prediction by serving as a statistical comparison between non-CA and CA groups. Next, we tested the eICU-CRD and MIMIC-IV data sets using models trained on MIMIC-IV and eICU-CRD, respectively, to evaluate generalization ability. The results demonstrated superior performance compared with baseline models. Conclusions: Our novel framework for learning unique features provides stable predictive power across different ICU environments. Most of the interpretable global information reveals statistical differences between CA and non-CA groups, demonstrating its utility as an indicator for clinical decisions. Consequently, the proposed CA prediction system is a clinically validated algorithm that enables clinicians to intervene early based on CA prediction information and can be applied to clinical trials in digital health. %R 10.2196/62890 %U https://www.jmir.org/2024/1/e62890 %U https://doi.org/10.2196/62890 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e65527 %T Considerations and Challenges in the Application of Large Language Models for Patient Complaint Resolution %A Wei,Bin %A Hu,Xin %A Wu,XiaoRong %+ The 1st Affiliated Hospital, Jiangxi Medical College, Nanchang University, No. 
17 Yongwai Zheng Street, Donghu District, Nanchang, 330000, China, 86 13617093259, wxr98021@126.com %K ChatGPT %K large language model %K LLM %K artificial intelligence %K AI %K patient complaint %K empathy %K efficiency %K patient satisfaction %K resource allocation %D 2024 %7 17.9.2024 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 39288405 %R 10.2196/65527 %U https://www.jmir.org/2024/1/e65527 %U https://doi.org/10.2196/65527 %U http://www.ncbi.nlm.nih.gov/pubmed/39288405 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54737 %T Artificial Intelligence–Augmented Clinical Decision Support Systems for Pregnancy Care: Systematic Review %A Lin,Xinnian %A Liang,Chen %A Liu,Jihong %A Lyu,Tianchu %A Ghumman,Nadia %A Campbell,Berry %+ Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, 850 Republican Street, Seattle, WA, 98109, United States, 1 2065432259, cl0512@uw.edu %K artificial intelligence %K biomedical ontologies %K clinical decision support systems %K implementation science %K obstetrics %K pregnancy %K AI %K systematic review %K CDSS %K functionality %K methodology %K implementation %K database query %K database queries %K bibliography %K record %K records %K eligibility %K literature review %K prenatal %K early pregnancy %K obstetric care %K postpartum care %K pregnancy care %K diagnostic support %K clinical prediction %K knowledge base %K therapeutic %K therapeutics %K recommendation %K recommendations %K diagnosis %K abnormality %K abnormalities %K cost-effective %K surveillance %K ultrasound %K ontology %D 2024 %7 16.9.2024 %9 Review %J J Med Internet Res %G English %X Background: Despite the emerging application of clinical decision support systems (CDSS) in pregnancy care and the proliferation of artificial intelligence (AI) over the last decade, the role of AI in CDSS specialized for pregnancy care remains understudied. 
Objective: To identify and synthesize AI-augmented CDSS in pregnancy care, CDSS functionality, AI methodologies, and clinical implementation, we reported a systematic review based on empirical studies that examined AI-augmented CDSS in pregnancy care. Methods: We retrieved studies that examined AI-augmented CDSS in pregnancy care using database queries involving titles, abstracts, keywords, and MeSH (Medical Subject Headings) terms. Bibliographic records from their inception to 2022 were retrieved from PubMed/MEDLINE (n=206), Embase (n=101), and ACM Digital Library (n=377), followed by eligibility screening and literature review. The eligibility criteria include empirical studies that (1) developed or tested AI methods, (2) developed or tested CDSS or CDSS components, and (3) focused on pregnancy care. Data of studies used for review and appraisal include title, abstract, keywords, MeSH terms, full text, and supplements. Publications with ancillary information or overlapping outcomes were synthesized as a single study. Reviewers independently reviewed and assessed the quality of selected studies. Results: We identified 30 distinct studies out of 684 studies from their inception to 2022. Topics of clinical applications covered AI-augmented CDSS from prenatal, early pregnancy, obstetric care, and postpartum care. Topics of CDSS functions include diagnostic support, clinical prediction, therapeutics recommendation, and knowledge base. Conclusions: Our review acknowledged recent advances in CDSS studies including early diagnosis of prenatal abnormalities, cost-effective surveillance, prenatal ultrasound support, and ontology development. 
To recommend future directions, we also noted key gaps from existing studies, including (1) decision support in current childbirth deliveries without using observational data from consequential fetal or maternal outcomes in future pregnancies; (2) scarcity of studies in identifying several high-profile biases from CDSS, including social determinants of health highlighted by the American College of Obstetricians and Gynecologists; and (3) chasm between internally validated CDSS models, external validity, and clinical implementation. %M 39283665 %R 10.2196/54737 %U https://www.jmir.org/2024/1/e54737 %U https://doi.org/10.2196/54737 %U http://www.ncbi.nlm.nih.gov/pubmed/39283665 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e59858 %T Early Diagnosis of Hereditary Angioedema in Japan Based on a US Medical Dataset: Algorithm Development and Validation %A Yamashita,Kouhei %A Nomoto,Yuji %A Hirose,Tomoya %A Yutani,Akira %A Okada,Akira %A Watanabe,Nayu %A Suzuki,Ken %A Senzaki,Munenori %A Kuroda,Tomohiro %+ Department of Hematology and Oncology, Graduate School of Medicine, Kyoto University, 54 Shogoin-kawahara-cho, Sakyo-ku, Kyoto, 606-8507, Japan, 81 75 751 4964, kouhei@kuhp.kyoto-u.ac.jp %K machine learning %K screening %K AI %K prediction %K rare diseases %K HAE %K electronic medical record %K real world data %K big data %K angioedema %K edema %K ML %K artificial intelligence %K algorithm %K algorithms %K predictive model %K predictive models %K predictive analytics %K predictive system %K practical model %K practical models %K early warning %K early detection %K RWD %K Electronic health record %K EHR %K electronic health records %K EHRs %K EMR %K electronic medical records %K EMRs %K patient record %K health record %K health records %K personal health record %K PHR %D 2024 %7 13.9.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Hereditary angioedema (HAE), a rare genetic disease, induces acute 
attacks of swelling in various regions of the body. Its prevalence is estimated to be 1 in 50,000 people, with no reported bias among different ethnic groups. However, considering the estimated prevalence, the number of patients in Japan diagnosed with HAE remains approximately 1 in 250,000, which means that only 20% of potential HAE cases are identified. Objective: This study aimed to develop an artificial intelligence (AI) model that can detect patients with suspected HAE using medical history data (medical claims, prescriptions, and electronic medical records [EMRs]) in the United States. We also aimed to validate the detection performance of the model for HAE cases using the Japanese dataset. Methods: The HAE patient and control groups were identified using the US claims and EMR datasets. We analyzed the characteristics of the diagnostic history of patients with HAE and developed an AI model to predict the probability of HAE based on a generalized linear model and bootstrap method. The model was then applied to the EMR data of the Kyoto University Hospital to verify its applicability to the Japanese dataset. Results: Precision and sensitivity were measured to validate the model performance. Using the comprehensive US dataset, the precision score was 2% in the initial model development step. Our model can screen out suspected patients, where 1 in 50 of these patients have HAE. In addition, in the validation step with Japanese EMR data, the precision score was 23.6%, which exceeded our expectations. We achieved a sensitivity score of 61.5% for the US dataset and 37.6% for the validation exercise using data from a single Japanese hospital. Overall, our model could predict patients with typical HAE symptoms. Conclusions: This study indicates that our AI model can detect HAE in patients with typical symptoms and is effective in Japanese data. However, further prospective clinical studies are required to investigate whether this model can be used to diagnose HAE. 
%M 39270211 %R 10.2196/59858 %U https://medinform.jmir.org/2024/1/e59858 %U https://doi.org/10.2196/59858 %U http://www.ncbi.nlm.nih.gov/pubmed/39270211 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52490 %T Using AI to Differentiate Mpox From Common Skin Lesions in a Sexual Health Clinic: Algorithm Development and Validation Study %A Soe,Nyi Nyi %A Yu,Zhen %A Latt,Phyu Mon %A Lee,David %A Samra,Ranjit Singh %A Ge,Zongyuan %A Rahman,Rashidur %A Sun,Jiajun %A Ong,Jason J %A Fairley,Christopher K %A Zhang,Lei %+ Artificial Intelligence and Modelling in Epidemiology Program, Melbourne Sexual Health Centre, Alfred Health, 580 Swanston Street, Carlton, Melbourne, Australia, 61 0433452013, Lei.Zhang1@monash.edu %K mpox %K sexually transmitted infections %K artificial intelligence %K deep learning %K skin lesion %D 2024 %7 13.9.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The 2022 global outbreak of mpox has significantly impacted health facilities, and necessitated additional infection prevention and control measures and alterations to clinic processes. Early identification of suspected mpox cases will assist in mitigating these impacts. Objective: We aimed to develop and evaluate an artificial intelligence (AI)–based tool to differentiate mpox lesion images from other skin lesions seen in a sexual health clinic. Methods: We used a data set with 2200 images, that included mpox and non-mpox lesions images, collected from Melbourne Sexual Health Centre and web resources. We adopted deep learning approaches which involved 6 different deep learning architectures to train our AI models. We subsequently evaluated the performance of each model using a hold-out data set and an external validation data set to determine the optimal model for differentiating between mpox and non-mpox lesions. 
Results: The DenseNet-121 model outperformed other models with an overall area under the receiver operating characteristic curve (AUC) of 0.928, an accuracy of 0.848, a precision of 0.942, a recall of 0.742, and an F1-score of 0.834. Implementation of a region of interest approach significantly improved the performance of all models, with the AUC for the DenseNet-121 model increasing to 0.982. This approach resulted in an increase in the correct classification of mpox images from 79% (55/70) to 94% (66/70). The effectiveness of this approach was further validated by a visual analysis with gradient-weighted class activation mapping, demonstrating a reduction in false detection within the background of lesion images. On the external validation data set, ResNet-18 and DenseNet-121 achieved the highest performance. ResNet-18 achieved an AUC of 0.990 and an accuracy of 0.947, and DenseNet-121 achieved an AUC of 0.982 and an accuracy of 0.926. Conclusions: Our study demonstrated it was possible to use an AI-based image recognition algorithm to accurately differentiate between mpox and common skin lesions. Our findings provide a foundation for future investigations aimed at refining the algorithm and establishing the place of such technology in a sexual health clinic. 
%M 39269753 %R 10.2196/52490 %U https://www.jmir.org/2024/1/e52490 %U https://doi.org/10.2196/52490 %U http://www.ncbi.nlm.nih.gov/pubmed/39269753 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e56797 %T ChatGPT Use Among Pediatric Health Care Providers: Cross-Sectional Survey Study %A Kisvarday,Susannah %A Yan,Adam %A Yarahuan,Julia %A Kats,Daniel J %A Ray,Mondira %A Kim,Eugene %A Hong,Peter %A Spector,Jacob %A Bickel,Jonathan %A Parsons,Chase %A Rabbani,Naveed %A Hron,Jonathan D %+ Division of General Pediatrics, Boston Children's Hospital, 300 Longwood Avenue, Boston, MA, 02115, United States, 1 5704283137, susannah.kisvarday@childrens.harvard.edu %K ChatGPT %K machine learning %K surveys and questionnaires %K medical informatics applications %K OpenAI %K large language model %K LLM %K machine learning %K pediatric %K chatbot %K artificial intelligence %K AI %K digital tools %D 2024 %7 12.9.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The public launch of OpenAI’s ChatGPT platform generated immediate interest in the use of large language models (LLMs). Health care institutions are now grappling with establishing policies and guidelines for the use of these technologies, yet little is known about how health care providers view LLMs in medical settings. Moreover, there are no studies assessing how pediatric providers are adopting these readily accessible tools. Objective: The aim of this study was to determine how pediatric providers are currently using LLMs in their work as well as their interest in using a Health Insurance Portability and Accountability Act (HIPAA)–compliant version of ChatGPT in the future. Methods: A survey instrument consisting of structured and unstructured questions was iteratively developed by a team of informaticians from various pediatric specialties. The survey was sent via Research Electronic Data Capture (REDCap) to all Boston Children’s Hospital pediatric providers. 
Participation was voluntary and uncompensated, and all survey responses were anonymous.  Results: Surveys were completed by 390 pediatric providers. Approximately 50% (197/390) of respondents had used an LLM; of these, almost 75% (142/197) were already using an LLM for nonclinical work and 27% (52/195) for clinical work. Providers detailed the various ways they are currently using an LLM in their clinical and nonclinical work. Only 29% (n=105) of 362 respondents indicated that ChatGPT should be used for patient care in its present state; however, 73.8% (273/368) reported they would use a HIPAA-compliant version of ChatGPT if one were available. Providers’ proposed future uses of LLMs in health care are described. Conclusions: Despite significant concerns and barriers to LLM use in health care, pediatric providers are already using LLMs at work. This study will give policy makers needed information about how providers are using LLMs clinically. %M 39265163 %R 10.2196/56797 %U https://formative.jmir.org/2024/1/e56797 %U https://doi.org/10.2196/56797 %U http://www.ncbi.nlm.nih.gov/pubmed/39265163 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55591 %T The Normalization of Vaping on TikTok Using Computer Vision, Natural Language Processing, and Qualitative Thematic Analysis: Mixed Methods Study %A Jung,Sungwon %A Murthy,Dhiraj %A Bateineh,Bara S %A Loukas,Alexandra %A Wilkinson,Anna V %+ School of Journalism and Media, University of Texas at Austin, 300 W Dean Keeton St, Austin, TX, 78712, United States, 1 512 749 3267, sungwon.jung@utexas.edu %K electronic cigarettes %K vaping %K social media %K natural language processing %K computer vision %D 2024 %7 11.9.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Social media posts that portray vaping in positive social contexts shape people’s perceptions and serve to normalize vaping. 
Despite restrictions on depicting or promoting controlled substances, vape-related content is easily accessible on TikTok. There is a need to understand strategies used in promoting vaping on TikTok, especially among susceptible youth audiences. Objective: This study seeks to comprehensively describe direct (ie, explicit promotional efforts) and indirect (ie, subtler strategies) themes promoting vaping on TikTok using a mixture of computational and qualitative thematic analyses of social media posts. In addition, we aim to describe how these themes might play a role in normalizing vaping behavior on TikTok for youth audiences, thereby informing public health communication and regulatory policies regarding vaping endorsements on TikTok. Methods: We collected 14,002 unique TikTok posts using 50 vape-related hashtags (eg, #vapetok and #boxmod). Using the k-means unsupervised machine learning algorithm, we identified clusters and then categorized posts qualitatively based on themes. Next, we organized all videos from the posts thematically and extracted the visual features of each theme using 3 machine learning–based model architectures: residual network (ResNet) with 50 layers (ResNet50), Visual Geometry Group model with 16 layers, and vision transformer. We chose the best-performing model, ResNet50, to thoroughly analyze the image clustering output. To assess clustering accuracy, we examined 4.01% (441/10,990) of the samples from each video cluster. Finally, we randomly selected 50 videos (5% of the total videos) from each theme, which were qualitatively coded and compared with the machine-derived classification for validation. Results: We successfully identified 5 major themes from the TikTok posts. 
Vape product marketing (1160/14,002, 8.28%) reflected direct marketing, while the other 4 themes reflected indirect marketing: TikTok influencer (3775/14,002, 26.96%), general vape (2741/14,002, 19.58%), vape brands (2042/14,002, 14.58%), and vaping cessation (1272/14,002, 9.08%). The ResNet50 model successfully classified clusters based on image features, achieving an average F1-score of 0.97, the highest among the 3 models. Qualitative content analyses indicated that vaping was depicted as a normal, routine part of daily life, with TikTok influencers subtly incorporating vaping into popular culture (eg, gaming, skateboarding, and tattooing) and social practices (eg, shopping sprees, driving, and grocery shopping). Conclusions: The results from both computational and qualitative analyses of text and visual data reveal that vaping is normalized on TikTok. Our identified themes underscore how everyday conversations, promotional content, and the influence of popular figures collectively contribute to depicting vaping as a normal and accepted aspect of daily life on TikTok. Our study provides valuable insights for regulatory policies and public health initiatives aimed at tackling the normalization of vaping on social media platforms. 
%M 39259963 %R 10.2196/55591 %U https://www.jmir.org/2024/1/e55591 %U https://doi.org/10.2196/55591 %U http://www.ncbi.nlm.nih.gov/pubmed/39259963 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e59711 %T A 25-Year Retrospective of the Use of AI for Diagnosing Acute Stroke: Systematic Review %A Wang,Zhaoxin %A Yang,Wenwen %A Li,Zhengyu %A Rong,Ze %A Wang,Xing %A Han,Jincong %A Ma,Lei %+ Nantong University, 9# Seyuan Road, Chongchuan District, Nantong, 226019, China, 86 18860970645, mlmyhero@163.com %K acute stroke %K artificial intelligence %K AI %K machine learning %K deep learning %K stroke lesion segmentation and classification %K stroke prediction %K stroke prognosis %D 2024 %7 10.9.2024 %9 Review %J J Med Internet Res %G English %X Background: Stroke is a leading cause of death and disability worldwide. Rapid and accurate diagnosis is crucial for minimizing brain damage and optimizing treatment plans. Objective: This review aims to summarize the methods of artificial intelligence (AI)–assisted stroke diagnosis over the past 25 years, providing an overview of performance metrics and algorithm development trends. It also delves into existing issues and future prospects, intending to offer a comprehensive reference for clinical practice. Methods: A total of 50 representative articles published between 1999 and 2024 on using AI technology for stroke prevention and diagnosis were systematically selected and analyzed in detail. Results: AI-assisted stroke diagnosis has made significant advances in stroke lesion segmentation and classification, stroke risk prediction, and stroke prognosis. Before 2012, research mainly focused on segmentation using traditional thresholding and heuristic techniques. From 2012 to 2016, the focus shifted to machine learning (ML)–based approaches. After 2016, the emphasis moved to deep learning (DL), which brought significant improvements in accuracy. 
In stroke lesion segmentation and classification as well as stroke risk prediction, DL has shown superiority over ML. In stroke prognosis, both DL and ML have shown good performance. Conclusions: Over the past 25 years, AI technology has shown promising performance in stroke diagnosis. %M 39255472 %R 10.2196/59711 %U https://www.jmir.org/2024/1/e59711 %U https://doi.org/10.2196/59711 %U http://www.ncbi.nlm.nih.gov/pubmed/39255472 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e60501 %T Prompt Engineering Paradigms for Medical Applications: Scoping Review %A Zaghir,Jamil %A Naguib,Marco %A Bjelogrlic,Mina %A Névéol,Aurélie %A Tannier,Xavier %A Lovis,Christian %+ Department of Radiology and Medical Informatics, University of Geneva, Chemin des Mines, 9, Geneva, 1202, Switzerland, 41 022 379 08 18, Jamil.Zaghir@unige.ch %K prompt engineering %K prompt design %K prompt learning %K prompt tuning %K large language models %K LLMs %K scoping review %K clinical natural language processing %K natural language processing %K NLP %K medical texts %K medical application %K medical applications %K clinical practice %K privacy %K medicine %K computer science %K medical informatics %D 2024 %7 10.9.2024 %9 Review %J J Med Internet Res %G English %X Background: Prompt engineering, focusing on crafting effective prompts to large language models (LLMs), has garnered attention for its capabilities at harnessing the potential of LLMs. This is even more crucial in the medical domain due to its specialized terminology and language technicity. Clinical natural language processing applications must navigate complex language and ensure privacy compliance. Prompt engineering offers a novel approach by designing tailored prompts to guide models in exploiting clinically relevant information from complex medical texts. Despite its promise, the efficacy of prompt engineering in the medical domain remains to be fully explored. 
Objective: The aim of the study is to review research efforts and technical approaches in prompt engineering for medical applications as well as provide an overview of opportunities and challenges for clinical practice. Methods: Databases indexing the fields of medicine, computer science, and medical informatics were queried in order to identify relevant published papers. Since prompt engineering is an emerging field, preprint databases were also considered. Multiple data were extracted, such as the prompt paradigm, the involved LLMs, the languages of the study, the domain of the topic, the baselines, and several learning, design, and architecture strategies specific to prompt engineering. We include studies that apply prompt engineering–based methods to the medical domain, published between 2022 and 2024, and covering multiple prompt paradigms such as prompt learning (PL), prompt tuning (PT), and prompt design (PD). Results: We included 114 recent prompt engineering studies. Among the 3 prompt paradigms, we have observed that PD is the most prevalent (78 papers). In 12 papers, PD, PL, and PT terms were used interchangeably. While ChatGPT is the most commonly used LLM, we have identified 7 studies using this LLM on a sensitive clinical data set. Chain-of-thought, present in 17 studies, emerges as the most frequent PD technique. While PL and PT papers typically provide a baseline for evaluating prompt-based approaches, 61% (48/78) of the PD studies do not report any nonprompt-related baseline. Finally, we individually examine each of the key prompt engineering–specific information reported across papers and find that many studies neglect to explicitly mention them, posing a challenge for advancing prompt engineering research. Conclusions: In addition to reporting on trends and the scientific landscape of prompt engineering, we provide reporting guidelines for future studies to help advance research in the medical field. 
We also disclose tables and figures summarizing medical prompt engineering papers available and hope that future contributions will leverage these existing works to better advance the field. %M 39255030 %R 10.2196/60501 %U https://www.jmir.org/2024/1/e60501 %U https://doi.org/10.2196/60501 %U http://www.ncbi.nlm.nih.gov/pubmed/39255030 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58187 %T Detection of Sleep Apnea Using Wearable AI: Systematic Review and Meta-Analysis %A Abd-alrazaq,Alaa %A Aslam,Hania %A AlSaad,Rawan %A Alsahli,Mohammed %A Ahmed,Arfan %A Damseh,Rafat %A Aziz,Sarah %A Sheikh,Javaid %+ AI Center for Precision Health, Weill Cornell Medicine-Qatar, Qatar Foundation, A31 Luqta street, Education City, Doha, Qatar, 974 55787845654, aaa4027@qatar-med.cornell.edu %K sleep apnea %K hypopnea %K artificial intelligence %K wearable devices %K machine learning %K systematic review %K mobile phone %D 2024 %7 10.9.2024 %9 Review %J J Med Internet Res %G English %X Background: Early detection of sleep apnea, the health condition where airflow either ceases or decreases episodically during sleep, is crucial to initiate timely interventions and avoid complications. Wearable artificial intelligence (AI), the integration of AI algorithms into wearable devices to collect and analyze data to offer various functionalities and insights, can efficiently detect sleep apnea due to its convenience, accessibility, affordability, objectivity, and real-time monitoring capabilities, thereby addressing the limitations of traditional approaches such as polysomnography. Objective: The objective of this systematic review was to examine the effectiveness of wearable AI in detecting sleep apnea, its type, and its severity. Methods: Our search was conducted in 6 electronic databases. This review included English research articles evaluating wearable AI’s performance in identifying sleep apnea, distinguishing its type, and gauging its severity. 
Two researchers independently conducted study selection, extracted data, and assessed the risk of bias using an adapted Quality Assessment of Studies of Diagnostic Accuracy-Revised tool. We used both narrative and statistical techniques for evidence synthesis. Results: Among 615 studies, 38 (6.2%) met the eligibility criteria for this review. The pooled mean accuracy, sensitivity, and specificity of wearable AI in detecting apnea events in respiration (apnea and nonapnea events) were 0.893, 0.793, and 0.947, respectively. The pooled mean accuracy of wearable AI in differentiating types of apnea events in respiration (normal, obstructive sleep apnea, central sleep apnea, mixed apnea, and hypopnea) was 0.815. The pooled mean accuracy, sensitivity, and specificity of wearable AI in detecting sleep apnea were 0.869, 0.938, and 0.752, respectively. The pooled mean accuracy of wearable AI in identifying the severity level of sleep apnea (normal, mild, moderate, and severe) and estimating the severity score (Apnea-Hypopnea Index) was 0.651 and 0.877, respectively. Subgroup analyses found different moderators of wearable AI performance for different outcomes, such as the type of algorithm, type of data, type of sleep apnea, and placement of wearable devices. Conclusions: Wearable AI shows potential in identifying and classifying sleep apnea, but its current performance is suboptimal for routine clinical use. We recommend concurrent use with traditional assessments until improved evidence supports its reliability. Certified commercial wearables are needed for effectively detecting sleep apnea, predicting its occurrence, and delivering proactive interventions. Researchers should conduct further studies on detecting central sleep apnea, prioritize deep learning algorithms, incorporate self-reported and nonwearable data, evaluate performance across different device placements, and provide detailed findings for effective meta-analyses. 
%M 39255014 %R 10.2196/58187 %U https://www.jmir.org/2024/1/e58187 %U https://doi.org/10.2196/58187 %U http://www.ncbi.nlm.nih.gov/pubmed/39255014 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56121 %T Quality and Accountability of ChatGPT in Health Care in Low- and Middle-Income Countries: Simulated Patient Study %A Si,Yafei %A Yang,Yuyi %A Wang,Xi %A Zu,Jiaqi %A Chen,Xi %A Fan,Xiaojing %A An,Ruopeng %A Gong,Sen %+ School of Public Policy and Administration, Xi’an Jiaotong University, 28 West Xianning Road, Xi'an, 710049, China, 86 15891725861, emirada@163.com %K ChatGPT %K generative AI %K simulated patient %K health care %K quality and safety %K low- and middle-income countries %K quality %K LMIC %K patient study %K effectiveness %K reliability %K medication prescription %K prescription %K noncommunicable diseases %K AI integration %K AI %K artificial intelligence %D 2024 %7 9.9.2024 %9 Research Letter %J J Med Internet Res %G English %X Using simulated patients to mimic 9 established noncommunicable and infectious diseases, we assessed ChatGPT’s performance in treatment recommendations for common diseases in low- and middle-income countries. ChatGPT had a high level of accuracy in both correct diagnoses (20/27, 74%) and medication prescriptions (22/27, 82%) but a concerning level of unnecessary or harmful medications (23/27, 85%) even with correct diagnoses. ChatGPT performed better in managing noncommunicable diseases than infectious ones. These results highlight the need for cautious AI integration in health care systems to ensure quality and safety. 
%M 39250188 %R 10.2196/56121 %U https://www.jmir.org/2024/1/e56121 %U https://doi.org/10.2196/56121 %U http://www.ncbi.nlm.nih.gov/pubmed/39250188 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e58478 %T Practical Applications of Large Language Models for Health Care Professionals and Scientists %A Reis,Florian %A Lenz,Christian %A Gossen,Manfred %A Volk,Hans-Dieter %A Drzeniek,Norman Michael %K artificial intelligence %K healthcare %K chatGPT %K large language model %K prompting %K LLM %K applications %K AI %K scientists %K physicians %K health care %D 2024 %7 5.9.2024 %9 %J JMIR Med Inform %G English %X With the popularization of large language models (LLMs), strategies for their effective and safe usage in health care and research have become increasingly pertinent. Despite the growing interest and eagerness among health care professionals and scientists to exploit the potential of LLMs, initial attempts may yield suboptimal results due to a lack of user experience, thus complicating the integration of artificial intelligence (AI) tools into workplace routine. Focusing on scientists and health care professionals with limited LLM experience, this viewpoint article highlights and discusses 6 easy-to-implement use cases of practical relevance. These encompass customizing translations, refining text and extracting information, generating comprehensive overviews and specialized insights, compiling ideas into cohesive narratives, crafting personalized educational materials, and facilitating intellectual sparring. Additionally, we discuss general prompting strategies and precautions for the implementation of AI tools in biomedicine. Despite various hurdles and challenges, the integration of LLMs into daily routines of physicians and researchers promises heightened workplace productivity and efficiency. 
%R 10.2196/58478 %U https://medinform.jmir.org/2024/1/e58478 %U https://doi.org/10.2196/58478 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e58185 %T Challenges and Facilitation Approaches for the Participatory Design of AI-Based Clinical Decision Support Systems: Protocol for a Scoping Review %A Rambach,Tabea %A Gleim,Patricia %A Mandelartz,Sekina %A Heizmann,Carolin %A Kunze,Christophe %A Kellmeyer,Philipp %+ Care & Technology Lab, Furtwangen University, Robert-Gerwig-Platz 1, Furtwangen, , Germany, 49 7723 920 2976, tabea.rambach@hs-furtwangen.de %K artificial intelligence %K AI %K participation %K participatory design %K co-creation %K clinical decision support system %K CDSS %K decision support %K challenges %K clinical staff %K scoping review %D 2024 %7 5.9.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: In the last few years, there has been an increasing interest in the development of artificial intelligence (AI)–based clinical decision support systems (CDSS). However, there are barriers to the successful implementation of such systems in practice, including the lack of acceptance of these systems. Participatory approaches aim to involve future users in designing applications such as CDSS to be more acceptable, feasible, and fundamentally more relevant for practice. The development of technologies based on AI, however, challenges the process of user involvement and related methods. Objective: The aim of this review is to summarize and present the main approaches, methods, practices, and specific challenges for participatory research and development of AI-based decision support systems involving clinicians. Methods: This scoping review will follow the Joanna Briggs Institute approach to scoping reviews. The search for eligible studies was conducted in the databases MEDLINE via PubMed; ACM Digital Library; Cumulative Index to Nursing and Allied Health; and PsycInfo. 
The following search filters, adapted to each database, were used: period January 01, 2012, to October 31, 2023; English and German studies only; abstract available. The scoping review will include studies that involve the development, piloting, implementation, and evaluation of AI-based CDSS (hybrid and data-driven AI approaches). Clinical staff must be involved in a participatory manner. Data retrieval will be accompanied by a manual gray literature search. Potential publications will then be exported into reference management software, and duplicates will be removed. Afterward, the obtained set of papers will be transferred into a systematic review management tool. All publications will be screened, extracted, and analyzed: title and abstract screening will be carried out by 2 independent reviewers. Disagreements will be resolved by involving a third reviewer. Data will be extracted using a data extraction tool prepared for the study. Results: This scoping review protocol was registered on March 11, 2023, at the Open Science Framework. The full-text screening had already started at that time. Of the 3118 studies screened by title and abstract, 31 were included in the full-text screening. Data collection and analysis as well as manuscript preparation are planned for the second and third quarter of 2024. The manuscript should be submitted toward the end of 2024. Conclusions: This review will describe the current state of knowledge on participatory development of AI-based decision support systems. The aim is to identify knowledge gaps and provide research impetus. It also aims to provide relevant information for policy makers and practitioners. 
International Registered Report Identifier (IRRID): DERR1-10.2196/58185 %M 39235846 %R 10.2196/58185 %U https://www.researchprotocols.org/2024/1/e58185 %U https://doi.org/10.2196/58185 %U http://www.ncbi.nlm.nih.gov/pubmed/39235846 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56022 %T An Advanced Machine Learning Model for a Web-Based Artificial Intelligence–Based Clinical Decision Support System Application: Model Development and Validation Study %A Lin,Tai-Han %A Chung,Hsing-Yi %A Jian,Ming-Jr %A Chang,Chih-Kai %A Perng,Cherng-Lih %A Liao,Guo-Shiou %A Yu,Jyh-Cherng %A Dai,Ming-Shen %A Yu,Cheng-Ping %A Shang,Hung-Sheng %+ Division of Clinical Pathology, Department of Pathology, Tri-Service General Hospital, National Defense Medical Center, No. 161, Sec. 6, Minquan E. Road, Neihu District, Taipei, 11490, Taiwan, 886 920713130, iamkeith001@gmail.com %K breast cancer recurrence %K artificial intelligence–based clinical decision support system %K machine learning %K personalized treatment planning %K ChatGPT %K predictive model accuracy %D 2024 %7 4.9.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Breast cancer is a leading global health concern, necessitating advancements in recurrence prediction and management. The development of an artificial intelligence (AI)–based clinical decision support system (AI-CDSS) using ChatGPT addresses this need with the aim of enhancing both prediction accuracy and user accessibility. Objective: This study aims to develop and validate an advanced machine learning model for a web-based AI-CDSS application, leveraging the question-and-answer guidance capabilities of ChatGPT to enhance data preprocessing and model development, thereby improving the prediction of breast cancer recurrence. Methods: This study focused on developing an advanced machine learning model by leveraging data from the Tri-Service General Hospital breast cancer registry of 3577 patients (2004-2016). 
As a tertiary medical center, it accepts referrals from 4 branches (3 in the northern region and 1 on an offshore island in our country) that manage chronic diseases but refer complex surgical cases, including breast cancer, to the main center, enriching our study population’s diversity. Model training used patient data from 2004 to 2012, with subsequent validation using data from 2013 to 2016, ensuring comprehensive assessment and robustness of our predictive models. ChatGPT is integral to preprocessing and model development, aiding in hormone receptor categorization, age binning, and one-hot encoding. Techniques such as the synthetic minority oversampling technique address the imbalance of data sets. Various algorithms, including light gradient-boosting machine, gradient boosting, and extreme gradient boosting, were used, and their performance was evaluated using metrics such as the area under the curve, accuracy, sensitivity, and F1-score. Results: The light gradient-boosting machine model demonstrated superior performance, with an area under the curve of 0.80, followed closely by the gradient boosting and extreme gradient boosting models. The web interface of the AI-CDSS tool was effectively tested in clinical decision-making scenarios, proving its use in personalized treatment planning and patient involvement. Conclusions: The AI-CDSS tool, enhanced by ChatGPT, marks a significant advancement in breast cancer recurrence prediction, offering a more individualized and accessible approach for clinicians and patients. Although promising, further validation in diverse clinical settings is recommended to confirm its efficacy and expand its use. 
%M 39231422 %R 10.2196/56022 %U https://www.jmir.org/2024/1/e56022 %U https://doi.org/10.2196/56022 %U http://www.ncbi.nlm.nih.gov/pubmed/39231422 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54621 %T Enhancing Patient Selection in Sepsis Clinical Trials Design Through an AI Enrichment Strategy: Algorithm Development and Validation %A Yang,Meicheng %A Zhuang,Jinqiang %A Hu,Wenhan %A Li,Jianqing %A Wang,Yu %A Zhang,Zhongheng %A Liu,Chengyu %A Chen,Hui %+ Jiangsu Provincial Key Laboratory of Critical Care Medicine, Department of Critical Care Medicine, Zhongda Hospital, School of Medicine, Southeast University, No 87, Dingjiaqiao Road, Gulou District, Nanjing, 210009, China, 86 15905162429, 15905162429@163.com %K sepsis %K enrichment strategy %K disease progression trajectories %K artificial intelligence %K predictive modeling %K conformal prediction %D 2024 %7 4.9.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Sepsis is a heterogeneous syndrome, and enrollment of more homogeneous patients is essential to improve the efficiency of clinical trials. Artificial intelligence (AI) has facilitated the identification of homogeneous subgroups, but how to estimate the uncertainty of the model outputs when applying AI to clinical decision-making remains unknown. Objective: We aimed to design an AI-based model for purposeful patient enrollment, ensuring that a patient with sepsis recruited into a trial would still be persistently ill by the time the proposed therapy could impact patient outcome. We also expected that the model could provide interpretable factors and estimate the uncertainty of the model outputs at a customized confidence level. Methods: In this retrospective study, 9135 patients with sepsis requiring vasopressor treatment within 24 hours after sepsis onset were enrolled from Beth Israel Deaconess Medical Center. 
This cohort was used for model development, and 10-fold cross-validation with 50 repeats was used for internal validation. In total, 3743 patients with sepsis from the eICU Collaborative Research Database were used as the external validation cohort. All included patients with sepsis were stratified based on disease progression trajectories: rapid death, recovery, and persistent ill. A total of 148 variables were selected for predicting the 3 trajectories. Four machine learning algorithms with 3 different setups were used. We estimated the uncertainty of the model outputs using conformal prediction (CP). The Shapley Additive Explanations method was used to explain the model. Results: The multiclass gradient boosting machine was identified as the best-performing model with good discrimination and calibration performance in both validation cohorts. The mean area under the receiver operating characteristic curve with SD was 0.906 (0.018) for rapid death, 0.843 (0.008) for recovery, and 0.807 (0.010) for persistent ill in the internal validation cohort. In the external validation cohort, the mean area under the receiver operating characteristic curve (SD) was 0.878 (0.003) for rapid death, 0.764 (0.008) for recovery, and 0.696 (0.007) for persistent ill. The maximum norepinephrine equivalence, total urine output, Acute Physiology Score III, mean systolic blood pressure, and the coefficient of variation of oxygen saturation contributed the most. Compared to the model without CP, using the model with CP at a mixed confidence approach reduced overall prediction errors by 27.6% (n=62) and 30.7% (n=412) in the internal and external validation cohorts, respectively, as well as enabled the identification of more potentially persistent ill patients. Conclusions: The implementation of our model has the potential to reduce heterogeneity and enroll more homogeneous patients in sepsis clinical trials. 
The use of CP for estimating the uncertainty of the model outputs allows for a more comprehensive understanding of the model’s reliability and assists in making informed decisions based on the predicted outcomes. %M 39231425 %R 10.2196/54621 %U https://www.jmir.org/2024/1/e54621 %U https://doi.org/10.2196/54621 %U http://www.ncbi.nlm.nih.gov/pubmed/39231425 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e59258 %T Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study %A Akyon,Seyma Handan %A Akyon,Fatih Cagatay %A Camyar,Ahmet Sefa %A Hızlı,Fatih %A Sari,Talha %A Hızlı,Şamil %+ Golpazari Family Health Center, Istiklal Mahallesi Fevzi Cakmak Caddesi No:23 Golpazari, Bilecik, 11700, Turkey, 90 5052568096, drseymahandan@gmail.com %K large language models %K LLM %K LLMs %K ChatGPT %K artificial intelligence %K AI %K natural language processing %K medicine %K health care %K GPT %K machine learning %K language model %K language models %K generative %K research paper %K research papers %K scientific research %K answer %K answers %K response %K responses %K comprehension %K STROBE %K Strengthening the Reporting of Observational Studies in Epidemiology %D 2024 %7 4.9.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Reading medical papers is a challenging and time-consuming task for doctors, especially when the papers are long and complex. A tool that can help doctors efficiently process and understand medical papers is needed. Objective: This study aims to critically assess and compare the comprehension capabilities of large language models (LLMs) in accurately and efficiently understanding medical research papers using the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) checklist, which provides a standardized framework for evaluating key elements of observational studies. Methods: This was a methodological study. 
The study aims to evaluate the capabilities of new generative artificial intelligence tools in understanding medical papers. A novel benchmark pipeline processed 50 medical research papers from PubMed, comparing the answers of 6 LLMs (GPT-3.5-Turbo, GPT-4-0613, GPT-4-1106, PaLM 2, Claude v1, and Gemini Pro) to the benchmark established by expert medical professors. Fifteen questions, derived from the STROBE checklist, assessed LLMs’ understanding of different sections of a research paper. Results: LLMs exhibited varying performance, with GPT-3.5-Turbo achieving the highest percentage of correct answers (n=3916, 66.9%), followed by GPT-4-1106 (n=3837, 65.6%), PaLM 2 (n=3632, 62.1%), Claude v1 (n=2887, 58.3%), Gemini Pro (n=2878, 49.2%), and GPT-4-0613 (n=2580, 44.1%). Statistical analysis revealed significant differences between LLMs (P<.001), with older models showing inconsistent performance compared to newer versions. LLMs showcased distinct performances for each question across different parts of a scholarly paper, with certain models like PaLM 2 and GPT-3.5 showing remarkable versatility and depth in understanding. Conclusions: This study is the first to evaluate the performance of different LLMs in understanding medical papers using the retrieval-augmented generation method. The findings highlight the potential of LLMs to enhance medical research by improving efficiency and facilitating evidence-based decision-making. Further research is needed to address limitations such as the influence of question formats, potential biases, and the rapid evolution of LLM models. 
%M 39230947 %R 10.2196/59258 %U https://medinform.jmir.org/2024/1/e59258 %U https://doi.org/10.2196/59258 %U http://www.ncbi.nlm.nih.gov/pubmed/39230947 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e57335 %T The Application of Mask Region-Based Convolutional Neural Networks in the Detection of Nasal Septal Deviation Using Cone Beam Computed Tomography Images: Proof-of-Concept Study %A Shetty,Shishir %A Mubarak,Auwalu Saleh %A R David,Leena %A Al Jouhari,Mhd Omar %A Talaat,Wael %A Al-Rawi,Natheer %A AlKawas,Sausan %A Shetty,Sunaina %A Uzun Ozsahin,Dilber %+ Department of Medical Diagnostic Imaging, College of Health Sciences, University of Sharjah, Building M31, Sharjah, 27272, United Arab Emirates, 971 556491740, dozsahin@sharjah.ac.ae %K convolutional neural networks %K nasal septal deviation %K cone beam computed tomography %K tomographic %K tomography %K nasal %K nose %K face %K facial %K image %K images %K imagery %K artificial intelligence %K CNN %K neural network %K neural networks %K ResNet %D 2024 %7 3.9.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Artificial intelligence (AI) models are being increasingly studied for the detection of variations and pathologies in different imaging modalities. Nasal septal deviation (NSD) is an important anatomical structure with clinical implications. However, AI-based radiographic detection of NSD has not yet been studied. Objective: This research aimed to develop and evaluate a real-time model that can detect probable NSD using cone beam computed tomography (CBCT) images. Methods: Coronal section images were obtained from 204 full-volume CBCT scans. The scans were classified as normal and deviated by 2 maxillofacial radiologists. The images were then used to train and test the AI model. Mask region-based convolutional neural networks (Mask R-CNNs) comprising 3 different backbones—ResNet50, ResNet101, and MobileNet—were used to detect deviated nasal septum in 204 CBCT images. 
To further improve the detection, an image preprocessing technique (contrast enhancement [CEH]) was added. Results: The best-performing model—CEH-ResNet101—achieved a mean average precision of 0.911, with an area under the curve of 0.921. Conclusions: The performance of the model shows that the model is capable of detecting nasal septal deviation. Future research in this field should focus on additional preprocessing of images and detection of NSD based on multiple planes using 3D images. %M 39226096 %R 10.2196/57335 %U https://formative.jmir.org/2024/1/e57335 %U https://doi.org/10.2196/57335 %U http://www.ncbi.nlm.nih.gov/pubmed/39226096 %0 Journal Article %@ 2371-4379 %I JMIR Publications %V 9 %N %P e59867 %T Implementation of Artificial Intelligence–Based Diabetic Retinopathy Screening in a Tertiary Care Hospital in Quebec: Prospective Validation Study %A Antaki,Fares %A Hammana,Imane %A Tessier,Marie-Catherine %A Boucher,Andrée %A David Jetté,Maud Laurence %A Beauchemin,Catherine %A Hammamji,Karim %A Ong,Ariel Yuhan %A Rhéaume,Marc-André %A Gauthier,Danny %A Harissi-Dagher,Mona %A Keane,Pearse A %A Pomp,Alfons %+ Institute of Ophthalmology, University College London, 11-43 Bath St, London, EC1V 9EL, United Kingdom, 44 20 7608 6800, f.antaki@ucl.ac.uk %K artificial intelligence %K diabetic retinopathy %K screening %K clinical validation %K diabetic %K diabetes %K screening %K tertiary care hospital %K validation study %K Quebec %K Canada %K vision %K vision loss %K ophthalmological %K AI %K detection %K eye %D 2024 %7 3.9.2024 %9 Original Paper %J JMIR Diabetes %G English %X Background: Diabetic retinopathy (DR) affects about 25% of people with diabetes in Canada. Early detection of DR is essential for preventing vision loss. Objective: We evaluated the real-world performance of an artificial intelligence (AI) system that analyzes fundus images for DR screening in a Quebec tertiary care center. 
Methods: We prospectively recruited adult patients with diabetes at the Centre hospitalier de l’Université de Montréal (CHUM) in Montreal, Quebec, Canada. Patients underwent dual-pathway screening: first by the Computer Assisted Retinal Analysis (CARA) AI system (index test), then by standard ophthalmological examination (reference standard). We measured the AI system's sensitivity and specificity for detecting referable disease at the patient level, along with its performance for detecting any retinopathy and diabetic macular edema (DME) at the eye level, and potential cost savings. Results: This study included 115 patients. CARA demonstrated a sensitivity of 87.5% (95% CI 71.9-95.0) and specificity of 66.2% (95% CI 54.3-76.3) for detecting referable disease at the patient level. For any retinopathy detection at the eye level, CARA showed 88.2% sensitivity (95% CI 76.6-94.5) and 71.4% specificity (95% CI 63.7-78.1). For DME detection, CARA had 100% sensitivity (95% CI 64.6-100) and 81.9% specificity (95% CI 75.6-86.8). Potential yearly savings from implementing CARA at the CHUM were estimated at CAD $245,635 (US $177,643.23, as of July 26, 2024) considering 5000 patients with diabetes. Conclusions: Our study indicates that integrating a semiautomated AI system for DR screening demonstrates high sensitivity for detecting referable disease in a real-world setting. This system has the potential to improve screening efficiency and reduce costs at the CHUM, but more work is needed to validate it. 
%M 39226095 %R 10.2196/59867 %U https://diabetes.jmir.org/2024/1/e59867 %U https://doi.org/10.2196/59867 %U http://www.ncbi.nlm.nih.gov/pubmed/39226095 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e62866 %T Development of a System for Predicting Hospitalization Time for Patients With Traumatic Brain Injury Based on Machine Learning Algorithms: User-Centered Design Case Study %A Zhou,Huan %A Fang,Cheng %A Pan,Yifeng %K machine learning %K traumatic brain injury %K support vector regression machine %K predictive model %K hospitalization %D 2024 %7 30.8.2024 %9 %J JMIR Hum Factors %G English %X Background: Currently, the treatment and care of patients with traumatic brain injury (TBI) are intractable health problems worldwide and greatly increase the medical burden in society. However, machine learning–based algorithms, applied to the large amount of data accumulated in the clinic in the past, can predict the hospitalization time of patients with brain injury in advance, enabling a reasonable arrangement of resources and effectively reducing the medical burden on society. This method has especially important application value in China, where medical resources are tight. Objective: We aimed to develop a system based on a machine learning model for predicting the length of hospitalization of patients with TBI, which is available to patients, nurses, and physicians. Methods: We collected information on 1128 patients who received treatment at the Neurosurgery Center of the Second Affiliated Hospital of Anhui Medical University from May 2017 to May 2022, and we trained and tested the machine learning models using 5-fold cross-validation to avoid overfitting; 28 independent variables were used as inputs to the machine learning models, and the length of hospitalization was used as the output variable. 
Once the models were trained, we obtained the error and goodness of fit (R2) of each machine learning model from the 5 rounds of cross-validation and compared them to select the best predictive model to be encapsulated in the developed system. In addition, we externally tested the models using clinical data related to patients treated at the First Affiliated Hospital of Anhui Medical University from June 2021 to February 2022. Results: Six machine learning models were built, including support vector regression machine, convolutional neural network, back propagation neural network, random forest, logistic regression, and multilayer perceptron. Among them, the support vector regression machine had the smallest error on the test set (10.22%) and the highest goodness of fit (90.4%), performing best among the 6 models. In addition, we verified the results of these 6 models on external data sets to rule out chance findings, and the support vector regression machine again performed best. Therefore, we chose to encapsulate the support vector regression machine into our system for predicting the length of stay of patients with traumatic brain trauma. Finally, we made the developed system available to patients, nurses, and physicians; a satisfaction questionnaire showed that all 3 groups agreed that the system was effective in supporting clinical decisions. Conclusions: This study shows that the support vector regression machine model developed using machine learning methods can accurately predict the length of hospitalization of patients with TBI, and the developed prediction system has strong clinical utility. 
%R 10.2196/62866 %U https://humanfactors.jmir.org/2024/1/e62866 %U https://doi.org/10.2196/62866 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e54449 %T Near Real-Time Syndromic Surveillance of Emergency Department Triage Texts Using Natural Language Processing: Case Study in Febrile Convulsion Detection %A Khademi,Sedigh %A Palmer,Christopher %A Javed,Muhammad %A Dimaguila,Gerardo Luis %A Clothier,Hazel %A Buttery,Jim %A Black,Jim %+ Department of Paediatrics, University of Melbourne, Grattan Street, Parkville, Melbourne, 3010, Australia, 61 405761879, sedigh.khademi@gmail.com %K vaccine safety %K immunization %K febrile convulsion %K syndromic surveillance %K emergency department %K natural language processing %D 2024 %7 30.8.2024 %9 Original Paper %J JMIR AI %G English %X Background: Collecting information on adverse events following immunization from as many sources as possible is critical for promptly identifying potential safety concerns and taking appropriate actions. Febrile convulsions are recognized as an important potential reaction to vaccination in children aged <6 years. Objective: The primary aim of this study was to evaluate the performance of natural language processing techniques and machine learning (ML) models for the rapid detection of febrile convulsion presentations in emergency departments (EDs), especially with respect to the minimum training data requirements to obtain optimum model performance. In addition, we examined the deployment requirements for an ML model to perform real-time monitoring of ED triage notes. Methods: We developed a pattern matching approach as a baseline and evaluated ML models for the classification of febrile convulsions in ED triage notes to determine both their training requirements and their effectiveness in detecting febrile convulsions. We measured their performance during training and then compared the deployed models’ results on new incoming ED data. 
Results: Although the best standard neural networks had acceptable performance and were low-resource models, transformer-based models outperformed them substantially, justifying their ongoing deployment. Conclusions: Using natural language processing, particularly with the use of large language models, offers significant advantages in syndromic surveillance. Large language models make highly effective classifiers, and their text generation capacity can be used to enhance the quality and diversity of training data. %M 39213519 %R 10.2196/54449 %U https://ai.jmir.org/2024/1/e54449 %U https://doi.org/10.2196/54449 %U http://www.ncbi.nlm.nih.gov/pubmed/39213519 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e46608 %T Predictors of Medical and Dental Clinic Closure by Machine Learning Methods: Cross-Sectional Study Using Empirical Data %A Park,Young-Taek %A Kim,Donghan %A Jeon,Ji Soo %A Kim,Kwang Gi %+ Department of Biomedical Engineering, College of Medicine, Gil Medical Center, Gachon University, 58-13 Docjemro, NamdongGum, Inchon, 21565, Republic of Korea, 82 324582770, kimkg@gachon.ac.kr %K machine learning %K health facility closure %K hospital closure %K clinic closure %K clinic bankruptcy %K hospital bankruptcy %K health clinic %K prediction %K healthcare resources %K artificial intelligence %K medical clinic %K health insurance %K health facility closure %D 2024 %7 30.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Small clinics are important in providing health care in local communities. Accurately predicting their closure would help manage health care resource allocation. There have been few studies on the prediction of clinic closure using machine learning techniques. Objective: This study aims to test the feasibility of predicting the closure of medical and dental clinics (MCs and DCs, respectively) and investigate important factors associated with their closure using machine learning techniques. 
Methods: The units of analysis were MCs and DCs. This study used health insurance administrative data. The study population comprised clinics that operated or closed between January 1, 2020, and December 31, 2021. Using all closed clinics, closed and operating clinics were selected at a ratio of 1:2 based on locality using propensity score matching from logistic regression. This study used 23 and 19 variables to predict the closure of MCs and DCs, respectively. Key variables were extracted using permutation importance and the sequential feature selection technique. Finally, this study used 5 and 6 variables for MCs and DCs, respectively, for model learning. Furthermore, four machine learning techniques were used: (1) logistic regression, (2) support vector machine, (3) random forest (RF), and (4) Extreme Gradient Boost. This study evaluated modeling accuracy using the area under the curve (AUC) and presented important factors critically affecting closures. This study used SAS (version 9.4; SAS Institute Inc) and Python (version 3.7.9; Python Software Foundation). Results: The best-fit model for the closure of MCs with cross-validation was the support vector machine (AUC 0.762, 95% CI 0.746-0.777; P<.001) followed by RF (AUC 0.736, 95% CI 0.720-0.752; P<.001). The best-fit model for DCs was Extreme Gradient Boost (AUC 0.700, 95% CI 0.675-0.725; P<.001) followed by RF (AUC 0.687, 95% CI 0.661-0.712; P<.001). The most significant factor associated with the closure of MCs was years of operation, followed by population growth, population, and percentage of medical specialties. In contrast, the main factor affecting the closure of DCs was the number of patients, followed by annual variation in the number of patients, years of operation, and percentage of dental specialists. Conclusions: This study showed that machine learning methods are useful tools for predicting the closure of small medical facilities with a moderate level of accuracy. 
Essential factors affecting medical facility closure also differed between MCs and DCs. Developing good models would prevent unnecessary medical facility closures at the national level. %M 39213534 %R 10.2196/46608 %U https://www.jmir.org/2024/1/e46608 %U https://doi.org/10.2196/46608 %U http://www.ncbi.nlm.nih.gov/pubmed/39213534 %0 Journal Article %@ 2817-092X %I JMIR Publications %V 3 %N %P e56665 %T Ethics and Governance of Neurotechnology in Africa: Lessons From AI %A Eke,Damian %+ School of Computer Science, University of Nottingham, Wollaton Rd, Lenton, Nottingham, Nottingham, NG8 1BB, United Kingdom, damian.eke@nottingham.ac.uk %K neurotechnology %K Africa %K AI %K ethics %K governance %K ethics dumping %K regulations %K artificial intelligence %D 2024 %7 29.8.2024 %9 Viewpoint %J JMIR Neurotech %G English %X As a novel technology frontier, neurotechnology is revolutionizing our perceptions of the brain and nervous system. With growing private and public investments, a thriving ecosystem of direct-to-consumer neurotechnologies has also emerged. These technologies are increasingly being introduced in many parts of the world, including Africa. However, as the use of this technology expands, neuroethics and ethics of emerging technology scholars are bringing attention to the critical concerns it raises. These concerns are largely not new but are uniquely amplified by the novelty of technology. They include ethical and legal issues such as privacy, human rights, human identity, bias, autonomy, and safety, which are part of the artificial intelligence ethics discourse. Most importantly, there is an obvious lack of regulatory oversight and a dearth of literature on the consideration of contextual ethical principles in the design and application of neurotechnology in Africa. 
This paper highlights lessons African stakeholders need to learn from the ethics and governance of artificial intelligence to ensure the design of ethically responsible and socially acceptable neurotechnology in and for Africa. %R 10.2196/56665 %U https://neuro.jmir.org/2024/1/e56665 %U https://doi.org/10.2196/56665 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e58455 %T Obtaining the Most Accurate, Explainable Model for Predicting Chronic Obstructive Pulmonary Disease: Triangulation of Multiple Linear Regression and Machine Learning Methods %A Kamis,Arnold %A Gadia,Nidhi %A Luo,Zilin %A Ng,Shu Xin %A Thumbar,Mansi %+ Brandeis International Business School, Brandeis University, Sachar International Center, 415 South St, Waltham, MA, 02453, United States, 1 781 736 8544, akamis@brandeis.edu %K chronic obstructive pulmonary disease %K COPD %K cigarette smoking %K ethnic and racial differences %K machine learning %K multiple linear regression %K household income %K practical model %D 2024 %7 29.8.2024 %9 Original Paper %J JMIR AI %G English %X Background: Lung disease is a severe problem in the United States. Despite the decreasing rates of cigarette smoking, chronic obstructive pulmonary disease (COPD) continues to be a health burden in the United States. In this paper, we focus on COPD in the United States from 2016 to 2019. Objective: We gathered a diverse set of non–personally identifiable information from public data sources to better understand and predict COPD rates at the core-based statistical area (CBSA) level in the United States. Our objective was to compare linear models with machine learning models to obtain the most accurate and interpretable model of COPD. Methods: We integrated non–personally identifiable information from multiple Centers for Disease Control and Prevention sources and used them to analyze COPD with different types of methods. 
We included cigarette smoking, a well-known contributing factor, and race/ethnicity because health disparities among different races and ethnicities in the United States are also well known. The models also included the air quality index, education, employment, and economic variables. We fitted models with both multiple linear regression and machine learning methods. Results: The most accurate multiple linear regression model has variance explained of 81.1%, mean absolute error of 0.591, and symmetric mean absolute percentage error of 9.666. The most accurate machine learning model has variance explained of 85.7%, mean absolute error of 0.456, and symmetric mean absolute percentage error of 6.956. Overall, cigarette smoking and household income are the strongest predictor variables. Moderately strong predictors include education level and unemployment level, as well as American Indian or Alaska Native, Black, and Hispanic population percentages, all measured at the CBSA level. Conclusions: This research highlights the importance of using diverse data sources as well as multiple methods to understand and predict COPD. The most accurate model was a gradient boosted tree, which captured nonlinearities in a model whose accuracy is superior to the best multiple linear regression. Our interpretable models suggest ways that individual predictor variables can be used in tailored interventions aimed at decreasing COPD rates in specific demographic and ethnographic communities. Gaps in understanding the health impacts of poor air quality, particularly in relation to climate change, suggest a need for further research to design interventions and improve public health. 
%M 39207843 %R 10.2196/58455 %U https://ai.jmir.org/2024/1/e58455 %U https://doi.org/10.2196/58455 %U http://www.ncbi.nlm.nih.gov/pubmed/39207843 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e48633 %T Barriers to and Facilitators of Artificial Intelligence Adoption in Health Care: Scoping Review %A Hassan,Masooma %A Kushniruk,Andre %A Borycki,Elizabeth %+ Department of Health Information Science, University of Victoria, HSD Building, A202, Victoria, BC, V8W 2Y2, Canada, 1 6472876274, masooma.d.hassan@gmail.com %K artificial intelligence %K governance %K health information systems %K artificial intelligence adoption %K system implementation %K health care organizations %K health services %K mobile phone %D 2024 %7 29.8.2024 %9 Review %J JMIR Hum Factors %G English %X Background: Artificial intelligence (AI) use cases in health care are on the rise, with the potential to improve operational efficiency and care outcomes. However, the translation of AI into practical, everyday use has been limited, as its effectiveness relies on successful implementation and adoption by clinicians, patients, and other health care stakeholders. Objective: As adoption is a key factor in the successful proliferation of an innovation, this scoping review aimed at presenting an overview of the barriers to and facilitators of AI adoption in health care. Methods: A scoping review was conducted using the guidance provided by the Joanna Briggs Institute and the framework proposed by Arksey and O’Malley. MEDLINE, IEEE Xplore, and ScienceDirect databases were searched to identify publications in English that reported on the barriers to or facilitators of AI adoption in health care. This review focused on articles published between January 2011 and December 2023. The review did not have any limitations regarding the health care setting (hospital or community) or the population (patients, clinicians, physicians, or health care administrators). 
A thematic analysis was conducted on the selected articles to map factors associated with the barriers to and facilitators of AI adoption in health care. Results: A total of 2514 articles were identified in the initial search. After title and abstract reviews, 50 (1.99%) articles were included in the final analysis. These articles were reviewed for the barriers to and facilitators of AI adoption in health care. Most articles were empirical studies, literature reviews, reports, and thought articles. Approximately 18 categories of barriers and facilitators were identified. These were organized sequentially to provide considerations for AI development, implementation, and the overall structure needed to facilitate adoption. Conclusions: The literature review revealed that trust is a significant catalyst of adoption, and it was found to be impacted by several barriers identified in this review. A governance structure can be a key facilitator, among others, in ensuring all the elements identified as barriers are addressed appropriately. The findings demonstrate that the implementation of AI in health care is still, in many ways, dependent on the establishment of regulatory and legal frameworks. Further research into a combination of governance and implementation frameworks, models, or theories to enhance trust that would specifically enable adoption is needed to provide the necessary guidance to those translating AI research into practice. Future research could also be expanded to include attempts at understanding patients’ perspectives on complex, high-risk AI use cases and how the use of AI applications affects clinical practice and patient care, including sociotechnical considerations, as more algorithms are implemented in actual clinical environments. 
%M 39207831 %R 10.2196/48633 %U https://humanfactors.jmir.org/2024/1/e48633 %U https://doi.org/10.2196/48633 %U http://www.ncbi.nlm.nih.gov/pubmed/39207831 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e56628 %T Transforming Health Care Through Chatbots for Medical History-Taking and Future Directions: Comprehensive Systematic Review %A Hindelang,Michael %A Sitaru,Sebastian %A Zink,Alexander %+ Department of Dermatology and Allergy, TUM School of Medicine and Health, Technical University of Munich, Biedersteiner Straße 29, Munich, 80802, Germany, 49 894140 ext 3061, michael.hindelang@tum.de %K medical history-taking %K chatbots %K artificial intelligence %K natural language processing %K health care data collection %K patient engagement %K clinical decision-making %K usability %K acceptability %K systematic review %K diagnostic accuracy %K patient-doctor communication %K cybersecurity %K machine learning %K conversational agents %K health informatics %D 2024 %7 29.8.2024 %9 Review %J JMIR Med Inform %G English %X Background: The integration of artificial intelligence and chatbot technology in health care has attracted significant attention due to its potential to improve patient care and streamline history-taking. As artificial intelligence–driven conversational agents, chatbots offer the opportunity to revolutionize history-taking, necessitating a comprehensive examination of their impact on medical practice. Objective: This systematic review aims to assess the role, effectiveness, usability, and patient acceptance of chatbots in medical history–taking. It also examines potential challenges and future opportunities for integration into clinical practice. Methods: A systematic search included PubMed, Embase, MEDLINE (via Ovid), CENTRAL, Scopus, and Open Science and covered studies through July 2024. 
The inclusion and exclusion criteria for the studies reviewed were based on the PICOS (participants, interventions, comparators, outcomes, and study design) framework. The population included individuals using health care chatbots for medical history–taking. Interventions focused on chatbots designed to facilitate medical history–taking. The outcomes of interest were the feasibility, acceptance, and usability of chatbot-based medical history–taking. Studies not reporting on these outcomes were excluded. All study designs except conference papers were eligible for inclusion. Only English-language studies were considered. There were no specific restrictions on study duration. Key search terms included “chatbot*,” “conversational agent*,” “virtual assistant,” “artificial intelligence chatbot,” “medical history,” and “history-taking.” The quality of observational studies was classified using the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) criteria (eg, sample size, design, data collection, and follow-up). The RoB 2 (Risk of Bias) tool assessed areas and the levels of bias in randomized controlled trials (RCTs). Results: The review included 15 observational studies and 3 RCTs and synthesized evidence from different medical fields and populations. Chatbots systematically collect information through targeted queries and data retrieval, improving patient engagement and satisfaction. The results show that chatbots have great potential for history-taking and that the efficiency and accessibility of the health care system can be improved by 24/7 automated data collection. Bias assessments revealed that of the 15 observational studies, 5 (33%) studies were of high quality, 5 (33%) studies were of moderate quality, and 5 (33%) studies were of low quality. Of the RCTs, 2 had a low risk of bias, while 1 had a high risk. 
Conclusions: This systematic review provides critical insights into the potential benefits and challenges of using chatbots for medical history–taking. The included studies showed that chatbots can increase patient engagement, streamline data collection, and improve health care decision-making. For effective integration into clinical practice, it is crucial to design user-friendly interfaces, ensure robust data security, and maintain empathetic patient-physician interactions. Future research should focus on refining chatbot algorithms, improving their emotional intelligence, and extending their application to different health care settings to realize their full potential in modern medicine. Trial Registration: PROSPERO CRD42023410312; www.crd.york.ac.uk/prospero %M 39207827 %R 10.2196/56628 %U https://medinform.jmir.org/2024/1/e56628 %U https://doi.org/10.2196/56628 %U http://www.ncbi.nlm.nih.gov/pubmed/39207827 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54944 %T Combining Clinical-Radiomics Features With Machine Learning Methods for Building Models to Predict Postoperative Recurrence in Patients With Chronic Subdural Hematoma: Retrospective Cohort Study %A Fang,Cheng %A Ji,Xiao %A Pan,Yifeng %A Xie,Guanchao %A Zhang,Hongsheng %A Li,Sai %A Wan,Jinghai %+ Department of Neurosurgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 17 Nanli, Panjiayuan, Chaoyang District, Beijing, 100021, China, 86 13426261848, wanjinghai@sina.com %K chronic subdural hematoma %K convolutional neural network %K machine learning %K neurosurgery %K radiomics %K support vector machine %D 2024 %7 28.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Chronic subdural hematoma (CSDH) represents a prevalent medical condition, posing substantial challenges in postoperative management due to risks of recurrence. 
Such recurrences not only cause physical suffering to the patient but also add to the financial burden on the family and the health care system. Currently, prognosis determination largely depends on clinician expertise, revealing a dearth of precise prediction models in clinical settings. Objective: This study aims to use machine learning (ML) techniques for the construction of predictive models to assess the likelihood of CSDH recurrence after surgery, which leads to greater benefits for patients and the health care system. Methods: Data from 133 patients were amassed and partitioned into a training set (n=93) and a test set (n=40). Radiomics features were extracted from preoperative cranial computed tomography scans using 3D Slicer software. These features, in conjunction with clinical data and composite clinical-radiomics features, served as input variables for model development. Four distinct ML algorithms were used to build predictive models, and their performance was rigorously evaluated via accuracy, area under the curve (AUC), and recall metrics. The optimal model was identified, followed by recursive feature elimination for feature selection, leading to enhanced predictive efficacy. External validation was conducted using data sets from additional health care facilities. Results: Following rigorous experimental analysis, the support vector machine model, predicated on clinical-radiomics features, emerged as the most efficacious for predicting postoperative recurrence in patients with CSDH. Subsequent to feature selection, key variables exerting significant impact on the model were incorporated as the input set, thereby augmenting its predictive accuracy. The model demonstrated robust performance, with metrics including accuracy of 92.72%, AUC of 91.34%, and recall of 93.16%. External validation further substantiated its effectiveness, yielding an accuracy of 90.32%, AUC of 91.32%, and recall of 88.37%, affirming its clinical applicability. 
Conclusions: This study substantiates the feasibility and clinical relevance of an ML-based predictive model, using clinical-radiomics features, for relatively accurate prognostication of postoperative recurrence in patients with CSDH. If the model is integrated into clinical practice, it will be of great significance in enhancing the quality and efficiency of clinical decision-making processes, which can improve the accuracy of diagnosis and treatment, reduce unnecessary tests and surgeries, and reduce the waste of medical resources. %M 39197165 %R 10.2196/54944 %U https://www.jmir.org/2024/1/e54944 %U https://doi.org/10.2196/54944 %U http://www.ncbi.nlm.nih.gov/pubmed/39197165 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e57896 %T Current Status of ChatGPT Use in Medical Education: Potentials, Challenges, and Strategies %A Xu,Tianhui %A Weng,Huiting %A Liu,Fang %A Yang,Li %A Luo,Yuanyuan %A Ding,Ziwei %A Wang,Qin %+ Clinical Nursing Teaching and Research Section, The Second Xiangya Hospital of Central South University, 139 Middle Renmin Road, Changsha, 410011, China, 86 18774806226, wangqin3421@csu.edu.cn %K chat generative pretrained transformer %K ChatGPT %K artificial intelligence %K medical education %K natural language processing %K clinical practice %D 2024 %7 28.8.2024 %9 Viewpoint %J J Med Internet Res %G English %X ChatGPT, a generative pretrained transformer, has garnered global attention and sparked discussions since its introduction on November 30, 2022. However, it has generated controversy within the realms of medical education and scientific research. This paper examines the potential applications, limitations, and strategies for using ChatGPT. ChatGPT offers personalized learning support to medical students through its robust natural language generation capabilities, enabling it to furnish answers. 
Moreover, it has demonstrated significant use in simulating clinical scenarios, facilitating teaching and learning processes, and revitalizing medical education. Nonetheless, numerous challenges accompany these advancements. In the context of education, it is of paramount importance to prevent excessive reliance on ChatGPT and combat academic plagiarism. Likewise, in the field of medicine, it is vital to guarantee the timeliness, accuracy, and reliability of content generated by ChatGPT. Concurrently, ethical challenges and concerns regarding information security arise. In light of these challenges, this paper proposes targeted strategies for addressing them. First, the risk of overreliance on ChatGPT and academic plagiarism must be mitigated through ideological education, fostering comprehensive competencies, and implementing diverse evaluation criteria. The integration of contemporary pedagogical methodologies in conjunction with the use of ChatGPT serves to enhance the overall quality of medical education. To enhance the professionalism and reliability of the generated content, it is recommended to implement measures to optimize ChatGPT’s training data professionally and enhance the transparency of the generation process. This ensures that the generated content is aligned with the most recent standards of medical practice. Moreover, the enhancement of value alignment and the establishment of pertinent legislation or codes of practice address ethical concerns, including those pertaining to algorithmic discrimination, the allocation of medical responsibility, privacy, and security. In conclusion, while ChatGPT presents significant potential in medical education, it also encounters various challenges. 
Through comprehensive research and the implementation of suitable strategies, it is anticipated that ChatGPT’s positive impact on medical education will be harnessed, laying the groundwork for advancing the discipline and fostering the development of high-caliber medical professionals. %M 39196640 %R 10.2196/57896 %U https://www.jmir.org/2024/1/e57896 %U https://doi.org/10.2196/57896 %U http://www.ncbi.nlm.nih.gov/pubmed/39196640 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e52190 %T Traditional Machine Learning, Deep Learning, and BERT (Large Language Model) Approaches for Predicting Hospitalizations From Nurse Triage Notes: Comparative Evaluation of Resource Management %A Patel,Dhavalkumar %A Timsina,Prem %A Gorenstein,Larisa %A Glicksberg,Benjamin S %A Raut,Ganesh %A Cheetirala,Satya Narayan %A Santana,Fabio %A Tamegue,Jules %A Kia,Arash %A Zimlichman,Eyal %A Levin,Matthew A %A Freeman,Robert %A Klang,Eyal %+ Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, 2nd Floor, 150 East 42nd Street, New York, NY, 10017, United States, 1 (212) 523 5555, pateldhaval021@hotmail.com %K Bio-Clinical-BERT %K term frequency–inverse document frequency %K TF-IDF %K health informatics %K patient care %K hospital resource management %K care %K resource management %K management %K language model %K machine learning %K hospitalization %K deep learning %K logistic regression %K retrospective analysis %K training %K large language model %D 2024 %7 27.8.2024 %9 Original Paper %J JMIR AI %G English %X Background: Predicting hospitalization from nurse triage notes has the potential to augment care. However, there needs to be careful considerations for which models to choose for this goal. Specifically, health systems will have varying degrees of computational infrastructure available and budget constraints. 
Objective: To this end, we compared the performance of a deep learning Bidirectional Encoder Representations from Transformers (BERT)–based model, Bio-Clinical-BERT, with a bag-of-words (BOW) logistic regression (LR) model incorporating term frequency–inverse document frequency (TF-IDF). These choices represent different levels of computational requirements. Methods: A retrospective analysis was conducted using data from 1,391,988 patients who visited emergency departments in the Mount Sinai Health System spanning from 2017 to 2022. The models were trained on 4 hospitals’ data and externally validated on a fifth hospital’s data. Results: The Bio-Clinical-BERT model achieved higher areas under the receiver operating characteristic curve (0.82, 0.84, and 0.85) compared to the BOW-LR-TF-IDF model (0.81, 0.83, and 0.84) across training sets of 10,000; 100,000; and ~1,000,000 patients, respectively. Notably, both models proved effective at using triage notes for prediction, despite the modest performance gap. Conclusions: Our findings suggest that simpler machine learning models such as BOW-LR-TF-IDF could serve adequately in resource-limited settings. Given the potential implications for patient care and hospital resource management, further exploration of alternative models and techniques is warranted to enhance predictive performance in this critical domain. 
International Registered Report Identifier (IRRID): RR2-10.1101/2023.08.07.23293699 %M 39190905 %R 10.2196/52190 %U https://ai.jmir.org/2024/1/e52190 %U https://doi.org/10.2196/52190 %U http://www.ncbi.nlm.nih.gov/pubmed/39190905 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54616 %T AI-Driven Diagnostic Assistance in Medical Inquiry: Reinforcement Learning Algorithm Development and Validation %A Zou,Xuan %A He,Weijie %A Huang,Yu %A Ouyang,Yi %A Zhang,Zhen %A Wu,Yu %A Wu,Yongsheng %A Feng,Lili %A Wu,Sheng %A Yang,Mengqi %A Chen,Xuyan %A Zheng,Yefeng %A Jiang,Rui %A Chen,Ting %+ Department of Computer Science and Technology, Tsinghua University, Room 3-609, Future Internet Technology Research Center, Tsinghua University, Beijing, 100084, China, 86 010 62797101, tingchen@tsinghua.edu.cn %K inquiry and diagnosis %K electronic health record %K reinforcement learning %K natural language processing %K artificial intelligence %D 2024 %7 23.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: For medical diagnosis, clinicians typically begin with a patient’s chief concerns, followed by questions about symptoms and medical history, physical examinations, and requests for necessary auxiliary examinations to gather comprehensive medical information. This complex medical investigation process has yet to be modeled by existing artificial intelligence (AI) methodologies. Objective: The aim of this study was to develop an AI-driven medical inquiry assistant for clinical diagnosis that provides inquiry recommendations by simulating clinicians’ medical investigating logic via reinforcement learning. Methods: We compiled multicenter, deidentified outpatient electronic health records from 76 hospitals in Shenzhen, China, spanning the period from July to November 2021. These records consisted of both unstructured textual information and structured laboratory test results. 
We first performed feature extraction and standardization using natural language processing techniques and then used a reinforcement learning actor-critic framework to explore the rational and effective inquiry logic. To align the inquiry process with actual clinical practice, we segmented the inquiry into 4 stages: inquiring about symptoms and medical history, conducting physical examinations, requesting auxiliary examinations, and terminating the inquiry with a diagnosis. External validation was conducted to assess the inquiry logic of the AI model. Results: This study focused on 2 retrospective inquiry-and-diagnosis tasks in the emergency and pediatrics departments. The emergency departments provided records of 339,020 consultations, mainly of children (median age 5.2, IQR 2.6-26.1 years) with various types of upper respiratory tract infections (250,638/339,020, 73.93%). The pediatrics department provided records of 561,659 consultations, mainly of children (median age 3.8, IQR 2.0-5.7 years) with various types of upper respiratory tract infections (498,408/561,659, 88.73%). When conducting its own inquiries in both scenarios, the AI model demonstrated high diagnostic performance, with areas under the receiver operating characteristic curve of 0.955 (95% CI 0.953-0.956) and 0.943 (95% CI 0.941-0.944), respectively. When the AI model was used in a simulated collaboration with physicians, it notably reduced the average number of physicians’ inquiries to 46% (6.037/13.26; 95% CI 6.009-6.064) and 43% (6.245/14.364; 95% CI 6.225-6.269) while achieving areas under the receiver operating characteristic curve of 0.972 (95% CI 0.970-0.973) and 0.968 (95% CI 0.967-0.969) in the scenarios. External validation revealed a normalized Kendall τ distance of 0.323 (95% CI 0.301-0.346), indicating the inquiry consistency of the AI model with physicians. 
Conclusions: This retrospective analysis of predominantly respiratory pediatric presentations in emergency and pediatrics departments demonstrated that an AI-driven diagnostic assistant had high diagnostic performance both in stand-alone use and in simulated collaboration with clinicians. Its investigation process was found to be consistent with the clinicians’ medical investigation logic. These findings highlight the diagnostic assistant’s promise in assisting the decision-making processes of health care professionals. %M 39178403 %R 10.2196/54616 %U https://www.jmir.org/2024/1/e54616 %U https://doi.org/10.2196/54616 %U http://www.ncbi.nlm.nih.gov/pubmed/39178403 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53662 %T Automated Interpretation of Lung Sounds by Deep Learning in Children With Asthma: Scoping Review and Strengths, Weaknesses, Opportunities, and Threats Analysis %A Ruchonnet-Métrailler,Isabelle %A Siebert,Johan N %A Hartley,Mary-Anne %A Lacroix,Laurence %+ Division of Pediatric Emergency Medicine, Department of Pediatrics, Geneva Children’s Hospital, Geneva University Hospitals, 47, Avenue de la Roseraie, Geneva, 1205, Switzerland, 41 (0)795534072, Johan.Siebert@hug.ch %K asthma %K wheezing disorders %K artificial intelligence %K deep learning %K machine learning %K respiratory sounds %K auscultation %K stethoscope %K pediatric %K mobile phone %D 2024 %7 23.8.2024 %9 Review %J J Med Internet Res %G English %X Background: The interpretation of lung sounds plays a crucial role in the appropriate diagnosis and management of pediatric asthma. Applying artificial intelligence (AI) to this task has the potential to better standardize assessment and may even improve its predictive potential. Objective: This study aims to objectively review the literature on AI-assisted lung auscultation for pediatric asthma and provide a balanced assessment of its strengths, weaknesses, opportunities, and threats. 
Methods: A scoping review on AI-assisted lung sound analysis in children with asthma was conducted across 4 major scientific databases (PubMed, MEDLINE Ovid, Embase, and Web of Science), supplemented by a gray literature search on Google Scholar, to identify relevant studies published from January 1, 2000, until May 23, 2023. The search strategy incorporated a combination of keywords related to AI, pulmonary auscultation, children, and asthma. The quality of eligible studies was assessed using the ChAMAI (Checklist for the Assessment of Medical Artificial Intelligence). Results: The search identified 7 relevant studies out of 82 (9%) to be included through an academic literature search, while 11 of 250 (4.4%) studies from the gray literature search were considered but not included in the subsequent review and quality assessment. All had poor to medium ChAMAI scores, mostly due to the absence of external validation. Identified strengths were improved predictive accuracy of AI to allow for prompt and early diagnosis, personalized management strategies, and remote monitoring capabilities. Weaknesses were the heterogeneity between studies and the lack of standardization in data collection and interpretation. Opportunities were the potential of coordinated surveillance, growing data sets, and new ways of collaboratively learning from distributed data. Threats were both generic for the field of medical AI (loss of interpretability) but also specific to the use case, as clinicians might lose the skill of auscultation. Conclusions: To achieve the opportunities of automated lung auscultation, there is a need to address weaknesses and threats with large-scale coordinated data collection in globally representative populations and leveraging new approaches to collaborative learning. 
%M 39178033 %R 10.2196/53662 %U https://www.jmir.org/2024/1/e53662 %U https://doi.org/10.2196/53662 %U http://www.ncbi.nlm.nih.gov/pubmed/39178033 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e50545 %T Integration of ChatGPT Into a Course for Medical Students: Explorative Study on Teaching Scenarios, Students’ Perception, and Applications %A Thomae,Anita V %A Witt,Claudia M %A Barth,Jürgen %K medical education %K ChatGPT %K artificial intelligence %K information for patients %K critical appraisal %K evaluation %K blended learning %K AI %K digital skills %K teaching %D 2024 %7 22.8.2024 %9 %J JMIR Med Educ %G English %X Background: Text-generating artificial intelligence (AI) such as ChatGPT offers many opportunities and challenges in medical education. Acquiring practical skills necessary for using AI in a clinical context is crucial, especially for medical education. Objective: This explorative study aimed to investigate the feasibility of integrating ChatGPT into teaching units and to evaluate the course and the importance of AI-related competencies for medical students. Since a possible application of ChatGPT in the medical field could be the generation of information for patients, we further investigated how such information is perceived by students in terms of persuasiveness and quality. Methods: ChatGPT was integrated into 3 different teaching units of a blended learning course for medical students. Using a mixed methods approach, quantitative and qualitative data were collected. As baseline data, we assessed students’ characteristics, including their openness to digital innovation. The students evaluated the integration of ChatGPT into the course and shared their thoughts regarding the future of text-generating AI in medical education. The course was evaluated based on the Kirkpatrick Model, with satisfaction, learning progress, and applicable knowledge considered as key assessment levels. 
In the ChatGPT-integrating teaching units, students evaluated videos featuring information for patients for their persuasiveness regarding treatment expectations in a self-experience experiment and critically reviewed information for patients written using ChatGPT 3.5 with different prompts. Results: A total of 52 medical students participated in the study. The comprehensive evaluation of the course revealed elevated levels of satisfaction, learning progress, and applicability specifically in relation to the ChatGPT-integrating teaching units. Furthermore, all evaluation levels demonstrated an association with each other. Higher openness to digital innovation was associated with higher satisfaction and, to a lesser extent, with higher applicability. AI-related competencies in other courses of the medical curriculum were perceived as highly important by medical students. Qualitative analysis highlighted potential use cases of ChatGPT in teaching and learning. In the ChatGPT-integrating teaching units, students rated information for patients generated using a basic ChatGPT prompt as “moderate” in terms of comprehensibility, patient safety, and the correct application of communication rules taught during the course. The students’ ratings improved considerably with an extended prompt. The same text, however, showed the smallest increase in treatment expectations when compared with information provided by humans (patient, clinician, and expert) via videos. Conclusions: This study offers valuable insights into integrating the development of AI competencies into a blended learning course. Integration of ChatGPT enhanced learning experiences for medical students. 
%R 10.2196/50545 %U https://mededu.jmir.org/2024/1/e50545 %U https://doi.org/10.2196/50545 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e59560 %T Self-Administered Interventions Based on Natural Language Processing Models for Reducing Depressive and Anxiety Symptoms: Systematic Review and Meta-Analysis %A Villarreal-Zegarra,David %A Reategui-Rivera,C Mahony %A García-Serna,Jackeline %A Quispe-Callo,Gleni %A Lázaro-Cruz,Gabriel %A Centeno-Terrazas,Gianfranco %A Galvez-Arevalo,Ricardo %A Escobar-Agreda,Stefan %A Dominguez-Rodriguez,Alejandro %A Finkelstein,Joseph %+ Department of Biomedical Informatics, School of Medicine, University of Utah, 421 Wakara Way, Salt Lake City, UT, 84108, United States, 1 (801) 581 4080, mahony.reategui@utah.edu %K natural language processing %K depression %K anxiety %K systematic review %K artificial intelligence %K AI %D 2024 %7 21.8.2024 %9 Review %J JMIR Ment Health %G English %X Background: The introduction of natural language processing (NLP) technologies has significantly enhanced the potential of self-administered interventions for treating anxiety and depression by improving human-computer interactions. Although these advances, particularly in complex models such as generative artificial intelligence (AI), are highly promising, robust evidence validating the effectiveness of the interventions remains sparse. Objective: The aim of this study was to determine whether self-administered interventions based on NLP models can reduce depressive and anxiety symptoms. Methods: We conducted a systematic review and meta-analysis. We searched Web of Science, Scopus, MEDLINE, PsycINFO, IEEE Xplore, Embase, and Cochrane Library from inception to November 3, 2023. We included studies with participants of any age diagnosed with depression or anxiety through professional consultation or validated psychometric instruments. Interventions had to be self-administered and based on NLP models, with passive or active comparators. 
Outcomes measured included depressive and anxiety symptom scores. We included randomized controlled trials and quasi-experimental studies but excluded narrative, systematic, and scoping reviews. Data extraction was performed independently by pairs of authors using a predefined form. Meta-analysis was conducted using standardized mean differences (SMDs) and random effects models to account for heterogeneity. Results: In all, 21 articles were selected for review, of which 76% (16/21) were included in the meta-analysis for each outcome. Most of the studies (16/21, 76%) were recent (2020-2023), with interventions being mostly AI-based NLP models (11/21, 52%); most (19/21, 90%) delivered some form of therapy (primarily cognitive behavioral therapy: 16/19, 84%). The overall meta-analysis showed that self-administered interventions based on NLP models were significantly more effective in reducing both depressive (SMD 0.819, 95% CI 0.389-1.250; P<.001) and anxiety (SMD 0.272, 95% CI 0.116-0.428; P=.001) symptoms compared to various control conditions. Subgroup analysis indicated that AI-based NLP models were effective in reducing depressive symptoms (SMD 0.821, 95% CI 0.207-1.436; P<.001) compared to pooled control conditions. Rule-based NLP models showed effectiveness in reducing both depressive (SMD 0.854, 95% CI 0.172-1.537; P=.01) and anxiety (SMD 0.347, 95% CI 0.116-0.578; P=.003) symptoms. The meta-regression showed no significant association between participants’ mean age and treatment outcomes (all P>.05). Although the findings were positive, the overall certainty of evidence was very low, mainly due to a high risk of bias, heterogeneity, and potential publication bias. Conclusions: Our findings support the effectiveness of self-administered NLP-based interventions in alleviating depressive and anxiety symptoms, highlighting their potential to increase accessibility to, and reduce costs in, mental health care. 
Although the results were encouraging, the certainty of evidence was low, underscoring the need for further high-quality randomized controlled trials and studies examining implementation and usability. These interventions could become valuable components of public health strategies to address mental health issues. Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42023472120; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42023472120 %M 39167795 %R 10.2196/59560 %U https://mental.jmir.org/2024/1/e59560 %U https://doi.org/10.2196/59560 %U http://www.ncbi.nlm.nih.gov/pubmed/39167795 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52730 %T Using Domain Adaptation and Inductive Transfer Learning to Improve Patient Outcome Prediction in the Intensive Care Unit: Retrospective Observational Study %A Mutnuri,Maruthi Kumar %A Stelfox,Henry Thomas %A Forkert,Nils Daniel %A Lee,Joon %+ Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, CWPH 5E17, 3280 Hospital Drive Northwest, Calgary, AB, T2N 4Z6, Canada, 1 4032202968, joon.lee@ucalgary.ca %K transfer learning %K patient outcome prediction %K intensive care %K deep learning %K electronic health record %D 2024 %7 21.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Accurate patient outcome prediction in the intensive care unit (ICU) can potentially lead to more effective and efficient patient care. Deep learning models are capable of learning from data to accurately predict patient outcomes, but they typically require large amounts of data and computational resources. Transfer learning (TL) can help in scenarios where data and computational resources are scarce by leveraging pretrained models. While TL has been widely used in medical imaging and natural language processing, it has been rare in electronic health record (EHR) analysis. 
Furthermore, domain adaptation (DA) has been the most common TL method in general, whereas inductive transfer learning (ITL) has been rare. To the best of our knowledge, DA and ITL have never been studied in depth in the context of EHR-based ICU patient outcome prediction. Objective: This study investigated DA, as well as rarely researched ITL, in EHR-based ICU patient outcome prediction under simulated, varying levels of data scarcity. Methods: Two patient cohorts were used in this study: (1) eCritical, a multicenter ICU data set of 55,689 unique admission records from 48,672 unique patients admitted to 15 medical-surgical ICUs in Alberta, Canada, between March 2013 and December 2019, and (2) Medical Information Mart for Intensive Care III, a single-center, publicly available ICU data set from Boston, Massachusetts, acquired between 2001 and 2012 containing 61,532 admission records from 46,476 patients. We compared DA and ITL models with baseline models (without TL) of fully connected neural networks, logistic regression, and lasso regression in the prediction of 30-day mortality, acute kidney injury, ICU length of stay, and hospital length of stay. Random subsets of training data, ranging from 1% to 75%, as well as the full data set, were used to compare the performances of DA and ITL with the baseline models at various levels of data scarcity. Results: Overall, the ITL models outperformed the baseline models in 55 of 56 comparisons (all P values <.001). The DA models outperformed the baseline models in 45 of 56 comparisons (all P values <.001). ITL resulted in better performance than DA in terms of both the number of comparisons in which it outperformed the baseline models and the margin by which it did so. In 11 of 16 cases (8 of 8 for ITL and 3 of 8 for DA), TL models outperformed baseline models when trained using the 1% data subset. Conclusions: TL-based ICU patient outcome prediction models are useful in data-scarce scenarios. 
The results of this study can be used to estimate ICU outcome prediction performance at different levels of data scarcity, with and without TL. The publicly available pretrained models from this study can serve as building blocks in further research for the development and validation of models in other ICU cohorts and outcomes. %M 39167442 %R 10.2196/52730 %U https://www.jmir.org/2024/1/e52730 %U https://doi.org/10.2196/52730 %U http://www.ncbi.nlm.nih.gov/pubmed/39167442 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e55820 %T Mitigating Sociodemographic Bias in Opioid Use Disorder Prediction: Fairness-Aware Machine Learning Framework %A Yaseliani,Mohammad %A Noor-E-Alam,Md %A Hasan,Md Mahmudul %+ Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Malachowsky Hall for Data Science & Information Technology, Suite 6300, 1889 Museum Rd, Gainesville, FL, 32611, United States, 1 352 273 6276, hasan.mdmahmudul@ufl.edu %K opioid use disorder %K fairness and bias %K bias mitigation %K machine learning %K majority voting %D 2024 %7 20.8.2024 %9 Original Paper %J JMIR AI %G English %X Background: Opioid use disorder (OUD) is a critical public health crisis in the United States, affecting >5.5 million Americans in 2021. Machine learning has been used to predict patient risk of incident OUD. However, little is known about the fairness and bias of these predictive models. Objective: The aims of this study are two-fold: (1) to develop a machine learning bias mitigation algorithm for sociodemographic features and (2) to develop a fairness-aware weighted majority voting (WMV) classifier for OUD prediction. Methods: We used the 2020 National Survey on Drug and Health data to develop a neural network (NN) model using stochastic gradient descent (SGD; NN-SGD) and an NN model using Adam (NN-Adam) optimizers and evaluated sociodemographic bias by comparing the area under the curve values. 
A bias mitigation algorithm, based on equality of odds, was implemented to minimize disparities in specificity and recall. Finally, a WMV classifier was developed for fairness-aware prediction of OUD. To further analyze bias detection and mitigation, we performed 1:N matching of OUD to non-OUD cases, controlling for socioeconomic variables, and evaluated the performance of the proposed bias mitigation algorithm and WMV classifier. Results: Our bias mitigation algorithm substantially reduced bias with NN-SGD, by 21.66% for sex, 1.48% for race, and 21.04% for income, and with NN-Adam by 16.96% for sex, 8.87% for marital status, 8.45% for working condition, and 41.62% for race. The fairness-aware WMV classifier achieved a recall of 85.37% and 92.68% and an accuracy of 58.85% and 90.21% using NN-SGD and NN-Adam, respectively. The results after matching also indicated remarkable bias reduction with NN-SGD and NN-Adam, respectively, as follows: sex (0.14% vs 0.97%), marital status (12.95% vs 10.33%), working condition (14.79% vs 15.33%), race (60.13% vs 41.71%), and income (0.35% vs 2.21%). Moreover, the fairness-aware WMV classifier achieved high performance with a recall of 100% and 85.37% and an accuracy of 73.20% and 89.38% using NN-SGD and NN-Adam, respectively. Conclusions: The application of the proposed bias mitigation algorithm shows promise in reducing sociodemographic bias, with the WMV classifier confirming bias reduction and high performance in OUD prediction. 
%M 39163597 %R 10.2196/55820 %U https://ai.jmir.org/2024/1/e55820 %U https://doi.org/10.2196/55820 %U http://www.ncbi.nlm.nih.gov/pubmed/39163597 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e57037 %T Integrating ChatGPT in Orthopedic Education for Medical Undergraduates: Randomized Controlled Trial %A Gan,Wenyi %A Ouyang,Jianfeng %A Li,Hua %A Xue,Zhaowen %A Zhang,Yiming %A Dong,Qiu %A Huang,Jiadong %A Zheng,Xiaofei %A Zhang,Yiyi %+ The First Clinical Medical College of Jinan University, The First Affiliated Hospital of Jinan University, No. 613, Huangpu Avenue West, Tianhe District, Guangzhou, 510630, China, 86 130 76855735, yiyizjun@126.com %K ChatGPT %K medical education %K orthopedics %K artificial intelligence %K large language model %K natural language processing %K randomized controlled trial %K learning aid %D 2024 %7 20.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: ChatGPT is a natural language processing model developed by OpenAI, which can be iteratively updated and optimized to accommodate the changing and complex requirements of human verbal communication. Objective: The study aimed to evaluate ChatGPT’s accuracy in answering orthopedics-related multiple-choice questions (MCQs) and assess its short-term effects as a learning aid through a randomized controlled trial. In addition, long-term effects on student performance in other subjects were measured using final examination results. Methods: We first evaluated ChatGPT’s accuracy in answering MCQs pertaining to orthopedics across various question formats. Then, 129 undergraduate medical students participated in a randomized controlled study in which the ChatGPT group used ChatGPT as a learning tool, while the control group was prohibited from using artificial intelligence software to support learning. 
Following a 2-week intervention, the 2 groups’ understanding of orthopedics was assessed by an orthopedics test, and variations in the 2 groups’ performance in other disciplines were noted through a follow-up at the end of the semester. Results: ChatGPT-4.0 answered 1051 orthopedics-related MCQs with a 70.60% (742/1051) accuracy rate, including 71.8% (237/330) accuracy for A1 MCQs, 73.7% (330/448) accuracy for A2 MCQs, 70.2% (92/131) accuracy for A3/4 MCQs, and 58.5% (83/142) accuracy for case analysis MCQs. As of April 7, 2023, a total of 129 individuals participated in the experiment. However, 19 individuals withdrew from the experiment at various phases; thus, as of July 1, 2023, a total of 110 individuals completed the trial and all follow-up work. After the short-term intervention, the ChatGPT group answered more questions correctly than the control group (ChatGPT group: mean 141.20, SD 26.68; control group: mean 130.80, SD 25.56; P=.04) in the orthopedics test, particularly on A1 (ChatGPT group: mean 46.57, SD 8.52; control group: mean 42.18, SD 9.43; P=.01), A2 (ChatGPT group: mean 60.59, SD 10.58; control group: mean 56.66, SD 9.91; P=.047), and A3/4 MCQs (ChatGPT group: mean 19.57, SD 5.48; control group: mean 16.46, SD 4.58; P=.002). At the end of the semester, we found that the ChatGPT group performed better on final examinations in surgery (ChatGPT group: mean 76.54, SD 9.79; control group: mean 72.54, SD 8.11; P=.02) and obstetrics and gynecology (ChatGPT group: mean 75.98, SD 8.94; control group: mean 72.54, SD 8.66; P=.04) than the control group. Conclusions: ChatGPT answers orthopedics-related MCQs accurately, and students using it excel in both short-term and long-term assessments. Our findings strongly support ChatGPT’s integration into medical education, enhancing contemporary instructional methods. 
Trial Registration: Chinese Clinical Trial Registry ChiCTR2300071774; https://www.chictr.org.cn/hvshowproject.html?id=225740&v=1.0 %M 39163598 %R 10.2196/57037 %U https://www.jmir.org/2024/1/e57037 %U https://doi.org/10.2196/57037 %U http://www.ncbi.nlm.nih.gov/pubmed/39163598 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e48320 %T The Use of Deep Learning and Machine Learning on Longitudinal Electronic Health Records for the Early Detection and Prevention of Diseases: Scoping Review %A Swinckels,Laura %A Bennis,Frank C %A Ziesemer,Kirsten A %A Scheerman,Janneke F M %A Bijwaard,Harmen %A de Keijzer,Ander %A Bruers,Josef Jan %+ Department of Oral Public Health, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and Vrije Universiteit, Gustav Mahlerlaan 3004, Amsterdam, 1081 LA, Netherlands, 31 205980308, L.Swinckels@acta.nl %K artificial intelligence %K big data %K detection %K electronic health records %K machine learning %K personalized health care %K prediction %K prevention %D 2024 %7 20.8.2024 %9 Review %J J Med Internet Res %G English %X Background: Electronic health records (EHRs) contain patients’ health information over time, including possible early indicators of disease. However, the increasing amount of data hinders clinicians from using them. There is accumulating evidence suggesting that machine learning (ML) and deep learning (DL) can assist clinicians in analyzing these large-scale EHRs, as algorithms thrive on high volumes of data. Although ML has become well developed, studies mainly focus on engineering but lack medical outcomes. Objective: This study aimed to conduct a scoping review of the evidence on how the use of ML on longitudinal EHRs can support the early detection and prevention of disease. The medical insights and clinical benefits that have been generated were investigated by reviewing applications in a variety of diseases. 
Methods: This study was conducted according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. A literature search was performed in 2022 in collaboration with a medical information specialist in the following databases: PubMed, Embase, Web of Science Core Collection (Clarivate Analytics), and IEEE Xplore Digital Library and computer science bibliography. Studies were eligible when longitudinal EHRs were used that aimed for the early detection of disease via ML in a prevention context. Studies with a technical focus or using imaging or hospital admission data were beyond the scope of this review. Study screening and selection and data extraction were performed independently by 2 researchers. Results: In total, 20 studies were included, mainly published between 2018 and 2022. They showed that a variety of diseases could be detected or predicted, particularly diabetes; kidney diseases; diseases of the circulatory system; and mental, behavioral, and neurodevelopmental disorders. Demographics, symptoms, procedures, laboratory test results, diagnoses, medications, and BMI were frequently used EHR data in basic recurrent neural network or long short-term memory techniques. By developing and comparing ML and DL models, medical insights such as a high diagnostic performance, an earlier detection, the most important predictors, and additional health indicators were obtained. A clinical benefit that has been evaluated positively was preliminary screening. If these models are applied in practice, patients might also benefit from personalized health care and prevention, with practical benefits such as workload reduction and policy insights. Conclusions: Longitudinal EHRs proved to be helpful for support in health care. Current ML models on EHRs can support the detection of diseases in terms of accuracy and offer preliminary screening benefits. 
Regarding the prevention of diseases, ML and specifically DL models can accurately predict or detect diseases earlier than current clinical diagnoses. Adding personally responsible factors allows targeted prevention interventions. While ML models based on textual EHRs are still in the developmental stage, they have high potential to support clinicians and the health care system and improve patient outcomes. %M 39163096 %R 10.2196/48320 %U https://www.jmir.org/2024/1/e48320 %U https://doi.org/10.2196/48320 %U http://www.ncbi.nlm.nih.gov/pubmed/39163096 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52758 %T Human-Comparable Sensitivity of Large Language Models in Identifying Eligible Studies Through Title and Abstract Screening: 3-Layer Strategy Using GPT-3.5 and GPT-4 for Systematic Reviews %A Matsui,Kentaro %A Utsumi,Tomohiro %A Aoki,Yumi %A Maruki,Taku %A Takeshima,Masahiro %A Takaesu,Yoshikazu %+ Department of Neuropsychiatry, Graduate School of Medicine, University of the Ryukyus, 207 Uehara, Nishihara, Okinawa, 903-0215, Japan, 81 98 895 3331, takaesuy@med.u-ryukyu.ac.jp %K systematic review %K screening %K GPT-3.5 %K GPT-4 %K language model %K information science %K library science %K artificial intelligence %K prompt engineering %K meta-analysis %D 2024 %7 16.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The screening process for systematic reviews is resource-intensive. Although previous machine learning solutions have reported reductions in workload, they risked excluding relevant papers. Objective: We evaluated the performance of a 3-layer screening method using GPT-3.5 and GPT-4 to streamline the title and abstract-screening process for systematic reviews. Our goal is to develop a screening method that maximizes sensitivity for identifying relevant records. 
Methods: We conducted screenings on 2 of our previous systematic reviews related to the treatment of bipolar disorder, with 1381 records from the first review and 3146 from the second. Screenings were conducted using GPT-3.5 (gpt-3.5-turbo-0125) and GPT-4 (gpt-4-0125-preview) across three layers: (1) research design, (2) target patients, and (3) interventions and controls. The 3-layer screening was conducted using prompts tailored to each study. During this process, information extraction according to each study’s inclusion criteria and optimization for screening were carried out using a GPT-4–based flow without manual adjustments. Records were evaluated at each layer, and those meeting the inclusion criteria at all layers were subsequently judged as included. Results: On each layer, both GPT-3.5 and GPT-4 were able to process about 110 records per minute, and the total time required for screening the first and second studies was approximately 1 hour and 2 hours, respectively. In the first study, the sensitivities/specificities of the GPT-3.5 and GPT-4 were 0.900/0.709 and 0.806/0.996, respectively. Both screenings by GPT-3.5 and GPT-4 judged all 6 records used for the meta-analysis as included. In the second study, the sensitivities/specificities of the GPT-3.5 and GPT-4 were 0.958/0.116 and 0.875/0.855, respectively. The sensitivities for the relevant records align with those of human evaluators: 0.867-1.000 for the first study and 0.776-0.979 for the second study. Both screenings by GPT-3.5 and GPT-4 judged all 9 records used for the meta-analysis as included. After accounting for justifiably excluded records by GPT-4, the sensitivities/specificities of the GPT-4 screening were 0.962/0.996 in the first study and 0.943/0.855 in the second study. 
Further investigation indicated that the cases incorrectly excluded by GPT-3.5 were due to a lack of domain knowledge, while the cases incorrectly excluded by GPT-4 were due to misinterpretations of the inclusion criteria. Conclusions: Our 3-layer screening method with GPT-4 demonstrated acceptable level of sensitivity and specificity that supports its practical application in systematic review screenings. Future research should aim to generalize this approach and explore its effectiveness in diverse settings, both medical and nonmedical, to fully establish its use and operational feasibility. %M 39151163 %R 10.2196/52758 %U https://www.jmir.org/2024/1/e52758 %U https://doi.org/10.2196/52758 %U http://www.ncbi.nlm.nih.gov/pubmed/39151163 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e59213 %T A Language Model–Powered Simulated Patient With Automated Feedback for History Taking: Prospective Study %A Holderried,Friederike %A Stegemann-Philipps,Christian %A Herrmann-Werner,Anne %A Festl-Wietek,Teresa %A Holderried,Martin %A Eickhoff,Carsten %A Mahling,Moritz %+ Tübingen Institute for Medical Education (TIME), Medical Faculty, University of Tübingen, Elfriede-Aulhorn-Strasse 10, Tübingen, 72076, Germany, 49 707129 ext 73688, friederike.holderried@med.uni-tuebingen.de %K virtual patients communication %K communication skills %K technology enhanced education %K TEL %K medical education %K ChatGPT %K GPT: LLM %K LLMs %K NLP %K natural language processing %K machine learning %K artificial intelligence %K language model %K language models %K communication %K relationship %K relationships %K chatbot %K chatbots %K conversational agent %K conversational agents %K history %K histories %K simulated %K student %K students %K interaction %K interactions %D 2024 %7 16.8.2024 %9 Original Paper %J JMIR Med Educ %G English %X Background: Although history taking is fundamental for diagnosing medical conditions, teaching and providing feedback on the skill can be 
challenging due to resource constraints. Virtual simulated patients and web-based chatbots have thus emerged as educational tools, with recent advancements in artificial intelligence (AI) such as large language models (LLMs) enhancing their realism and potential to provide feedback. Objective: In our study, we aimed to evaluate the effectiveness of a Generative Pretrained Transformer (GPT) 4 model to provide structured feedback on medical students’ performance in history taking with a simulated patient. Methods: We conducted a prospective study involving medical students performing history taking with a GPT-powered chatbot. To that end, we designed a chatbot to simulate patients’ responses and provide immediate feedback on the comprehensiveness of the students’ history taking. Students’ interactions with the chatbot were analyzed, and feedback from the chatbot was compared with feedback from a human rater. We measured interrater reliability and performed a descriptive analysis to assess the quality of feedback. Results: Most of the study’s participants were in their third year of medical school. A total of 1894 question-answer pairs from 106 conversations were included in our analysis. GPT-4’s role-play and responses were medically plausible in more than 99% of cases. Interrater reliability between GPT-4 and the human rater showed “almost perfect” agreement (Cohen κ=0.832). Lower agreement (κ<0.6), detected for 8 of the 45 feedback categories, highlighted topics on which the model’s assessments were overly specific or diverged from human judgment. Conclusions: The GPT model was effective in providing structured feedback on history-taking dialogs provided by medical students. Although we uncovered some limitations regarding the specificity of feedback for certain feedback categories, the overall high agreement with human raters suggests that LLMs can be a valuable tool for medical education. 
Our findings, thus, advocate the careful integration of AI-driven feedback mechanisms in medical training and highlight important aspects when LLMs are used in that context. %M 39150749 %R 10.2196/59213 %U https://mededu.jmir.org/2024/1/e59213 %U https://doi.org/10.2196/59213 %U http://www.ncbi.nlm.nih.gov/pubmed/39150749 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e48594 %T Reforming China’s Secondary Vocational Medical Education: Adapting to the Challenges and Opportunities of the AI Era %A Tong,Wenting %A Zhang,Xiaowen %A Zeng,Haiping %A Pan,Jianping %A Gong,Chao %A Zhang,Hui %K secondary vocational medical education %K artificial intelligence %K practical skills %K critical thinking %K AI %D 2024 %7 15.8.2024 %9 %J JMIR Med Educ %G English %X China’s secondary vocational medical education is essential for training primary health care personnel and enhancing public health responses. This education system currently faces challenges, primarily due to its emphasis on knowledge acquisition that overshadows the development and application of skills, especially in the context of emerging artificial intelligence (AI) technologies. This article delves into the impact of AI on medical practices and uses this analysis to suggest reforms for the vocational medical education system in China. AI is found to significantly enhance diagnostic capabilities, therapeutic decision-making, and patient management. However, it also brings about concerns such as potential job losses and necessitates the adaptation of medical professionals to new technologies. Proposed reforms include a greater focus on critical thinking, hands-on experiences, skill development, medical ethics, and integrating humanities and AI into the curriculum. These reforms require ongoing evaluation and sustained research to effectively prepare medical students for future challenges in the field. 
%R 10.2196/48594 %U https://mededu.jmir.org/2024/1/e48594 %U https://doi.org/10.2196/48594 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52506 %T Impact of Gold-Standard Label Errors on Evaluating Performance of Deep Learning Models in Diabetic Retinopathy Screening: Nationwide Real-World Validation Study %A Wang,Yueye %A Han,Xiaotong %A Li,Cong %A Luo,Lixia %A Yin,Qiuxia %A Zhang,Jian %A Peng,Guankai %A Shi,Danli %A He,Mingguang %+ School of Optometry, The Hong Kong Polytechnic University, Hung Hom, Kowloon, China (Hong Kong), 852 34002795, mingguang.he@polyu.edu.hk %K artificial intelligence %K diabetic retinopathy %K diabetes %K real world %K deep learning %D 2024 %7 14.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: For medical artificial intelligence (AI) training and validation, human expert labels are considered the gold standard that represents the correct answers or desired outputs for a given data set. These labels serve as a reference or benchmark against which the model’s predictions are compared. Objective: This study aimed to assess the accuracy of a custom deep learning (DL) algorithm on classifying diabetic retinopathy (DR) and further demonstrate how label errors may contribute to this assessment in a nationwide DR-screening program. Methods: Fundus photographs from the Lifeline Express, a nationwide DR-screening program, were analyzed to identify the presence of referable DR using both (1) manual grading by National Health Service England–certificated graders and (2) a DL-based DR-screening algorithm with validated good lab performance. To assess the accuracy of labels, a random sample of images with disagreement between the DL algorithm and the labels was adjudicated by ophthalmologists who were masked to the previous grading results. The error rates of labels in this sample were then used to correct the number of negative and positive cases in the entire data set, serving as postcorrection labels. 
The DL algorithm’s performance was evaluated against both pre- and postcorrection labels. Results: The analysis included 736,083 images from 237,824 participants. The DL algorithm exhibited a gap between the real-world performance and the lab-reported performance in this nationwide data set, with a sensitivity increase of 12.5% (from 79.6% to 92.5%, P<.001) and a specificity increase of 6.9% (from 91.6% to 98.5%, P<.001). In the random sample, 63.6% (560/880) of negative images and 5.2% (140/2710) of positive images were misclassified in the precorrection human labels. High myopia was the primary reason for misclassifying non-DR images as referable DR images, while laser spots were predominantly responsible for misclassified referable cases. The estimated label error rate for the entire data set was 1.2%. The label correction was estimated to bring about a 12.5% enhancement in the estimated sensitivity of the DL algorithm (P<.001). Conclusions: Label errors based on human image grading, although in a small percentage, can significantly affect the performance evaluation of DL algorithms in real-world DR screening. 
%M 39141915 %R 10.2196/52506 %U https://www.jmir.org/2024/1/e52506 %U https://doi.org/10.2196/52506 %U http://www.ncbi.nlm.nih.gov/pubmed/39141915 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 7 %N %P e55204 %T Readability of Information Generated by ChatGPT for Hidradenitis Suppurativa %A Gawey,Lauren %A Dagenet,Caitlyn B %A Tran,Khiem A %A Park,Sarah %A Hsiao,Jennifer L %A Shi,Vivian %+ Department of Dermatology, University of Arkansas for Medical Sciences, 4301 W Markham St, #576, Little Rock, AR, 72205, United States, 1 8148022747, vivian.shi.publications@gmail.com %K hidradenitis suppurativa %K ChatGPT %K Chat-GPT %K chatbot %K chatbots %K chat-bot %K chat-bots %K machine learning %K ML %K artificial intelligence %K AI %K algorithm %K algorithms %K predictive model %K predictive models %K predictive analytics %K predictive system %K practical model %K practical models %K deep learning %K patient resources %K readability %D 2024 %7 14.8.2024 %9 Research Letter %J JMIR Dermatol %G English %X %M 39141908 %R 10.2196/55204 %U https://derma.jmir.org/2024/1/e55204 %U https://doi.org/10.2196/55204 %U http://www.ncbi.nlm.nih.gov/pubmed/39141908 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54556 %T Leadership for AI Transformation in Health Care Organization: Scoping Review %A Sriharan,Abi %A Sekercioglu,Nigar %A Mitchell,Cheryl %A Senkaiahliyan,Senthujan %A Hertelendy,Attila %A Porter,Tracy %A Banaszak-Holl,Jane %+ Krembil Centre for Health Management and Leadership, Schulich School of Business, York University, MB Room G315, 4700 Keele St, Toronto, ON, M3J 1P3, Canada, 1 3658855898, abisri@yorku.ca %K AI implementation %K innovation %K health care %K leadership %K AI %K artificial intelligence %K management %K organization %K health care organization %K strategy %D 2024 %7 14.8.2024 %9 Review %J J Med Internet Res %G English %X Background: The leaders of health care organizations are grappling with rising expenses and surging demands for 
health services. In response, they are increasingly embracing artificial intelligence (AI) technologies to improve patient care delivery, alleviate operational burdens, and efficiently improve health care safety and quality. Objective: In this paper, we map the current literature and synthesize insights on the role of leadership in driving AI transformation within health care organizations. Methods: We conducted a comprehensive search across several databases, including MEDLINE (via Ovid), PsycINFO (via Ovid), CINAHL (via EBSCO), Business Source Premier (via EBSCO), and Canadian Business & Current Affairs (via ProQuest), spanning articles published from 2015 to June 2023 discussing AI transformation within the health care sector. Specifically, we focused on empirical studies with a particular emphasis on leadership. We used an inductive, thematic analysis approach to qualitatively map the evidence. The findings were reported in accordance with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. Results: A comprehensive review of 2813 unique abstracts led to the retrieval of 97 full-text articles, with 22 included for detailed assessment. Our literature mapping reveals that successful AI integration within health care organizations requires leadership engagement across technological, strategic, operational, and organizational domains. Leaders must demonstrate a blend of technical expertise, adaptive strategies, and strong interpersonal skills to navigate the dynamic health care landscape shaped by complex regulatory, technological, and organizational factors. Conclusions: Leading AI transformation in health care requires a multidimensional approach, with leadership across technological, strategic, operational, and organizational domains. 
Organizations should implement a comprehensive leadership development strategy, including targeted training and cross-functional collaboration, to equip leaders with the skills needed for AI integration. Additionally, when upskilling or recruiting AI talent, priority should be given to individuals with a strong mix of technical expertise, adaptive capacity, and interpersonal acumen, enabling them to navigate the unique complexities of the health care environment. %M 39009038 %R 10.2196/54556 %U https://www.jmir.org/2024/1/e54556 %U https://doi.org/10.2196/54556 %U http://www.ncbi.nlm.nih.gov/pubmed/39009038 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e54371 %T Evaluation of Generative Language Models in Personalizing Medical Information: Instrument Validation Study %A Spina,Aidin %A Andalib,Saman %A Flores,Daniel %A Vermani,Rishi %A Halaseh,Faris F %A Nelson,Ariana M %+ School of Medicine, University of California, Irvine, 1001 Health Sciences Road, Irvine, CA, 92617, United States, 1 949 290 8347, acspina@hs.uci.edu %K generative language model %K GLM %K artificial intelligence %K AI %K low health literacy %K LHL %K readability %K GLMs %K language model %K language models %K health literacy %K understandable %K understandability %K knowledge translation %K comprehension %K generative %K NLP %K natural language processing %K reading level %K reading levels %K education %K medical text %K medical texts %K medical information %K health information %D 2024 %7 13.8.2024 %9 Original Paper %J JMIR AI %G English %X Background: Although uncertainties exist regarding implementation, artificial intelligence–driven generative language models (GLMs) have enormous potential in medicine. Deployment of GLMs could improve patient comprehension of clinical texts and help address low health literacy. 
Objective: The goal of this study is to evaluate the potential of ChatGPT-3.5 and GPT-4 to tailor the complexity of medical information to patient-specific input education level, which is crucial if it is to serve as a tool in addressing low health literacy. Methods: Input templates related to 2 prevalent chronic diseases—type II diabetes and hypertension—were designed. Each clinical vignette was adjusted for hypothetical patient education levels to evaluate output personalization. To assess the success of a GLM (GPT-3.5 and GPT-4) in tailoring output writing, the readability of pre- and posttransformation outputs was quantified using the Flesch reading ease score (FKRE) and the Flesch-Kincaid grade level (FKGL). Results: Responses (n=80) were generated using GPT-3.5 and GPT-4 across 2 clinical vignettes. For GPT-3.5, FKRE means were 57.75 (SD 4.75), 51.28 (SD 5.14), 32.28 (SD 4.52), and 28.31 (SD 5.22) for 6th grade, 8th grade, high school, and bachelor’s, respectively; FKGL mean scores were 9.08 (SD 0.90), 10.27 (SD 1.06), 13.4 (SD 0.80), and 13.74 (SD 1.18). GPT-3.5 only aligned with the prespecified education levels at the bachelor’s degree. Conversely, GPT-4’s FKRE mean scores were 74.54 (SD 2.6), 71.25 (SD 4.96), 47.61 (SD 6.13), and 13.71 (SD 5.77), with FKGL mean scores of 6.3 (SD 0.73), 6.7 (SD 1.11), 11.09 (SD 1.26), and 17.03 (SD 1.11) for the same respective education levels. GPT-4 met the target readability for all groups except the 6th-grade FKRE average. Both GLMs produced outputs with statistically significant differences (FKRE: 6th grade P<.001; 8th grade P<.001; high school P<.001; bachelor’s P=.003; FKGL: 6th grade P=.001; 8th grade P<.001; high school P<.001; bachelor’s P<.001) between mean FKRE and FKGL across input education levels. Conclusions: GLMs can change the structure and readability of medical text outputs according to input-specified education. 
However, GLMs categorize input education designation into 3 broad tiers of output readability: easy (6th and 8th grade), medium (high school), and difficult (bachelor’s degree). This is the first result to suggest that there are broader boundaries in the success of GLMs in output text simplification. Future research must establish how GLMs can reliably personalize medical texts to prespecified education levels to enable a broader impact on health care literacy. %M 39137416 %R 10.2196/54371 %U https://ai.jmir.org/2024/1/e54371 %U https://doi.org/10.2196/54371 %U http://www.ncbi.nlm.nih.gov/pubmed/39137416 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e51757 %T Understanding Health Care Students’ Perceptions, Beliefs, and Attitudes Toward AI-Powered Language Models: Cross-Sectional Study %A Cherrez-Ojeda,Ivan %A Gallardo-Bastidas,Juan C %A Robles-Velasco,Karla %A Osorio,María F %A Velez Leon,Eleonor Maria %A Leon Velastegui,Manuel %A Pauletto,Patrícia %A Aguilar-Díaz,F C %A Squassi,Aldo %A González Eras,Susana Patricia %A Cordero Carrasco,Erita %A Chavez Gonzalez,Karol Leonor %A Calderon,Juan C %A Bousquet,Jean %A Bedbrook,Anna %A Faytong-Haro,Marco %+ Universidad Espiritu Santo, Km. 2.5 via Samborondon, Samborondon, 0901952, Ecuador, 593 999981769, ivancherrez@gmail.com %K artificial intelligence %K ChatGPT %K education %K health care %K students %D 2024 %7 13.8.2024 %9 Original Paper %J JMIR Med Educ %G English %X Background: ChatGPT was not intended for use in health care, but it has potential benefits that depend on end-user understanding and acceptability, which is where health care students become crucial. There is still a limited amount of research in this area. 
Objective: The primary aim of our study was to assess the frequency of ChatGPT use, the perceived level of knowledge, the perceived risks associated with its use, and the ethical issues, as well as attitudes toward the use of ChatGPT in the context of education in the field of health. In addition, we aimed to examine whether there were differences across groups based on demographic variables. The second part of the study aimed to assess the association between the frequency of use, the level of perceived knowledge, the level of risk perception, and the level of perception of ethics as predictive factors for participants’ attitudes toward the use of ChatGPT. Methods: A cross-sectional survey was conducted from May to June 2023 encompassing students of medicine, nursing, dentistry, nutrition, and laboratory science across the Americas. The study used descriptive analysis, chi-square tests, and ANOVA to assess statistical significance across different categories. The study used several ordinal logistic regression models to analyze the impact of predictive factors (frequency of use, perception of knowledge, perception of risk, and ethics perception scores) on attitude as the dependent variable. The models were adjusted for gender, institution type, major, and country. Stata was used to conduct all the analyses. Results: Of 2661 health care students, 42.99% (n=1144) were unaware of ChatGPT. The median score of knowledge was “minimal” (median 2.00, IQR 1.00-3.00). Most respondents (median 2.61, IQR 2.11-3.11) regarded ChatGPT as neither ethical nor unethical. Most participants (median 3.89, IQR 3.44-4.34) “somewhat agreed” that ChatGPT (1) benefits health care settings, (2) provides trustworthy data, (3) is a helpful tool for clinical and educational medical information access, and (4) makes the work easier. In total, 70% (7/10) of people used it for homework. 
As perceived knowledge of ChatGPT increased, attitudes toward it tended to become more favorable. Higher ethical consideration perception ratings increased the likelihood of considering ChatGPT as a source of trustworthy health care information (odds ratio [OR] 1.620, 95% CI 1.498-1.752), beneficial in medical issues (OR 1.495, 95% CI 1.452-1.539), and useful for medical literature (OR 1.494, 95% CI 1.426-1.564; P<.001 for all results). Conclusions: Over 40% of American health care students (1144/2661, 42.99%) were unaware of ChatGPT despite its extensive use in the health field. Our data revealed positive attitudes toward ChatGPT and the desire to learn more about it. Medical educators must explore how chatbots may be included in undergraduate health care education programs. %M 39137029 %R 10.2196/51757 %U https://mededu.jmir.org/2024/1/e51757 %U https://doi.org/10.2196/51757 %U http://www.ncbi.nlm.nih.gov/pubmed/39137029 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e59975 %T Ameliorating Racial Disparities in HIV Prevention via a Nurse-Led, AI-Enhanced Program for Pre-Exposure Prophylaxis Utilization Among Black Cisgender Women: Protocol for a Mixed Methods Study %A Zhang,Chen %A Wharton,Mitchell %A Liu,Yu %+ School of Nursing, University of Rochester, 255 Crittenden Boulevard, Hellen Wood Hall, Room 2w-218, Rochester, NY, 14622, United States, 1 5852766495, chen_zhang@urmc.rochester.edu %K artificial intelligence %K PrEP care %K PrEP %K pre-exposure prophylaxis %K nurse-led %K AI %K HIV prevention %K HIV %K prevention %K AIDS %K nurse %K Black cisgender women %K Black %K cisgender %K women %K HIV pre-exposure prophylaxis %K prophylaxis %K biomedical %K effectiveness %K medical mistrust %K Black women %K nurse practitioners %K chatbot %K socioeconomic %K HumanX technology %K health care interventions %D 2024 %7 13.8.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: HIV pre-exposure 
prophylaxis (PrEP) is a critical biomedical strategy to prevent HIV transmission among cisgender women. Despite its proven effectiveness, Black cisgender women remain significantly underrepresented throughout the PrEP care continuum, facing barriers such as limited access to care, medical mistrust, and intersectional racial or HIV stigma. Addressing these disparities is vital to improving HIV prevention outcomes within this community. On the other hand, nurse practitioners (NPs) play a pivotal role in PrEP utilization but are underrepresented due to a lack of awareness, a lack of human resources, and insufficient support. Equipped with the rapid evolution of artificial intelligence (AI) and advanced large language models, chatbots effectively facilitate health care communication and linkage to care in various domains, including HIV prevention and PrEP care. Objective: Our study harnesses NPs’ holistic care capabilities and the power of AI through natural language processing algorithms, providing targeted, patient-centered facilitation for PrEP care. Our overarching goal is to create a nurse-led, stakeholder-inclusive, and AI-powered program to facilitate PrEP utilization among Black cisgender women, ultimately enhancing HIV prevention efforts in this vulnerable group in 3 phases. This project aims to mitigate health disparities and advance innovative, technology-based solutions. Methods: The study uses a mixed methods design involving semistructured interviews with key stakeholders, including 50 PrEP-eligible Black women, 10 NPs, and a community advisory board representing various socioeconomic backgrounds. The AI-powered chatbot is developed using HumanX technology and SmartBot360’s Health Insurance Portability and Accountability Act–compliant framework to ensure data privacy and security. The study spans 18 months and consists of 3 phases: exploration, development, and evaluation. 
Results: As of May 2024, the institutional review board protocol for phase 1 has been approved. We plan to start recruitment for Black cisgender women and NPs in September 2024, with the aim of collecting information on their preferences regarding chatbot development. While institutional review board approval for phases 2 and 3 is still in progress, we have made significant strides in networking for participant recruitment. We plan to conduct data collection soon, and further updates on the recruitment and data collection progress will be provided as the study advances. Conclusions: The AI-powered chatbot offers a novel approach to improving PrEP care utilization among Black cisgender women, with opportunities to reduce barriers to care and facilitate a stigma-free environment. However, challenges remain regarding health equity and the digital divide, emphasizing the need for culturally competent design and robust data privacy protocols. The implications of this study extend beyond PrEP care, presenting a scalable model that can address broader health disparities. 
International Registered Report Identifier (IRRID): PRR1-10.2196/59975 %M 39137028 %R 10.2196/59975 %U https://www.researchprotocols.org/2024/1/e59975 %U https://doi.org/10.2196/59975 %U http://www.ncbi.nlm.nih.gov/pubmed/39137028 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 13 %N %P e53672 %T Debate and Dilemmas Regarding Generative AI in Mental Health Care: Scoping Review %A Xian,Xuechang %A Chang,Angela %A Xiang,Yu-Tao %A Liu,Matthew Tingchi %+ Department of Communication, Faculty of Social Sciences, University of Macau, University Avenue, Taipa, Macau SAR, 999078, China, 86 88228991, wychang@um.edu.mo %K generative artificial intelligence %K GAI %K ChatGPT %K mental health %K scoping review %K artificial intelligence %K depression %K anxiety %K generative adversarial network %K GAN %K variational autoencoder %K VAE %D 2024 %7 12.8.2024 %9 Review %J Interact J Med Res %G English %X Background: Mental disorders have ranked among the top 10 prevalent causes of burden on a global scale. Generative artificial intelligence (GAI) has emerged as a promising and innovative technological advancement that has significant potential in the field of mental health care. Nevertheless, there is a scarcity of research dedicated to examining and understanding the application landscape of GAI within this domain. Objective: This review aims to inform the current state of GAI knowledge and identify its key uses in the mental health domain by consolidating relevant literature. Methods: Records were searched within 8 reputable sources including Web of Science, PubMed, IEEE Xplore, medRxiv, bioRxiv, Google Scholar, CNKI and Wanfang databases between 2013 and 2023. Our focus was on original, empirical research with either English or Chinese publications that use GAI technologies to benefit mental health. For an exhaustive search, we also checked the studies cited by relevant literature. 
Two reviewers were responsible for the data selection process, and all the extracted data were synthesized and summarized for brief and in-depth analyses depending on the GAI approaches used (traditional retrieval and rule-based techniques vs advanced GAI techniques). Results: In this review of 144 articles, 44 (30.6%) met the inclusion criteria for detailed analysis. Six key uses of advanced GAI emerged: mental disorder detection, counseling support, therapeutic application, clinical training, clinical decision-making support, and goal-driven optimization. Advanced GAI systems have been mainly focused on therapeutic applications (n=19, 43%) and counseling support (n=13, 30%), with clinical training being the least common. Most studies (n=28, 64%) focused broadly on mental health, while specific conditions such as anxiety (n=1, 2%), bipolar disorder (n=2, 5%), eating disorders (n=1, 2%), posttraumatic stress disorder (n=2, 5%), and schizophrenia (n=1, 2%) received limited attention. Despite prevalent use, the efficacy of ChatGPT in the detection of mental disorders remains insufficient. In addition, 100 articles on traditional GAI approaches were found, indicating diverse areas where advanced GAI could enhance mental health care. Conclusions: This study provides a comprehensive overview of the use of GAI in mental health care, which serves as a valuable guide for future research, practical applications, and policy development in this domain. While GAI demonstrates promise in augmenting mental health care services, its inherent limitations emphasize its role as a supplementary tool rather than a replacement for trained mental health providers. A conscientious and ethical integration of GAI techniques is necessary, ensuring a balanced approach that maximizes benefits while mitigating potential challenges in mental health care practices. 
%M 39133916 %R 10.2196/53672 %U https://www.i-jmr.org/2024/1/e53672 %U https://doi.org/10.2196/53672 %U http://www.ncbi.nlm.nih.gov/pubmed/39133916 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 10 %N %P e57276 %T Artificial Intelligence as a Potential Catalyst to a More Equitable Cancer Care %A Garcia-Saiso,Sebastian %A Marti,Myrna %A Pesce,Karina %A Luciani,Silvana %A Mujica,Oscar %A Hennis,Anselm %A D'Agostino,Marcelo %+ Pan American Health Organization, 525 23rd st NW, Washington, DC, 20037, United States, 1 7034737961, dagostim@paho.org %K digital health %K public health %K cancer %K artificial intelligence %K AI %K catalyst %K cancer care %K cost %K costs %K demographic %K epidemiological %K change %K changes %K healthcare %K equality %K health system %K mHealth %K mobile health %D 2024 %7 12.8.2024 %9 Viewpoint %J JMIR Cancer %G English %X As we enter the era of digital interdependence, artificial intelligence (AI) emerges as a key instrument to transform health care and address disparities and barriers in access to services. This viewpoint explores AI's potential to reduce inequalities in cancer care by improving diagnostic accuracy, optimizing resource allocation, and expanding access to medical care, especially in underserved communities. Despite persistent barriers, such as socioeconomic and geographical disparities, AI can significantly improve health care delivery. Key applications include AI-driven health equity monitoring, predictive analytics, mental health support, and personalized medicine. This viewpoint highlights the need for inclusive development practices and ethical considerations to ensure diverse data representation and equitable access. Emphasizing the role of AI in cancer care, especially in low- and middle-income countries, we underscore the importance of collaborative and multidisciplinary efforts to integrate AI effectively and ethically into health systems. 
This call to action highlights the need for further research on user experiences and the unique social, cultural, and political barriers to AI implementation in cancer care. %M 39133537 %R 10.2196/57276 %U https://cancer.jmir.org/2024/1/e57276 %U https://doi.org/10.2196/57276 %U http://www.ncbi.nlm.nih.gov/pubmed/39133537 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e57097 %T Recognition of Daily Activities in Adults With Wearable Inertial Sensors: Deep Learning Methods Study %A De Ramón Fernández,Alberto %A Ruiz Fernández,Daniel %A García Jaén,Miguel %A Cortell-Tormo,Juan M. %+ Department of Computer Technology, University of Alicante, Carretera San Vicente del Raspeig s/n, San Vicente del Raspeig, 03690, Spain, 34 965 90 9656 ext 3331, druiz@dtic.ua.es %K activities of daily living %K ADL %K ADLs %K deep learning %K deep learning models %K wearable inertial sensors %K clinical evaluation %K patient’s rehabilitation %K rehabilitation %K movement %K accelerometers %K accelerometer %K accelerometry %K wearable %K wearables %K sensor %K sensors %K gyroscopes %K gyroscope %K monitor %K monitoring %D 2024 %7 9.8.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Activities of daily living (ADL) are essential for independence and personal well-being, reflecting an individual’s functional status. Impairment in executing these tasks can limit autonomy and negatively affect quality of life. The assessment of physical function during ADL is crucial for the prevention and rehabilitation of movement limitations. Still, its traditional evaluation based on subjective observation has limitations in precision and objectivity. Objective: The primary objective of this study is to use innovative technology, specifically wearable inertial sensors combined with artificial intelligence techniques, to objectively and accurately evaluate human performance in ADL. 
It is proposed to overcome the limitations of traditional methods by implementing systems that allow dynamic and noninvasive monitoring of movements during daily activities. The approach seeks to provide an effective tool for the early detection of dysfunctions and the personalization of treatment and rehabilitation plans, thus promoting an improvement in the quality of life of individuals. Methods: To monitor movements, wearable inertial sensors were developed, which include accelerometers and triaxial gyroscopes. The developed sensors were used to create a proprietary database with 6 movements related to the shoulder and 3 related to the back. We registered 53,165 activity records in the database (consisting of accelerometer and gyroscope measurements), which were reduced to 52,600 after processing to remove null or abnormal values. Finally, 4 deep learning (DL) models were created by combining various processing layers to explore different approaches in ADL recognition. Results: The results revealed high performance of the 4 proposed models, with levels of accuracy, precision, recall, and F1-score ranging between 95% and 97% for all classes and an average loss of 0.10. These results indicate the great capacity of the models to accurately identify a variety of activities, with a good balance between precision and recall. Both the convolutional and bidirectional approaches achieved slightly superior results, although the bidirectional model reached convergence in a smaller number of epochs. Conclusions: The DL models implemented have demonstrated solid performance, indicating an effective ability to identify and classify various daily activities related to the shoulder and lumbar region. These results were achieved with minimal sensorization—being noninvasive and practically imperceptible to the user—which does not affect their daily routine and promotes acceptance and adherence to continuous monitoring, thus improving the reliability of the data collected. 
This research has the potential to have a significant impact on the clinical evaluation and rehabilitation of patients with movement limitations, by providing an objective and advanced tool to detect key movement patterns and joint dysfunctions. %M 39121473 %R 10.2196/57097 %U https://medinform.jmir.org/2024/1/e57097 %U https://doi.org/10.2196/57097 %U http://www.ncbi.nlm.nih.gov/pubmed/39121473 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56413 %T Performance of Large Language Models in Patient Complaint Resolution: Web-Based Cross-Sectional Survey %A Yong,Lorraine Pei Xian %A Tung,Joshua Yi Min %A Lee,Zi Yao %A Kuan,Win Sen %A Chua,Mui Teng %+ Emergency Medicine Department, National University Hospital, National University Health System, 5 Lower Kent Ridge Road, Singapore, 119074, Singapore, 65 67725000, lorraineyong@nus.edu.sg %K ChatGPT %K large language models %K artificial intelligence %K patient complaint %K health care complaint %K empathy %K efficiency %K patient satisfaction %K resource allocation %D 2024 %7 9.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Patient complaints are a perennial challenge faced by health care institutions globally, requiring extensive time and effort from health care workers. Despite these efforts, patient dissatisfaction remains high. Recent studies on the use of large language models (LLMs) such as the GPT models developed by OpenAI in the health care sector have shown great promise, with the ability to provide more detailed and empathetic responses as compared to physicians. LLMs could potentially be used in responding to patient complaints to improve patient satisfaction and complaint response time. Objective: This study aims to evaluate the performance of LLMs in addressing patient complaints received by a tertiary health care institution, with the goal of enhancing patient satisfaction. 
Methods: Anonymized patient complaint emails and associated responses from the patient relations department were obtained. ChatGPT-4.0 (OpenAI, Inc) was provided with the same complaint email and tasked to generate a response. The complaints and the respective responses were uploaded onto a web-based questionnaire. Respondents were asked to rate both responses on a 10-point Likert scale for 4 items: appropriateness, completeness, empathy, and satisfaction. Participants were also asked to choose a preferred response at the end of each scenario. Results: There was a total of 188 respondents, of which 115 (61.2%) were health care workers. A majority of the respondents, including both health care and non–health care workers, preferred replies from ChatGPT (n=164, 87.2% to n=183, 97.3%). GPT-4.0 responses were rated higher in all 4 assessed items with all median scores of 8 (IQR 7-9) compared to human responses (appropriateness 5, IQR 3-7; empathy 4, IQR 3-6; quality 5, IQR 3-6; satisfaction 5, IQR 3-6; P<.001) and had higher average word counts as compared to human responses (238 vs 76 words). Regression analyses showed that a higher word count was a statistically significant predictor of higher score in all 4 items, with every 1-word increment resulting in an increase in scores of between 0.015 and 0.019 (all P<.001). However, on subgroup analysis by authorship, this only held true for responses written by patient relations department staff and not those generated by ChatGPT which received consistently high scores irrespective of response length. Conclusions: This study provides significant evidence supporting the effectiveness of LLMs in resolution of patient complaints. ChatGPT demonstrated superiority in terms of response appropriateness, empathy, quality, and overall satisfaction when compared against actual human responses to patient complaints. 
Future research can be done to measure the degree of improvement that artificial intelligence generated responses can bring in terms of time savings, cost-effectiveness, patient satisfaction, and stress reduction for the health care system. %M 39121468 %R 10.2196/56413 %U https://www.jmir.org/2024/1/e56413 %U https://doi.org/10.2196/56413 %U http://www.ncbi.nlm.nih.gov/pubmed/39121468 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55151 %T Differences in Fear and Negativity Levels Between Formal and Informal Health-Related Websites: Analysis of Sentiments and Emotions %A Paradise Vit,Abigail %A Magid,Avi %+ Department of Information Systems, The Max Stern Yezreel Valley College, Emek Yezreel 1, Emek Yezreel, 1930600, Israel, 972 509903930, abigailparadise@gmail.com %K emotions %K sentiment %K health websites %K fear %D 2024 %7 9.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Searching for web-based health-related information is frequently performed by the public and may affect public behavior regarding health decision-making. Particularly, it may result in anxiety, erroneous, and harmful self-diagnosis. Most searched health-related topics are cancer, cardiovascular diseases, and infectious diseases. A health-related web-based search may result in either formal or informal medical website, both of which may evoke feelings of fear and negativity. Objective: Our study aimed to assess whether there is a difference in fear and negativity levels between information appearing on formal and informal health-related websites. Methods: A web search was performed to retrieve the contents of websites containing symptoms of selected diseases, using selected common symptoms. Retrieved websites were classified into formal and informal websites. Fear and negativity of each content were evaluated using 3 transformer models. 
A fourth transformer model was fine-tuned using an existing emotion data set obtained from a web-based health community. For formal and informal websites, fear and negativity levels were aggregated. t tests were conducted to evaluate the differences in fear and negativity levels between formal and informal websites. Results: In this study, unique websites (N=1448) were collected, of which 534 were considered formal and 914 were considered informal. There were 1820 result pages from formal websites and 1494 result pages from informal websites. According to our findings, fear levels were statistically higher (t2753=3.331; P<.001) on formal websites (mean 0.388, SD 0.177) than on informal websites (mean 0.366, SD 0.168). The results also show that the level of negativity was statistically higher (t2753=2.726; P=.006) on formal websites (mean 0.657, SD 0.211) than on informal websites (mean 0.636, SD 0.201). Conclusions: Positive texts may increase the credibility of formal health websites and increase their usage by the general public and the public’s compliance to the recommendations. Increasing the usage of natural language processing tools before publishing health-related information to achieve a more positive and less stressful text to be disseminated to the public is recommended. 
%M 39120928 %R 10.2196/55151 %U https://www.jmir.org/2024/1/e55151 %U https://doi.org/10.2196/55151 %U http://www.ncbi.nlm.nih.gov/pubmed/39120928 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51706 %T A 3D and Explainable Artificial Intelligence Model for Evaluation of Chronic Otitis Media Based on Temporal Bone Computed Tomography: Model Development, Validation, and Clinical Application %A Chen,Binjun %A Li,Yike %A Sun,Yu %A Sun,Haojie %A Wang,Yanmei %A Lyu,Jihan %A Guo,Jiajie %A Bao,Shunxing %A Cheng,Yushu %A Niu,Xun %A Yang,Lian %A Xu,Jianghong %A Yang,Juanmei %A Huang,Yibo %A Chi,Fanglu %A Liang,Bo %A Ren,Dongdong %+ Department of Otolaryngology—Head and Neck Surgery, Vanderbilt University Medical Center, 1215 Medical Center Drive, Nashville, TN, 37232, United States, 1 6153438146, yike.li.1@vumc.org %K artificial intelligence %K cholesteatoma %K deep learning %K otitis media %K tomography, x-ray computed %K machine learning %K mastoidectomy %K convolutional neural networks %K temporal bone %D 2024 %7 8.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Temporal bone computed tomography (CT) helps diagnose chronic otitis media (COM). However, its interpretation requires training and expertise. Artificial intelligence (AI) can help clinicians evaluate COM through CT scans, but existing models lack transparency and may not fully leverage multidimensional diagnostic information. Objective: We aimed to develop an explainable AI system based on 3D convolutional neural networks (CNNs) for automatic CT-based evaluation of COM. Methods: Temporal bone CT scans were retrospectively obtained from patients operated for COM between December 2015 and July 2021 at 2 independent institutes. A region of interest encompassing the middle ear was automatically segmented, and 3D CNNs were subsequently trained to identify pathological ears and cholesteatoma. An ablation study was performed to refine model architecture. 
Benchmark tests were conducted against a baseline 2D model and 7 clinical experts. Model performance was measured through cross-validation and external validation. Heat maps, generated using Gradient-Weighted Class Activation Mapping, were used to highlight critical decision-making regions. Finally, the AI system was assessed with a prospective cohort to aid clinicians in preoperative COM assessment. Results: Internal and external data sets contained 1661 and 108 patients (3153 and 211 eligible ears), respectively. The 3D model exhibited decent performance with mean areas under the receiver operating characteristic curves of 0.96 (SD 0.01) and 0.93 (SD 0.01), and mean accuracies of 0.878 (SD 0.017) and 0.843 (SD 0.015), respectively, for detecting pathological ears on the 2 data sets. Similar outcomes were observed for cholesteatoma identification (mean area under the receiver operating characteristic curve 0.85, SD 0.03 and 0.83, SD 0.05; mean accuracies 0.783, SD 0.04 and 0.813, SD 0.033, respectively). The proposed 3D model achieved a commendable balance between performance and network size relative to alternative models. It significantly outperformed the 2D approach in detecting COM (P≤.05) and exhibited a substantial gain in identifying cholesteatoma (P<.001). The model also demonstrated superior diagnostic capabilities over resident fellows and the attending otologist (P<.05), rivaling all senior clinicians in both tasks. The generated heat maps properly highlighted the middle ear and mastoid regions, aligning with human knowledge in interpreting temporal bone CT. The resulting AI system achieved an accuracy of 81.8% in generating preoperative diagnoses for 121 patients and contributed to clinical decision-making in 90.1% cases. Conclusions: We present a 3D CNN model trained to detect pathological changes and identify cholesteatoma via temporal bone CT scans. 
In both tasks, this model significantly outperforms the baseline 2D approach, achieving levels comparable with or surpassing those of human experts. The model also exhibits decent generalizability and enhanced comprehensibility. This AI system facilitates automatic COM assessment and shows promising viability in real-world clinical settings. These findings underscore AI’s potential as a valuable aid for clinicians in COM evaluation. Trial Registration: Chinese Clinical Trial Registry ChiCTR2000036300; https://www.chictr.org.cn/showprojEN.html?proj=58685 %M 39116439 %R 10.2196/51706 %U https://www.jmir.org/2024/1/e51706 %U https://doi.org/10.2196/51706 %U http://www.ncbi.nlm.nih.gov/pubmed/39116439 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e57830 %T Mapping Knowledge Landscapes and Emerging Trends in AI for Dementia Biomarkers: Bibliometric and Visualization Analysis %A Qi,Wenhao %A Zhu,Xiaohong %A He,Danni %A Wang,Bin %A Cao,Shihua %A Dong,Chaoqun %A Li,Yunhua %A Chen,Yanfei %A Wang,Bingsheng %A Shi,Yankai %A Jiang,Guowei %A Liu,Fang %A Boots,Lizzy M M %A Li,Jiaqi %A Lou,Xiajing %A Yao,Jiani %A Lu,Xiaodong %A Kang,Junling %+ School of Nursing, Hangzhou Normal University, No. 2318, Yuhangtang Road, Yuhang District, Hangzhou, 310021, China, 86 13777861361, csh@hznu.edu.cn %K artificial intelligence %K AI %K biomarker %K dementia %K machine learning %K bibliometric analysis %D 2024 %7 8.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: With the rise of artificial intelligence (AI) in the field of dementia biomarker research, exploring its current developmental trends and research focuses has become increasingly important. This study, using literature data mining, analyzes and assesses the key contributions and development scale of AI in dementia biomarker research. Objective: The aim of this study was to comprehensively evaluate the current state, hot topics, and future trends of AI in dementia biomarker research globally. 
Methods: This study thoroughly analyzed the literature in the application of AI to dementia biomarkers across various dimensions, such as publication volume, authors, institutions, journals, and countries, based on the Web of Science Core Collection. In addition, scales, trends, and potential connections between AI and biomarkers were extracted and deeply analyzed through multiple expert panels. Results: To date, the field includes 1070 publications across 362 journals, involving 74 countries and 1793 major research institutions, with a total of 6455 researchers. Notably, 69.41% (994/1432) of the researchers ceased their studies before 2019. The most prevalent algorithms used are support vector machines, random forests, and neural networks. Current research frequently focuses on biomarkers such as imaging biomarkers, cerebrospinal fluid biomarkers, genetic biomarkers, and blood biomarkers. Recent advances have highlighted significant discoveries in biomarkers related to imaging, genetics, and blood, with growth in studies on digital and ophthalmic biomarkers. Conclusions: The field is currently in a phase of stable development, receiving widespread attention from numerous countries, institutions, and researchers worldwide. Despite this, stable clusters of collaborative research have yet to be established, and there is a pressing need to enhance interdisciplinary collaboration. Algorithm development has shown prominence, especially the application of support vector machines and neural networks in imaging studies. Looking forward, newly discovered biomarkers are expected to undergo further validation, and new types, such as digital biomarkers, will garner increased research interest and attention. 
%M 39116438 %R 10.2196/57830 %U https://www.jmir.org/2024/1/e57830 %U https://doi.org/10.2196/57830 %U http://www.ncbi.nlm.nih.gov/pubmed/39116438 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e46800 %T Assessing ChatGPT’s Capability for Multiple Choice Questions Using RaschOnline: Observational Study %A Chow,Julie Chi %A Cheng,Teng Yun %A Chien,Tsair-Wei %A Chou,Willy %+ Department of Physical Medicine and Rehabilitation, Chi Mei Medical Center, No. 901, Chung Hwa Road, Yung Kung District, Tainan, 710, Taiwan, 886 937399106, smilewilly@mail.chimei.org.tw %K RaschOnline %K ChatGPT %K multiple choice questions %K differential item functioning %K Wright map %K KIDMAP %K website tool %K evaluation tool %K tool %K application %K artificial intelligence %K scoring %K testing %K college %K students %D 2024 %7 8.8.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: ChatGPT (OpenAI), a state-of-the-art large language model, has exhibited remarkable performance in various specialized applications. Despite the growing popularity and efficacy of artificial intelligence, there is a scarcity of studies that assess ChatGPT’s competence in addressing multiple-choice questions (MCQs) using KIDMAP of Rasch analysis—a website tool used to evaluate ChatGPT’s performance in MCQ answering. Objective: This study aims to (1) showcase the utility of the website (Rasch analysis, specifically RaschOnline), and (2) determine the grade achieved by ChatGPT when compared to a normal sample. Methods: The capability of ChatGPT was evaluated using 10 items from the English tests conducted for Taiwan college entrance examinations in 2023. Under a Rasch model, 300 simulated students with normal distributions were simulated to compete with ChatGPT’s responses. RaschOnline was used to generate 5 visual presentations, including item difficulties, differential item functioning, item characteristic curve, Wright map, and KIDMAP, to address the research objectives. 
Results: The findings revealed the following: (1) the difficulty of the 10 items increased monotonically from easier to harder, represented by logits (–2.43, –1.78, –1.48, –0.64, –0.1, 0.33, 0.59, 1.34, 1.7, and 2.47); (2) evidence of differential item functioning was observed between gender groups for item 5 (P=.04); (3) item 5 displayed a good fit to the Rasch model (P=.61); (4) all items demonstrated a satisfactory fit to the Rasch model, indicated by Infit mean square errors below the threshold of 1.5; (5) no significant difference was found in the measures obtained between gender groups (P=.83); (6) a significant difference was observed among ability grades (P<.001); and (7) ChatGPT’s capability was graded as A, surpassing grades B to E. Conclusions: By using RaschOnline, this study provides evidence that ChatGPT possesses the ability to achieve a grade A when compared to a normal sample. It exhibits excellent proficiency in answering MCQs from the English tests conducted in 2023 for the Taiwan college entrance examinations. 
%M 39115919 %R 10.2196/46800 %U https://formative.jmir.org/2024/1/e46800 %U https://doi.org/10.2196/46800 %U http://www.ncbi.nlm.nih.gov/pubmed/39115919 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e53108 %T Patients’ Attitudes Toward the Use of Artificial Intelligence as a Diagnostic Tool in Radiology in Saudi Arabia: Cross-Sectional Study %A Baghdadi,Leena R %A Mobeirek,Arwa A %A Alhudaithi,Dania R %A Albenmousa,Fatimah A %A Alhadlaq,Leen S %A Alaql,Maisa S %A Alhamlan,Sarah A %+ Department of Family and Community Medicine, College of Medicine, King Saud University, 3332 King Khalid Road, Riyadh, 12372, Saudi Arabia, 966 114670836, lbaghdadi@ksu.edu.sa %K artificial intelligence %K diagnostic radiology %K patients %K attitudes %K questionnaire %K patient %K attitude %K diagnostic tool %K diagnostic tools %K AI %K artificial intelligence %K radiologists %K prognosis %K treatment %K Saudi Arabia %K sociodemographic factors %K sociodemographic factor %K sociodemographic %K cross-sectional study %K participant %K men %K women %K analysis %K distrust %K trust %D 2024 %7 7.8.2024 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Artificial intelligence (AI) is widely used in various medical fields, including diagnostic radiology as a tool for greater efficiency, precision, and accuracy. The integration of AI as a radiological diagnostic tool has the potential to mitigate delays in diagnosis, which could, in turn, impact patients’ prognosis and treatment outcomes. The literature shows conflicting results regarding patients’ attitudes to AI as a diagnostic tool. To the best of our knowledge, no similar study has been conducted in Saudi Arabia. Objective: The objectives of this study are to examine patients’ attitudes toward the use of AI as a tool in diagnostic radiology at King Khalid University Hospital, Saudi Arabia. 
Additionally, we sought to explore potential associations between patients’ attitudes and various sociodemographic factors. Methods: This descriptive-analytical cross-sectional study was conducted in a tertiary care hospital. Data were collected from patients scheduled for radiological imaging through a validated self-administered questionnaire. The main outcome was to measure patients’ attitudes to the use of AI in radiology by calculating mean scores of 5 factors: distrust and accountability (factor 1), procedural knowledge (factor 2), personal interaction and communication (factor 3), efficiency (factor 4), and methods of providing information to patients (factor 5). Data were analyzed using the Student t test and one-way analysis of variance, followed by post hoc and multivariable analyses. Results: A total of 382 participants (n=273, 71.5% women and n=109, 28.5% men) completed the surveys and were included in the analysis. The mean age of the respondents was 39.51 (SD 13.26) years. Participants favored physicians over AI for procedural knowledge, personal interaction, and being informed. However, the participants demonstrated a neutral attitude for distrust and accountability and for efficiency. Marital status was found to be associated with distrust and accountability, procedural knowledge, and personal interaction. Associations were also found between self-reported health status and being informed and between the field of specialization and distrust and accountability. Conclusions: Patients were keen to understand the work of AI in radiology but favored personal interaction with a radiologist. Patients were impartial toward AI replacing radiologists and the efficiency of AI, which should be a consideration in future policy development and integration. Future research involving multicenter studies in different regions of Saudi Arabia is required. 
%M 39110973 %R 10.2196/53108 %U https://humanfactors.jmir.org/2024/1/e53108 %U https://doi.org/10.2196/53108 %U http://www.ncbi.nlm.nih.gov/pubmed/39110973 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e46407 %T Evaluating Artificial Intelligence in Clinical Settings—Let Us Not Reinvent the Wheel %A Cresswell,Kathrin %A de Keizer,Nicolette %A Magrabi,Farah %A Williams,Robin %A Rigby,Michael %A Prgomet,Mirela %A Kukhareva,Polina %A Wong,Zoie Shui-Yee %A Scott,Philip %A Craven,Catherine K %A Georgiou,Andrew %A Medlock,Stephanie %A Brender McNair,Jytte %A Ammenwerth,Elske %+ Usher Institute, The University of Edinburgh, Usher Building, 5-7 Little France Road, Edinburgh, EH16 4UX, United Kingdom, 44 131 650 6984, kathrin.cresswell@ed.ac.uk %K artificial intelligence %K evaluation %K theory %K patient safety %K optimisation %K health care %K optimization %D 2024 %7 7.8.2024 %9 Viewpoint %J J Med Internet Res %G English %X Given the requirement to minimize the risks and maximize the benefits of technology applications in health care provision, there is an urgent need to incorporate theory-informed health IT (HIT) evaluation frameworks into existing and emerging guidelines for the evaluation of artificial intelligence (AI). Such frameworks can help developers, implementers, and strategic decision makers to build on experience and the existing empirical evidence base. We provide a pragmatic conceptual overview of selected concrete examples of how existing theory-informed HIT evaluation frameworks may be used to inform the safe development and implementation of AI in health care settings. The list is not exhaustive and is intended to illustrate applications in line with various stakeholder requirements. Existing HIT evaluation frameworks can help to inform AI-based development and implementation by supporting developers and strategic decision makers in considering relevant technology, user, and organizational dimensions. 
This can facilitate the design of technologies, their implementation in user and organizational settings, and the sustainability and scalability of technologies. %M 39110494 %R 10.2196/46407 %U https://www.jmir.org/2024/1/e46407 %U https://doi.org/10.2196/46407 %U http://www.ncbi.nlm.nih.gov/pubmed/39110494 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e59434 %T Use of Generative AI for Improving Health Literacy in Reproductive Health: Case Study %A Burns,Christina %A Bakaj,Angela %A Berishaj,Amonda %A Hristidis,Vagelis %A Deak,Pamela %A Equils,Ozlem %+ MiOra, 17328 Ventura Boulevard Number 190, Encino, CA, 91316, United States, 1 3105954094, oequils@yahoo.com %K ChatGPT %K chatGPT %K chat-GPT %K chatbots %K chat-bot %K chat-bots %K artificial intelligence %K AI %K machine learning %K ML %K large language model %K large language models %K LLM %K LLMs %K natural language processing %K NLP %K deep learning %K chatbot %K Google Search %K internet %K communication %K English proficiency %K readability %K health literacy %K health information %K health education %K health related questions %K health information seeking %K health access %K reproductive health %K oral contraceptive %K birth control %K emergency contraceptive %K comparison %K clinical %K patients %D 2024 %7 6.8.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Patients find technology tools to be more approachable for seeking sensitive health-related information, such as reproductive health information. The inventive conversational ability of artificial intelligence (AI) chatbots, such as ChatGPT (OpenAI Inc), offers a potential means for patients to effectively locate answers to their health-related questions digitally. 
Objective: A pilot study was conducted to compare the novel ChatGPT with the existing Google Search technology for their ability to offer accurate, effective, and current information regarding the appropriate action after missing a dose of an oral contraceptive pill. Methods: A sequence of 11 questions, mimicking a patient inquiring about the action to take after missing a dose of an oral contraceptive pill, was input into ChatGPT as a cascade, given the conversational ability of ChatGPT. The questions were input into 4 different ChatGPT accounts, with the account holders being of various demographics, to evaluate potential differences and biases in the responses given to different account holders. The leading question, “what should I do if I missed a day of my oral contraception birth control?” alone was then input into Google Search, given its nonconversational nature. The results from the ChatGPT questions and the Google Search results for the leading question were evaluated on their readability, accuracy, and effective delivery of information. Results: The ChatGPT results were determined to be at an overall higher-grade reading level, with a longer reading duration, less accurate, less current, and with a less effective delivery of information. In contrast, the Google Search resulting answer box and snippets were at a lower-grade reading level, shorter reading duration, more current, able to reference the origin of the information (transparent), and provided the information in various formats in addition to text. Conclusions: ChatGPT has room for improvement in accuracy, transparency, recency, and reliability before it can equitably be implemented into health care information delivery and provide the potential benefits it poses. However, AI may be used as a tool for providers to educate their patients in preferred, creative, and efficient ways, such as using AI to generate accessible short educational videos from health care provider-vetted information. 
Larger studies representing a diverse group of users are needed. %M 38986153 %R 10.2196/59434 %U https://formative.jmir.org/2024/1/e59434 %U https://doi.org/10.2196/59434 %U http://www.ncbi.nlm.nih.gov/pubmed/38986153 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e48584 %T Assessing Patient Trust in Automation in Health Care Systems: Within-Subjects Experimental Study %A Nare,Matthew %A Jurewicz,Katherina %+ School of Industrial Engineering and Management, Oklahoma State University, 329 Engineering North, Stillwater, OK, 74078, United States, 1 405 744 4167, katie.jurewicz@okstate.edu %K automation %K emergency department %K trust %K health care %K artificial intelligence %K emergency %K perceptions %K attitude %K opinions %K belief %K automated %K trust ratings %D 2024 %7 6.8.2024 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Health care technology has the ability to change patient outcomes for the betterment when designed appropriately. Automation is becoming smarter and is increasingly being integrated into health care work systems. Objective: This study focuses on investigating trust between patients and an automated cardiac risk assessment tool (CRAT) in a simulated emergency department setting. Methods: A within-subjects experimental study was performed to investigate differences in automation modes for the CRAT: (1) no automation, (2) automation only, and (3) semiautomation. Participants were asked to enter their simulated symptoms for each scenario into the CRAT as instructed by the experimenter, and they would automatically be classified as high, medium, or low risk depending on the symptoms entered. Participants were asked to provide their trust ratings for each combination of risk classification and automation mode on a scale of 1 to 10 (1=absolutely no trust and 10=complete trust). 
Results: Results from this study indicate that the participants significantly trusted the semiautomation condition more compared to the automation-only condition (P=.002), and they trusted the no automation condition significantly more than the automation-only condition (P=.03). Additionally, participants significantly trusted the CRAT more in the high-severity scenario compared to the medium-severity scenario (P=.004). Conclusions: The findings from this study emphasize the importance of the human component of automation when designing automated technology in health care systems. Automation and artificially intelligent systems are becoming more prevalent in health care systems, and this work emphasizes the need to consider the human element when designing automation into care delivery. %M 39106096 %R 10.2196/48584 %U https://humanfactors.jmir.org/2024/1/e48584 %U https://doi.org/10.2196/48584 %U http://www.ncbi.nlm.nih.gov/pubmed/39106096 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e56932 %T Enhancing Clinical Relevance of Pretrained Language Models Through Integration of External Knowledge: Case Study on Cardiovascular Diagnosis From Electronic Health Records %A Lu,Qiuhao %A Wen,Andrew %A Nguyen,Thien %A Liu,Hongfang %+ McWilliams School of Biomedical Informatics, University of Texas Health Science Center, 7000 Fannin Street, Houston, TX, 77030, United States, 1 713 500 4472, Hongfang.Liu@uth.tmc.edu %K knowledge integration %K pre-trained language models %K physician reasoning %K adapters %K physician %K physicians %K electronic health record %K electronic health records %K EHR %K healthcare %K heterogeneous %K healthcare institution %K healthcare institutions %K proprietary information %K healthcare data %K methodology %K text classification %K data privacy %K medical knowledge %D 2024 %7 6.8.2024 %9 Original Paper %J JMIR AI %G English %X Background: Despite their growing use in health care, pretrained language models (PLMs) often lack 
clinical relevance due to insufficient domain expertise and poor interpretability. A key strategy to overcome these challenges is integrating external knowledge into PLMs, enhancing their adaptability and clinical usefulness. Current biomedical knowledge graphs like UMLS (Unified Medical Language System), SNOMED CT (Systematized Nomenclature of Medicine–Clinical Terms), and HPO (Human Phenotype Ontology), while comprehensive, fail to effectively connect general biomedical knowledge with physician insights. There is an equally important need for a model that integrates diverse knowledge in a way that is both unified and compartmentalized. This approach not only addresses the heterogeneous nature of domain knowledge but also recognizes the unique data and knowledge repositories of individual health care institutions, necessitating careful and respectful management of proprietary information. Objective: This study aimed to enhance the clinical relevance and interpretability of PLMs by integrating external knowledge in a manner that respects the diversity and proprietary nature of health care data. We hypothesize that domain knowledge, when captured and distributed as stand-alone modules, can be effectively reintegrated into PLMs to significantly improve their adaptability and utility in clinical settings. Methods: We demonstrate that through adapters, small and lightweight neural networks that enable the integration of extra information without full model fine-tuning, we can inject diverse sources of external domain knowledge into language models and improve the overall performance with an increased level of interpretability. 
As a practical application of this methodology, we introduce a novel task, structured as a case study, that endeavors to capture physician knowledge in assigning cardiovascular diagnoses from clinical narratives, where we extract diagnosis-comment pairs from electronic health records (EHRs) and cast the problem as text classification. Results: The study demonstrates that integrating domain knowledge into PLMs significantly improves their performance. While improvements with ClinicalBERT are more modest, likely due to its pretraining on clinical texts, BERT (bidirectional encoder representations from transformer) equipped with knowledge adapters surprisingly matches or exceeds ClinicalBERT in several metrics. This underscores the effectiveness of knowledge adapters and highlights their potential in settings with strict data privacy constraints. This approach also increases the level of interpretability of these models in a clinical context, which enhances our ability to precisely identify and apply the most relevant domain knowledge for specific tasks, thereby optimizing the model’s performance and tailoring it to meet specific clinical needs. Conclusions: This research provides a basis for creating health knowledge graphs infused with physician knowledge, marking a significant step forward for PLMs in health care. Notably, the model balances integrating knowledge both comprehensively and selectively, addressing the heterogeneous nature of medical knowledge and the privacy needs of health care institutions. 
%M 39106099 %R 10.2196/56932 %U https://ai.jmir.org/2024/1/e56932 %U https://doi.org/10.2196/56932 %U http://www.ncbi.nlm.nih.gov/pubmed/39106099 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56750 %T An Effective Deep Learning Framework for Fall Detection: Model Development and Study Design %A Zhang,Jinxi %A Li,Zhen %A Liu,Yu %A Li,Jian %A Qiu,Hualong %A Li,Mohan %A Hou,Guohui %A Zhou,Zhixiong %+ Institute of Artificial Intelligence in Sports, Capital University of Physical Education and Sports, 11 North Third Ring West Road, Haidian District, Beijing, 100191, China, 86 13552505679, zhouzhixiong@cupes.edu.cn %K fall detection %K deep learning %K self-attention %K accelerometer %K gyroscope %K human health %K wearable sensors %K Sisfall %K MobiFall %D 2024 %7 5.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Fall detection is of great significance in safeguarding human health. By monitoring the motion data, a fall detection system (FDS) can detect a fall accident. Recently, wearable sensors–based FDSs have become the mainstream of research, which can be categorized into threshold-based FDSs using experience, machine learning–based FDSs using manual feature extraction, and deep learning (DL)–based FDSs using automatic feature extraction. However, most FDSs focus on the global information of sensor data, neglecting the fact that different segments of the data contribute variably to fall detection. This shortcoming makes it challenging for FDSs to accurately distinguish between similar human motion patterns of actual falls and fall-like actions, leading to a decrease in detection accuracy. Objective: This study aims to develop and validate a DL framework to accurately detect falls using acceleration and gyroscope data from wearable sensors. We aim to explore the essential contributing features extracted from sensor data to distinguish falls from activities of daily life. 
The significance of this study lies in reforming the FDS by designing a weighted feature representation using DL methods to effectively differentiate between fall events and fall-like activities. Methods: Based on the 3-axis acceleration and gyroscope data, we proposed a new DL architecture, the dual-stream convolutional neural network self-attention (DSCS) model. Unlike previous studies, the used architecture can extract global feature information from acceleration and gyroscope data. Additionally, we incorporated a self-attention module to assign different weights to the original feature vector, enabling the model to learn the contribution effect of the sensor data and enhance classification accuracy. The proposed model was trained and tested on 2 public data sets: SisFall and MobiFall. In addition, 10 participants were recruited to carry out practical validation of the DSCS model. A total of 1700 trials were performed to test the generalization ability of the model. Results: The fall detection accuracy of the DSCS model was 99.32% (recall=99.15%; precision=98.58%) and 99.65% (recall=100%; precision=98.39%) on the test sets of SisFall and MobiFall, respectively. In the ablation experiment, we compared the DSCS model with state-of-the-art machine learning and DL models. On the SisFall data set, the DSCS model achieved the second-best accuracy; on the MobiFall data set, the DSCS model achieved the best accuracy, recall, and precision. In practical validation, the accuracy of the DSCS model was 96.41% (recall=95.12%; specificity=97.55%). Conclusions: This study demonstrates that the DSCS model can significantly improve the accuracy of fall detection on 2 publicly available data sets and performs robustly in practical validation. 
%M 39102676 %R 10.2196/56750 %U https://www.jmir.org/2024/1/e56750 %U https://doi.org/10.2196/56750 %U http://www.ncbi.nlm.nih.gov/pubmed/39102676 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e57224 %T Predictors of Health Care Practitioners’ Intention to Use AI-Enabled Clinical Decision Support Systems: Meta-Analysis Based on the Unified Theory of Acceptance and Use of Technology %A Dingel,Julius %A Kleine,Anne-Kathrin %A Cecil,Julia %A Sigl,Anna Leonie %A Lermer,Eva %A Gaube,Susanne %+ Human-AI-Interaction Group, Center for Leadership and People Management, Ludwig Maximilian University of Munich, Geschwister-Scholl-Platz 1, Munich, 80539, Germany, 49 8921809775, anne-kathrin.kleine@psy.lmu.de %K Unified Theory of Acceptance and Use of Technology %K UTAUT %K artificial intelligence–enabled clinical decision support systems %K AI-CDSSs %K meta-analysis %K health care practitioners %D 2024 %7 5.8.2024 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence–enabled clinical decision support systems (AI-CDSSs) offer potential for improving health care outcomes, but their adoption among health care practitioners remains limited. Objective: This meta-analysis identified predictors influencing health care practitioners’ intention to use AI-CDSSs based on the Unified Theory of Acceptance and Use of Technology (UTAUT). Additional predictors were examined based on existing empirical evidence. Methods: The literature search using electronic databases, forward searches, conference programs, and personal correspondence yielded 7731 results, of which 17 (0.22%) studies met the inclusion criteria. Random-effects meta-analysis, relative weight analyses, and meta-analytic moderation and mediation analyses were used to examine the relationships between relevant predictor variables and the intention to use AI-CDSSs. Results: The meta-analysis results supported the application of the UTAUT to the context of the intention to use AI-CDSSs. 
The results showed that performance expectancy (r=0.66), effort expectancy (r=0.55), social influence (r=0.66), and facilitating conditions (r=0.66) were positively associated with the intention to use AI-CDSSs, in line with the predictions of the UTAUT. The meta-analysis further identified positive attitude (r=0.63), trust (r=0.73), anxiety (r=–0.41), perceived risk (r=–0.21), and innovativeness (r=0.54) as additional relevant predictors. Trust emerged as the most influential predictor overall. The results of the moderation analyses show that the relationship between social influence and use intention becomes weaker with increasing age. In addition, the relationship between effort expectancy and use intention was stronger for diagnostic AI-CDSSs than for devices that combined diagnostic and treatment recommendations. Finally, the relationship between facilitating conditions and use intention was mediated through performance and effort expectancy. Conclusions: This meta-analysis contributes to the understanding of the predictors of intention to use AI-CDSSs based on an extended UTAUT model. More research is needed to substantiate the identified relationships and explain the observed variations in effect sizes by identifying relevant moderating factors. The research findings bear important implications for the design and implementation of training programs for health care practitioners to ease the adoption of AI-CDSSs into their practice. 
%M 39102675 %R 10.2196/57224 %U https://www.jmir.org/2024/1/e57224 %U https://doi.org/10.2196/57224 %U http://www.ncbi.nlm.nih.gov/pubmed/39102675 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e56627 %T Advancing Accuracy in Multimodal Medical Tasks Through Bootstrapped Language-Image Pretraining (BioMedBLIP): Performance Evaluation Study %A Naseem,Usman %A Thapa,Surendrabikram %A Masood,Anum %+ Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, B4-135, Realfagbygget Building., Gloshaugen Campus, Trondheim, 7491, Norway, 47 92093743, anum.msd@gmail.com %K biomedical text mining %K BioNLP %K vision-language pretraining %K multimodal models %K medical image analysis %D 2024 %7 5.8.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Medical image analysis, particularly in the context of visual question answering (VQA) and image captioning, is crucial for accurate diagnosis and educational purposes. Objective: Our study aims to introduce BioMedBLIP models, fine-tuned for VQA tasks using specialized medical data sets such as Radiology Objects in Context and Medical Information Mart for Intensive Care-Chest X-ray, and evaluate their performance in comparison to the state of the art (SOTA) original Bootstrapping Language-Image Pretraining (BLIP) model. Methods: We present 9 versions of BioMedBLIP across 3 downstream tasks in various data sets. The models are trained on a varying number of epochs. The findings indicate the strong overall performance of our models. We proposed BioMedBLIP for the VQA generation model, VQA classification model, and BioMedBLIP image caption model. We conducted pretraining in BLIP using medical data sets, producing an adapted BLIP model tailored for medical applications. 
Results: In VQA generation tasks, BioMedBLIP models outperformed the SOTA on the Semantically-Labeled Knowledge-Enhanced (SLAKE) data set, VQA in Radiology (VQA-RAD), and Image Cross-Language Evaluation Forum data sets. In VQA classification, our models consistently surpassed the SOTA on the SLAKE data set. Our models also showed competitive performance on the VQA-RAD and PathVQA data sets. Similarly, in image captioning tasks, our model beat the SOTA, suggesting the importance of pretraining with medical data sets. Overall, in 20 different data sets and task combinations, our BioMedBLIP excelled in 15 (75%) out of 20 tasks. BioMedBLIP represents a new SOTA in 15 (75%) out of 20 tasks, and our responses were rated higher in all 20 tasks (P<.005) in comparison to SOTA models. Conclusions: Our BioMedBLIP models show promising performance and suggest that incorporating medical knowledge through pretraining with domain-specific medical data sets helps models achieve higher performance. Our models thus demonstrate their potential to advance medical image analysis, impacting diagnosis, medical education, and research. However, data quality, task-specific variability, computational resources, and ethical considerations should be carefully addressed. In conclusion, our models represent a contribution toward the synergy of artificial intelligence and medicine. We have made BioMedBLIP freely available, which will help in further advancing research in multimodal medical tasks. 
%M 39102281 %R 10.2196/56627 %U https://medinform.jmir.org/2024/1/e56627 %U https://doi.org/10.2196/56627 %U http://www.ncbi.nlm.nih.gov/pubmed/39102281 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e60336 %T Patient-Representing Population's Perceptions of GPT-Generated Versus Standard Emergency Department Discharge Instructions: Randomized Blind Survey Assessment %A Huang,Thomas %A Safranek,Conrad %A Socrates,Vimig %A Chartash,David %A Wright,Donald %A Dilip,Monisha %A Sangal,Rohit B %A Taylor,Richard Andrew %+ Department of Emergency Medicine, Yale School of Medicine, 333 Cedar Street, New Haven, CT, 06510, United States, 1 2034324771, richard.taylor@yale.edu %K machine learning %K artificial intelligence %K large language models %K natural language processing %K ChatGPT %K discharge instructions %K emergency medicine %K emergency department %K surveys and questionnaires %D 2024 %7 2.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Discharge instructions are a key form of documentation and patient communication in the time of transition from the emergency department (ED) to home. Discharge instructions are time-consuming and often underprioritized, especially in the ED, leading to discharge delays and possibly impersonal patient instructions. Generative artificial intelligence and large language models (LLMs) offer promising methods of creating high-quality and personalized discharge instructions; however, there exists a gap in understanding patient perspectives of LLM-generated discharge instructions. Objective: We aimed to assess the use of LLMs such as ChatGPT in synthesizing accurate and patient-accessible discharge instructions in the ED. Methods: We synthesized 5 unique, fictional ED encounters to emulate real ED encounters that included a diverse set of clinician history, physical notes, and nursing notes. 
These were passed to GPT-4 in Azure OpenAI Service (Microsoft) to generate LLM-generated discharge instructions. Standard discharge instructions were also generated for each of the 5 unique ED encounters. All GPT-generated and standard discharge instructions were then formatted into standardized after-visit summary documents. These after-visit summaries containing either GPT-generated or standard discharge instructions were randomly and blindly administered to Amazon MTurk respondents representing patient populations through Amazon MTurk Survey Distribution. Discharge instructions were assessed based on metrics of interpretability of significance, understandability, and satisfaction. Results: Our findings revealed that survey respondents’ perspectives regarding GPT-generated and standard discharge instructions were significantly (P=.01) more favorable toward GPT-generated return precautions, and all other sections were considered noninferior to standard discharge instructions. Of the 156 survey respondents, GPT-generated discharge instructions were assigned favorable ratings, “agree” and “strongly agree,” more frequently along the metric of interpretability of significance in discharge instruction subsections regarding diagnosis, procedures, treatment, post-ED medications or any changes to medications, and return precautions. Survey respondents found GPT-generated instructions to be more understandable when rating procedures, treatment, post-ED medications or medication changes, post-ED follow-up, and return precautions. Satisfaction with GPT-generated discharge instruction subsections was the most favorable in procedures, treatment, post-ED medications or medication changes, and return precautions. 
Wilcoxon rank-sum test of Likert responses revealed significant differences (P=.01) in the interpretability of significance for return precautions in GPT-generated discharge instructions compared to standard discharge instructions but not for other evaluation metrics and discharge instruction subsections. Conclusions: This study demonstrates the potential for LLMs such as ChatGPT to act as a method of augmenting current documentation workflows in the ED to reduce the documentation burden of physicians. The ability of LLMs to provide tailored instructions for patients by improving readability and making instructions more applicable to patients could improve upon the methods of communication that currently exist. %M 39094112 %R 10.2196/60336 %U https://www.jmir.org/2024/1/e60336 %U https://doi.org/10.2196/60336 %U http://www.ncbi.nlm.nih.gov/pubmed/39094112 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e54482 %T Comparing the Efficacy and Efficiency of Human and Generative AI: Qualitative Thematic Analyses %A Prescott,Maximo R %A Yeager,Samantha %A Ham,Lillian %A Rivera Saldana,Carlos D %A Serrano,Vanessa %A Narez,Joey %A Paltin,Dafna %A Delgado,Jorge %A Moore,David J %A Montoya,Jessica %+ HIV Neurobehavioral Research Program, University of California, San Diego, 220 Dickinson Street, San Diego, CA, 92103, United States, 1 7602713336, mrprescott@health.ucsd.edu %K GenAI %K generative artificial intelligence %K ChatGPT %K Bard %K qualitative research %K thematic analysis %K digital health %D 2024 %7 2.8.2024 %9 Original Paper %J JMIR AI %G English %X Background: Qualitative methods are incredibly beneficial to the dissemination and implementation of new digital health interventions; however, these methods can be time intensive and slow down dissemination when timely knowledge from the data sources is needed in ever-changing health systems. 
Recent advancements in generative artificial intelligence (GenAI) and their underlying large language models (LLMs) may provide a promising opportunity to expedite the qualitative analysis of textual data, but their efficacy and reliability remain unknown. Objective: The primary objectives of our study were to evaluate the consistency in themes, reliability of coding, and time needed for inductive and deductive thematic analyses between GenAI (ie, ChatGPT and Bard) and human coders. Methods: The qualitative data for this study consisted of 40 brief SMS text message reminder prompts used in a digital health intervention for promoting antiretroviral medication adherence among people with HIV who use methamphetamine. Inductive and deductive thematic analyses of these SMS text messages were conducted by 2 independent teams of human coders. An independent human analyst conducted analyses following both approaches using ChatGPT and Bard. The consistency in themes (or the extent to which the themes were the same) and reliability (or agreement in coding of themes) between methods were compared. Results: The themes generated by GenAI (both ChatGPT and Bard) were consistent with 71% (5/7) of the themes identified by human analysts following inductive thematic analysis. The consistency in themes was lower between humans and GenAI following a deductive thematic analysis procedure (ChatGPT: 6/12, 50%; Bard: 7/12, 58%). The percentage agreement (or intercoder reliability) for these congruent themes between human coders and GenAI ranged from fair to moderate (ChatGPT, inductive: 31/66, 47%; ChatGPT, deductive: 22/59, 37%; Bard, inductive: 20/54, 37%; Bard, deductive: 21/58, 36%). In general, ChatGPT and Bard performed similarly to each other across both types of qualitative analyses in terms of consistency of themes (inductive: 6/6, 100%; deductive: 5/6, 83%) and reliability of coding (inductive: 23/62, 37%; deductive: 22/47, 47%). 
On average, GenAI required significantly less overall time than human coders when conducting qualitative analysis (mean 20, SD 3.5 min vs mean 567, SD 106.5 min). Conclusions: The promising consistency in the themes generated by human coders and GenAI suggests that these technologies hold promise in reducing the resource intensiveness of qualitative thematic analysis; however, the relatively lower reliability in coding between them suggests that hybrid approaches are necessary. Human coders appeared to be better than GenAI at identifying nuanced and interpretative themes. Future studies should consider how these powerful technologies can be best used in collaboration with human coders to improve the efficiency of qualitative research in hybrid approaches while also mitigating potential ethical risks that they may pose. %M 39094113 %R 10.2196/54482 %U https://ai.jmir.org/2024/1/e54482 %U https://doi.org/10.2196/54482 %U http://www.ncbi.nlm.nih.gov/pubmed/39094113 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e49655 %T Implementing AI in Hospitals to Achieve a Learning Health System: Systematic Review of Current Enablers and Barriers %A Kamel Rahimi,Amir %A Pienaar,Oliver %A Ghadimi,Moji %A Canfell,Oliver J %A Pole,Jason D %A Shrapnel,Sally %A van der Vegt,Anton H %A Sullivan,Clair %+ Queensland Digital Health Centre, Faculty of Medicine, The University of Queensland, Health Sciences Building, Herston Campus, Brisbane, QLD4006, Australia, 61 0733465350, amir.kamel@uq.edu.au %K life cycle %K medical informatics %K decision support system %K clinical %K electronic health records %K artificial intelligence %K machine learning %K routinely collected health data %D 2024 %7 2.8.2024 %9 Review %J J Med Internet Res %G English %X Background: Efforts are underway to capitalize on the computational power of the data collected in electronic medical records (EMRs) to achieve a learning health system (LHS). 
Artificial intelligence (AI) in health care has promised to improve clinical outcomes, and many researchers are developing AI algorithms on retrospective data sets. Integrating these algorithms with real-time EMR data is rare. There is a poor understanding of the current enablers and barriers to empower this shift from data set–based use to real-time implementation of AI in health systems. Exploring these factors holds promise for uncovering actionable insights toward the successful integration of AI into clinical workflows. Objective: The first objective was to conduct a systematic literature review to identify the evidence of enablers and barriers regarding the real-world implementation of AI in hospital settings. The second objective was to map the identified enablers and barriers to a 3-horizon framework to enable the successful digital health transformation of hospitals to achieve an LHS. Methods: The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines were adhered to. PubMed, Scopus, Web of Science, and IEEE Xplore were searched for studies published between January 2010 and January 2022. Articles with case studies and guidelines on the implementation of AI analytics in hospital settings using EMR data were included. We excluded studies conducted in primary and community care settings. Quality assessment of the identified papers was conducted using the Mixed Methods Appraisal Tool and ADAPTE frameworks. We coded evidence from the included studies that related to enablers of and barriers to AI implementation. The findings were mapped to the 3-horizon framework to provide a road map for hospitals to integrate AI analytics. Results: Of the 1247 studies screened, 26 (2.09%) met the inclusion criteria. In total, 65% (17/26) of the studies implemented AI analytics for enhancing the care of hospitalized patients, whereas the remaining 35% (9/26) provided implementation guidelines. 
Of the final 26 papers, the quality of 21 (81%) was assessed as poor. A total of 28 enablers were identified; 8 (29%) were new in this study. A total of 18 barriers were identified; 5 (28%) were newly found. Most of these newly identified factors were related to information and technology. Actionable recommendations for the implementation of AI toward achieving an LHS were provided by mapping the findings to a 3-horizon framework. Conclusions: Significant issues exist in implementing AI in health care. Shifting from validating data sets to working with live data is challenging. This review incorporated the identified enablers and barriers into a 3-horizon framework, offering actionable recommendations for implementing AI analytics to achieve an LHS. The findings of this study can assist hospitals in steering their strategic planning toward successful adoption of AI. %M 39094106 %R 10.2196/49655 %U https://www.jmir.org/2024/1/e49655 %U https://doi.org/10.2196/49655 %U http://www.ncbi.nlm.nih.gov/pubmed/39094106 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e58129 %T Large Language Models Versus Expert Clinicians in Crisis Prediction Among Telemental Health Patients: Comparative Study %A Lee,Christine %A Mohebbi,Matthew %A O'Callaghan,Erin %A Winsberg,Mirène %+ Brightside Health, 2261 Market Street, STE 10222, San Francisco, CA, 94114, United States, 1 415 279 2042, mimi.winsberg@brightside.com %K mental health %K telehealth %K PHQ-9 %K Patient Health Questionnaire-9 %K suicidal ideation %K AI %K LLM %K OpenAI %K GPT-4 %K generative pretrained transformer 4 %K tele-mental health %K large language model %K clinician %K clinicians %K artificial intelligence %K patient information %K suicide %K suicidal %K mental disorder %K suicide attempt %K psychologist %K psychologists %K psychiatrist %K psychiatrists %K psychiatry %K clinical setting %K self-reported %K treatment %K medication %K digital mental health %K machine learning %K language model %K 
suicide %K crisis %K telemental health %K tele health %K e-health %K digital health %D 2024 %7 2.8.2024 %9 Original Paper %J JMIR Ment Health %G English %X Background: Due to recent advances in artificial intelligence, large language models (LLMs) have emerged as a powerful tool for a variety of language-related tasks, including sentiment analysis, and summarization of provider-patient interactions. However, there is limited research on these models in the area of crisis prediction. Objective: This study aimed to evaluate the performance of LLMs, specifically OpenAI’s generative pretrained transformer 4 (GPT-4), in predicting current and future mental health crisis episodes using patient-provided information at intake among users of a national telemental health platform. Methods: Deidentified patient-provided data were pulled from specific intake questions of the Brightside telehealth platform, including the chief complaint, for 140 patients who indicated suicidal ideation (SI), and another 120 patients who later indicated SI with a plan during the course of treatment. Similar data were pulled for 200 randomly selected patients, treated during the same time period, who never endorsed SI. In total, 6 senior Brightside clinicians (3 psychologists and 3 psychiatrists) were shown patients’ self-reported chief complaint and self-reported suicide attempt history but were blinded to the future course of treatment and other reported symptoms, including SI. They were asked a simple yes or no question regarding their prediction of endorsement of SI with plan, along with their confidence level about the prediction. GPT-4 was provided with similar information and asked to answer the same questions, enabling us to directly compare the performance of artificial intelligence and clinicians. 
Results: Overall, the clinicians’ average precision (0.7) was higher than that of GPT-4 (0.6) in identifying the SI with plan at intake (n=140) versus no SI (n=200) when using the chief complaint alone, while sensitivity was higher for the GPT-4 (0.62) than the clinicians’ average (0.53). The addition of suicide attempt history increased the clinicians’ average sensitivity (0.59) and precision (0.77) while increasing the GPT-4 sensitivity (0.59) but decreasing the GPT-4 precision (0.54). Performance decreased comparatively when predicting future SI with plan (n=120) versus no SI (n=200) with a chief complaint only for the clinicians (average sensitivity=0.4; average precision=0.59) and the GPT-4 (sensitivity=0.46; precision=0.48). The addition of suicide attempt history increased performance comparatively for the clinicians (average sensitivity=0.46; average precision=0.69) and the GPT-4 (sensitivity=0.74; precision=0.48). Conclusions: GPT-4, with a simple prompt design, produced results on some metrics that approached those of a trained clinician. Additional work must be done before such a model can be piloted in a clinical setting. The model should undergo safety checks for bias, given evidence that LLMs can perpetuate the biases of the underlying data on which they are trained. We believe that LLMs hold promise for augmenting the identification of higher-risk patients at intake and potentially delivering more timely care to patients. 
%M 38876484 %R 10.2196/58129 %U https://mental.jmir.org/2024/1/e58129 %U https://doi.org/10.2196/58129 %U http://www.ncbi.nlm.nih.gov/pubmed/38876484 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e55840 %T Predicting Workers’ Stress: Application of a High-Performance Algorithm Using Working-Style Characteristics %A Iwamoto,Hiroki %A Nakano,Saki %A Tajima,Ryotaro %A Kiguchi,Ryo %A Yoshida,Yuki %A Kitanishi,Yoshitake %A Aoki,Yasunori %+ Shionogi & Co., Ltd., Awajimachi Office 4F, Midousuji MTR Building, 6-3 Awajimachi 3-chome, Chuo-ku, Osaka, 541-0047, Japan, 81 90 9540 3570, hiroki.iwamoto@shionogi.co.jp %K high-performance algorithm %K Japan %K questionnaire %K stress prediction model %K teleworking %K wearable device %D 2024 %7 2.8.2024 %9 Original Paper %J JMIR AI %G English %X Background: Work characteristics, such as teleworking rate, have been studied in relation to stress. However, the use of work-related data to improve a high-performance stress prediction model that suits an individual’s lifestyle has not been evaluated. Objective: This study aims to develop a novel, high-performance algorithm to predict an employee’s stress among a group of employees with similar working characteristics. Methods: This prospective observational study evaluated participants’ responses to web‑based questionnaires, including attendance records and data collected using a wearable device. Data spanning 12 weeks (between January 17, 2022, and April 10, 2022) were collected from 194 Shionogi Group employees. Participants wore the Fitbit Charge 4 wearable device, which collected data on daily sleep, activity, and heart rate. Daily work shift data included details of working hours. Weekly questionnaire responses included the K6 questionnaire for depression/anxiety, a behavioral questionnaire, and the number of days lunch was missed. 
The proposed prediction model used a neighborhood cluster (N=20) with working-style characteristics similar to those of the prediction target person. Data from the previous week predicted stress levels the following week. Three models were compared by selecting appropriate training data: (1) single model, (2) proposed method 1, and (3) proposed method 2. Shapley Additive Explanations (SHAP) were calculated for the top 10 extracted features from the Extreme Gradient Boosting (XGBoost) model to evaluate the amount and contribution direction categorized by teleworking rates (mean): low: <0.2 (more than 4 days/week in office), middle: 0.2 to <0.6 (2 to 4 days/week in office), and high: ≥0.6 (less than 2 days/week in office). Results: Data from 190 participants were used, with a teleworking rate ranging from 0% to 79%. The area under the curve (AUC) of the proposed method 2 was 0.84 (true positive vs false positive: 0.77 vs 0.26). Among participants with low teleworking rates, most features extracted were related to sleep, followed by activity and work. Among participants with high teleworking rates, most features were related to activity, followed by sleep and work. SHAP analysis showed that for participants with high teleworking rates, skipping lunch, working more/less than scheduled, higher fluctuations in heart rate, and lower mean sleep duration contributed to stress. In participants with low teleworking rates, coming too early or late to work (before/after 9 AM), a higher/lower than mean heart rate, lower fluctuations in heart rate, and burning more/fewer calories than normal contributed to stress. Conclusions: Forming a neighborhood cluster with similar working styles based on teleworking rates and using it as training data improved the prediction performance. The validity of the neighborhood cluster approach is indicated by differences in the contributing features and their contribution directions among teleworking levels. 
Trial Registration: UMIN UMIN000046394; https://www.umin.ac.jp/ctr/index.htm %M 39093604 %R 10.2196/55840 %U https://ai.jmir.org/2024/1/e55840 %U https://doi.org/10.2196/55840 %U http://www.ncbi.nlm.nih.gov/pubmed/39093604 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e56924 %T The Impact of Information Relevancy and Interactivity on Intensivists’ Trust in a Machine Learning–Based Bacteremia Prediction System: Simulation Study %A Katzburg,Omer %A Roimi,Michael %A Frenkel,Amit %A Ilan,Roy %A Bitan,Yuval %K user-interface design %K user-interface designs %K user interface %K human-automation interaction %K human-automation interactions %K trust in automation %K automation %K human-computer interaction %K human-computer interactions %K human-ML %K human-ML interaction %K human-ML interactions %K decision making %K decision support system %K clinical decision support %K decision support %K decision support systems %K machine learning %K ML %K artificial intelligence %K AI %K machine learning algorithm %K machine learning algorithms %K digitization %K digitization of information %D 2024 %7 1.8.2024 %9 %J JMIR Hum Factors %G English %X Background: The exponential growth in computing power and the increasing digitization of information have substantially advanced the machine learning (ML) research field. However, ML algorithms are often considered “black boxes,” and this fosters distrust. In medical domains, in which mistakes can result in fatal outcomes, practitioners may be especially reluctant to trust ML algorithms. Objective: The aim of this study is to explore the effect of user-interface design features on intensivists’ trust in an ML-based clinical decision support system. Methods: A total of 47 physicians from critical care specialties were presented with 3 patient cases of bacteremia in the setting of an ML-based simulation system. 
Three conditions of the simulation were tested according to combinations of information relevancy and interactivity. Participants’ trust in the system was assessed by their agreement with the system’s prediction and a postexperiment questionnaire. Linear regression models were applied to measure the effects. Results: Participants’ agreement with the system’s prediction did not differ according to the experimental conditions. However, in the postexperiment questionnaire, higher information relevancy ratings and interactivity ratings were associated with higher perceived trust in the system (P<.001 for both). The explicit visual presentation of the features of the ML algorithm on the user interface resulted in lower trust among the participants (P=.05). Conclusions: Information relevancy and interactivity features should be considered in the design of the user interface of ML-based clinical decision support systems to enhance intensivists’ trust. This study sheds light on the connection between information relevancy, interactivity, and trust in human-ML interaction, specifically in the intensive care unit environment. 
%R 10.2196/56924 %U https://humanfactors.jmir.org/2024/1/e56924 %U https://doi.org/10.2196/56924 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53562 %T Automated Behavioral Coding to Enhance the Effectiveness of Motivational Interviewing in a Chat-Based Suicide Prevention Helpline: Secondary Analysis of a Clinical Trial %A Pellemans,Mathijs %A Salmi,Salim %A Mérelle,Saskia %A Janssen,Wilco %A van der Mei,Rob %+ Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1111, Amsterdam, 1081 HV, Netherlands, 31 20 5987700, m.j.pellemans@vu.nl %K motivational interviewing %K behavioral coding %K suicide prevention %K artificial intelligence %K effectiveness %K counseling %K support tool %K online help %K mental health %D 2024 %7 1.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: With the rise of computer science and artificial intelligence, analyzing large data sets promises enormous potential in gaining insights for developing and improving evidence-based health interventions. One such intervention is the counseling strategy motivational interviewing (MI), which has been found effective in improving a wide range of health-related behaviors. Despite the simplicity of its principles, MI can be a challenging skill to learn and requires expertise to apply effectively. Objective: This study aims to investigate the performance of artificial intelligence models in classifying MI behavior and explore the feasibility of using these models in online helplines for mental health as an automated support tool for counselors in clinical practice. Methods: We used a coded data set of 253 MI counseling chat sessions from the 113 Suicide Prevention helpline. With 23,982 messages coded with the MI Sequential Code for Observing Process Exchanges codebook, we trained and evaluated 4 machine learning models and 1 deep learning model to classify client- and counselor MI behavior based on language use. 
Results: The deep learning model BERTje outperformed all machine learning models, accurately predicting counselor behavior (accuracy=0.72, area under the curve [AUC]=0.95, Cohen κ=0.69). It differentiated MI congruent and incongruent counselor behavior (AUC=0.92, κ=0.65) and evocative and nonevocative language (AUC=0.92, κ=0.66). For client behavior, the model achieved an accuracy of 0.70 (AUC=0.89, κ=0.55). The model’s interpretable predictions discerned client change talk and sustain talk, counselor affirmations, and reflection types, facilitating valuable counselor feedback. Conclusions: The results of this study demonstrate that artificial intelligence techniques can accurately classify MI behavior, indicating their potential as a valuable tool for enhancing MI proficiency in online helplines for mental health. Provided that the data set size is sufficiently large with enough training samples for each behavioral code, these methods can be trained and applied to other domains and languages, offering a scalable and cost-effective way to evaluate MI adherence, accelerate behavioral coding, and provide therapists with personalized, quick, and objective feedback. 
%M 39088244 %R 10.2196/53562 %U https://www.jmir.org/2024/1/e53562 %U https://doi.org/10.2196/53562 %U http://www.ncbi.nlm.nih.gov/pubmed/39088244 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50236 %T Classification of Patients’ Judgments of Their Physicians in Web-Based Written Reviews Using Natural Language Processing: Algorithm Development and Validation %A Madanay,Farrah %A Tu,Karissa %A Campagna,Ada %A Davis,J Kelly %A Doerstling,Steven S %A Chen,Felicia %A Ubel,Peter A %+ Center for Bioethics and Social Sciences in Medicine, University of Michigan Medical School, 2800 Plymouth Rd, Bldg 14, G016, Ann Arbor, MI, 48109, United States, 1 8083524196, madanafl@med.umich.edu %K web-based physician reviews %K patient judgments %K RoBERTa %K natural language processing %K text classification %K machine learning %K patient experience %K patient-authored reviews %K healthcare quality %K patient care %K psychology %D 2024 %7 1.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Patients increasingly rely on web-based physician reviews to choose a physician and share their experiences. However, the unstructured text of these written reviews presents a challenge for researchers seeking to make inferences about patients’ judgments. Methods previously used to identify patient judgments within reviews, such as hand-coding and dictionary-based approaches, have posed limitations to sample size and classification accuracy. Advanced natural language processing methods can help overcome these limitations and promote further analysis of physician reviews on these popular platforms. Objective: This study aims to train, test, and validate an advanced natural language processing algorithm for classifying the presence and valence of 2 dimensions of patient judgments in web-based physician reviews: interpersonal manner and technical competence. 
Methods: We sampled 345,053 reviews for 167,150 physicians across the United States from Healthgrades.com, a commercial web-based physician rating and review website. We hand-coded 2000 written reviews and used those reviews to train and test a transformer classification algorithm called the Robustly Optimized BERT (Bidirectional Encoder Representations from Transformers) Pretraining Approach (RoBERTa). The 2 fine-tuned models coded the reviews for the presence and positive or negative valence of patients’ interpersonal manner or technical competence judgments of their physicians. We evaluated the performance of the 2 models against 200 hand-coded reviews and validated the models using the full sample of 345,053 RoBERTa-coded reviews. Results: The interpersonal manner model was 90% accurate with precision of 0.89, recall of 0.90, and weighted F1-score of 0.89. The technical competence model was 90% accurate with precision of 0.91, recall of 0.90, and weighted F1-score of 0.90. Positive-valence judgments were associated with higher review star ratings whereas negative-valence judgments were associated with lower star ratings. Analysis of the data by review rating and physician gender corresponded with findings in prior literature. Conclusions: Our 2 classification models coded interpersonal manner and technical competence judgments with high precision, recall, and accuracy. These models were validated using review star ratings and results from previous research. RoBERTa can accurately classify unstructured, web-based review text at scale. Future work could explore the use of this algorithm with other textual data, such as social media posts and electronic health records. 
%M 39088259 %R 10.2196/50236 %U https://www.jmir.org/2024/1/e50236 %U https://doi.org/10.2196/50236 %U http://www.ncbi.nlm.nih.gov/pubmed/39088259 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e54345 %T Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development and Usability Study %A Aljamaan,Fadi %A Temsah,Mohamad-Hani %A Altamimi,Ibraheem %A Al-Eyadhy,Ayman %A Jamal,Amr %A Alhasan,Khalid %A Mesallam,Tamer A %A Farahat,Mohamed %A Malki,Khalid H %+ Department of Otolaryngology, College of Medicine, Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, 12629 Abdulaziz Rd, Al Malaz, Riyadh, P.BOX 2925 Zip 11461, Saudi Arabia, 966 114876100, kalmalki@ksu.edu.sa %K artificial intelligence (AI) chatbots %K reference hallucination %K bibliographic verification %K ChatGPT %K Perplexity %K SciSpace %K Elicit %K Bing %D 2024 %7 31.7.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Artificial intelligence (AI) chatbots have recently gained use in medical practice by health care practitioners. Interestingly, the output of these AI chatbots was found to have varying degrees of hallucination in content and references. Such hallucinations generate doubts about their output and their implementation. Objective: The aim of our study was to propose a reference hallucination score (RHS) to evaluate the authenticity of AI chatbots’ citations. Methods: Six AI chatbots were challenged with the same 10 medical prompts, requesting 10 references per prompt. The RHS is composed of 6 bibliographic items and the reference’s relevance to prompts’ keywords. RHS was calculated for each reference, prompt, and type of prompt (basic vs complex). The average RHS was calculated for each AI chatbot and compared across the different types of prompts and AI chatbots. Results: Bard failed to generate any references. 
ChatGPT 3.5 and Bing generated the highest RHS (score=11), while Elicit and SciSpace generated the lowest RHS (score=1), and Perplexity generated a middle RHS (score=7). The highest degree of hallucination was observed for reference relevancy to the prompt keywords (308/500, 61.6%), while the lowest was for reference titles (169/500, 33.8%). ChatGPT and Bing had comparable RHS (β coefficient=–0.069; P=.32), while Perplexity had significantly lower RHS than ChatGPT (β coefficient=–0.345; P<.001). AI chatbots generally had significantly higher RHS when prompted with scenarios or complex format prompts (β coefficient=0.486; P<.001). Conclusions: The variation in RHS underscores the necessity for a robust reference evaluation tool to improve the authenticity of AI chatbots. Further, the variations highlight the importance of verifying their output and citations. Elicit and SciSpace had negligible hallucination, while ChatGPT and Bing had critical hallucination levels. The proposed AI chatbots’ RHS could contribute to ongoing efforts to enhance AI’s general reliability in medical research. 
%M 39083799 %R 10.2196/54345 %U https://medinform.jmir.org/2024/1/e54345 %U https://doi.org/10.2196/54345 %U http://www.ncbi.nlm.nih.gov/pubmed/39083799 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58764 %T The McMaster Health Information Research Unit: Over a Quarter-Century of Health Informatics Supporting Evidence-Based Medicine %A Lokker,Cynthia %A McKibbon,K Ann %A Afzal,Muhammad %A Navarro,Tamara %A Linkins,Lori-Ann %A Haynes,R Brian %A Iorio,Alfonso %+ Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, 1280 Main St W, CRL 137, Hamilton, ON, L8S 4K1, Canada, 1 2897883272, lokkerc@mcmaster.ca %K health informatics %K evidence-based medicine %K information retrieval %K evidence-based %K health information %K Boolean %K natural language processing %K NLP %K journal %K article %K Health Information Research Unit %K HiRU %D 2024 %7 31.7.2024 %9 Viewpoint %J J Med Internet Res %G English %X Evidence-based medicine (EBM) emerged from McMaster University in the 1980-1990s, which emphasizes the integration of the best research evidence with clinical expertise and patient values. The Health Information Research Unit (HiRU) was created at McMaster University in 1985 to support EBM. Early on, digital health informatics took the form of teaching clinicians how to search MEDLINE with modems and phone lines. Searching and retrieval of published articles were transformed as electronic platforms provided greater access to clinically relevant studies, systematic reviews, and clinical practice guidelines, with PubMed playing a pivotal role. In the early 2000s, the HiRU introduced Clinical Queries—validated search filters derived from the curated, gold-standard, human-appraised Hedges dataset—to enhance the precision of searches, allowing clinicians to hone their queries based on study design, population, and outcomes. Currently, almost 1 million articles are added to PubMed annually. 
To filter through this volume of heterogenous publications for clinically important articles, the HiRU team and other researchers have been applying classical machine learning, deep learning, and, increasingly, large language models (LLMs). These approaches are built upon the foundation of gold-standard annotated datasets and humans in the loop for active machine learning. In this viewpoint, we explore the evolution of health informatics in supporting evidence search and retrieval processes over the past 25+ years within the HiRU, including the evolving roles of LLMs and responsible artificial intelligence, as we continue to facilitate the dissemination of knowledge, enabling clinicians to integrate the best available evidence into their clinical practice. %M 39083765 %R 10.2196/58764 %U https://www.jmir.org/2024/1/e58764 %U https://doi.org/10.2196/58764 %U http://www.ncbi.nlm.nih.gov/pubmed/39083765 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50067 %T A Machine Learning Model for Predicting In-Hospital Mortality in Chinese Patients With ST-Segment Elevation Myocardial Infarction: Findings From the China Myocardial Infarction Registry %A Yang,Jingang %A Li,Yingxue %A Li,Xiang %A Tao,Shuiying %A Zhang,Yuan %A Chen,Tiange %A Xie,Guotong %A Xu,Haiyan %A Gao,Xiaojin %A Yang,Yuejin %+ State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences & Peking Union Medical College, No 167, Beilishi Road, Xicheng District, Beijing, 100037, China, 86 13701151408, Yangyjfw@126.com %K ST-elevation myocardial infarction %K in-hospital mortality %K risk prediction %K explainable machine learning %K machine learning %K acute myocardial infarction %K myocardial infarction %K mortality %K risk %K predication model %K china %K clinical practice %K validation %K patient management %K management %D 2024 %7 30.7.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: 
Machine learning (ML) risk prediction models, although much more accurate than traditional statistical methods, are inconvenient to use in clinical practice due to their nontransparency and requirement of a large number of input variables. Objective: We aimed to develop a precise, explainable, and flexible ML model to predict the risk of in-hospital mortality in patients with ST-segment elevation myocardial infarction (STEMI). Methods: This study recruited 18,744 patients enrolled in the 2013 China Acute Myocardial Infarction (CAMI) registry and 12,018 patients from the China Patient-Centered Evaluative Assessment of Cardiac Events (PEACE)-Retrospective Acute Myocardial Infarction Study. The Extreme Gradient Boosting (XGBoost) model was derived from 9616 patients in the CAMI registry (2014, 89 variables) with 5-fold cross-validation and validated on both the 9125 patients in the CAMI registry (89 variables) and the independent China PEACE cohort (10 variables). The Shapley Additive Explanations (SHAP) approach was employed to interpret the complex relationships embedded in the proposed model. Results: In the XGBoost model for predicting all-cause in-hospital mortality, the variables with the top 8 most important scores were age, left ventricular ejection fraction, Killip class, heart rate, creatinine, blood glucose, white blood cell count, and use of angiotensin-converting enzyme inhibitors (ACEIs) and angiotensin II receptor blockers (ARBs). The area under the curve (AUC) on the CAMI validation set was 0.896 (95% CI 0.884-0.909), significantly higher than the previous models. The AUC for the Global Registry of Acute Coronary Events (GRACE) model was 0.809 (95% CI 0.790-0.828), and for the TIMI model, it was 0.782 (95% CI 0.763-0.800). Despite the China PEACE validation set only having 10 available variables, the AUC reached 0.840 (0.829-0.852), showing a substantial improvement to the GRACE (0.762, 95% CI 0.748-0.776) and TIMI (0.789, 95% CI 0.776-0.803) scores. 
Several novel and nonlinear relationships were discovered between patients’ characteristics and in-hospital mortality, including a U-shape pattern of high-density lipoprotein cholesterol (HDL-C). Conclusions: The proposed ML risk prediction model was highly accurate in predicting in-hospital mortality. Its flexible and explainable characteristics make the model convenient to use in clinical practice and could help guide patient management. Trial Registration: ClinicalTrials.gov NCT01874691; https://clinicaltrials.gov/study/NCT01874691 %M 39079111 %R 10.2196/50067 %U https://www.jmir.org/2024/1/e50067 %U https://doi.org/10.2196/50067 %U http://www.ncbi.nlm.nih.gov/pubmed/39079111 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e48595 %T Early Detection of Pulmonary Embolism in a General Patient Population Immediately Upon Hospital Admission Using Machine Learning to Identify New, Unidentified Risk Factors: Model Development Study %A Ben Yehuda,Ori %A Itelman,Edward %A Vaisman,Adva %A Segal,Gad %A Lerner,Boaz %+ Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, POB 653, Beer-Sheva, 84105, Israel, 972 +972544399763, boaz@bgu.ac.il %K pulmonary embolism %K deep vein thrombosis %K venous thromboembolism %K imbalanced data %K clustering %K risk factors %K Wells score %K revised Genova score %K hospital admission %K machine learning %D 2024 %7 30.7.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Under- or late identification of pulmonary embolism (PE)—a thrombosis of 1 or more pulmonary arteries that seriously threatens patients’ lives—is a major challenge confronting modern medicine. Objective: We aimed to establish accurate and informative machine learning (ML) models to identify patients at high risk for PE as they are admitted to the hospital, before their initial clinical checkup, by using only the information in their medical records. 
Methods: We collected demographics, comorbidities, and medications data for 2568 patients with PE and 52,598 control patients. We focused on data available prior to emergency department admission, as these are the most universally accessible data. We trained an ML random forest algorithm to detect PE at the earliest possible time during a patient’s hospitalization—at the time of his or her admission. We developed and applied 2 ML-based methods specifically to address the data imbalance between PE and non-PE patients, which causes misdiagnosis of PE. Results: The resulting models predicted PE based on age, sex, BMI, past clinical PE events, chronic lung disease, past thrombotic events, and usage of anticoagulants, obtaining an 80% geometric mean value for the PE and non-PE classification accuracies. Although on hospital admission only 4% (1942/46,639) of the patients had a diagnosis of PE, we identified 2 clustering schemes comprising subgroups with more than 61% (705/1120 in clustering scheme 1; 427/701 and 340/549 in clustering scheme 2) positive patients for PE. One subgroup in the first clustering scheme included 36% (705/1942) of all patients with PE who were characterized by a definite past PE diagnosis, a 6-fold higher prevalence of deep vein thrombosis, and a 3-fold higher prevalence of pneumonia, compared with patients of the other subgroups in this scheme. In the second clustering scheme, 2 subgroups (1 of only men and 1 of only women) included patients who all had a past PE diagnosis and a relatively high prevalence of pneumonia, and a third subgroup included only those patients with a past diagnosis of pneumonia. Conclusions: This study established an ML tool for early diagnosis of PE almost immediately upon hospital admission. 
Despite the highly imbalanced data, which undermine accurate PE prediction, and despite using only information available from the patient’s medical history, our models were both accurate and informative, enabling the identification of patients already at high risk for PE upon hospital admission, even before the initial clinical checkup was performed. Because we did not restrict our patients to those at high risk for PE according to previously published scales (eg, Wells or revised Geneva scores), we could accurately assess the application of ML to raw medical data and identify new, previously unidentified risk factors for PE, such as previous pulmonary disease, in general populations. %M 39079116 %R 10.2196/48595 %U https://www.jmir.org/2024/1/e48595 %U https://doi.org/10.2196/48595 %U http://www.ncbi.nlm.nih.gov/pubmed/39079116 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e52500 %T Can Large Language Models Replace Therapists? Evaluating Performance at Simple Cognitive Behavioral Therapy Tasks %A Hodson,Nathan %A Williamson,Simon %+ Warwick Medical School, University of Warwick, Warwick Medical School, Gibbett Hill Road, Coventry, CV4 7AL, United Kingdom, 44 02476574880, nathan.hodson@warwick.ac.uk %K mental health %K psychotherapy %K digital therapy %K CBT %K ChatGPT %K cognitive behavioral therapy %K cognitive behavioural therapy %K LLM %K LLMs %K language model %K language models %K NLP %K natural language processing %K artificial intelligence %K performance %K chatbot %K chatbots %K conversational agent %K conversational agents %D 2024 %7 30.7.2024 %9 Research Letter %J JMIR AI %G English %X The advent of large language models (LLMs) such as ChatGPT has potential implications for psychological therapies such as cognitive behavioral therapy (CBT). We systematically investigated whether LLMs could recognize an unhelpful thought, examine its validity, and reframe it to a more helpful one. 
LLMs currently have the potential to offer reasonable suggestions for the identification and reframing of unhelpful thoughts but should not be relied on to lead CBT delivery. %M 39078696 %R 10.2196/52500 %U https://ai.jmir.org/2024/1/e52500 %U https://doi.org/10.2196/52500 %U http://www.ncbi.nlm.nih.gov/pubmed/39078696 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e59479 %T The Opportunities and Risks of Large Language Models in Mental Health %A Lawrence,Hannah R %A Schneider,Renee A %A Rubin,Susan B %A Matarić,Maja J %A McDuff,Daniel J %A Jones Bell,Megan %K artificial intelligence %K AI %K generative AI %K large language models %K mental health %K mental health education %K language model %K mental health care %K health equity %K ethical %K development %K deployment %D 2024 %7 29.7.2024 %9 %J JMIR Ment Health %G English %X Global rates of mental health concerns are rising, and there is increasing realization that existing models of mental health care will not adequately expand to meet the demand. With the emergence of large language models (LLMs) has come great optimism regarding their promise to create novel, large-scale solutions to support mental health. Despite their nascence, LLMs have already been applied to mental health–related tasks. In this paper, we summarize the extant literature on efforts to use LLMs to provide mental health education, assessment, and intervention and highlight key opportunities for positive impact in each area. We then highlight risks associated with LLMs’ application to mental health and encourage the adoption of strategies to mitigate these risks. The urgent need for mental health support must be balanced with responsible development, testing, and deployment of mental health LLMs. 
It is especially critical to ensure that mental health LLMs are fine-tuned for mental health, enhance mental health equity, and adhere to ethical standards and that people, including those with lived experience with mental health concerns, are involved in all stages from development through deployment. Prioritizing these efforts will minimize potential harms to mental health and maximize the likelihood that LLMs will positively impact mental health globally. %R 10.2196/59479 %U https://mental.jmir.org/2024/1/e59479 %U https://doi.org/10.2196/59479 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e50800 %T Optimizing Clinical Trial Eligibility Design Using Natural Language Processing Models and Real-World Data: Algorithm Development and Validation %A Lee,Kyeryoung %A Liu,Zongzhi %A Mai,Yun %A Jun,Tomi %A Ma,Meng %A Wang,Tongyu %A Ai,Lei %A Calay,Ediz %A Oh,William %A Stolovitzky,Gustavo %A Schadt,Eric %A Wang,Xiaoyan %+ GendDx (Sema4), 333 Ludlow Street, Stamford, CT, 06902, United States, 1 (844) 241 1233, xw108@caa.columbia.edu %K natural language processing %K real-world data %K clinical trial eligibility criteria %K eligibility criteria–specific ontology %K clinical trial protocol optimization %K data-driven approach %D 2024 %7 29.7.2024 %9 Original Paper %J JMIR AI %G English %X Background: Clinical trials are vital for developing new therapies but can also delay drug development. Efficient trial data management, optimized trial protocol, and accurate patient identification are critical for reducing trial timelines. Natural language processing (NLP) has the potential to achieve these objectives. Objective: This study aims to assess the feasibility of using data-driven approaches to optimize clinical trial protocol design and identify eligible patients. This involves creating a comprehensive eligibility criteria knowledge base integrated within electronic health records using deep learning–based NLP techniques. 
Methods: We obtained data on 3281 industry-sponsored phase 2 or 3 interventional clinical trials recruiting patients with non–small cell lung cancer, prostate cancer, breast cancer, multiple myeloma, ulcerative colitis, and Crohn disease from ClinicalTrials.gov, spanning the period between 2013 and 2020. A customized bidirectional long short-term memory– and conditional random field–based NLP pipeline was used to extract all eligibility criteria attributes and convert hypernym concepts into computable hyponyms along with their corresponding values. To illustrate the simulation of clinical trial design for optimization purposes, we selected a subset of patients with non–small cell lung cancer (n=2775), curated from the Mount Sinai Health System, as a pilot study. Results: We manually annotated the clinical trial eligibility corpus (485/3281, 14.78% of trials) and constructed an eligibility criteria–specific ontology. Our customized NLP pipeline, developed based on this ontology, achieved high precision (0.91, range 0.67-1.00) and recall (0.79, range 0.50-1.00) scores, as well as a high F1-score (0.83, range 0.67-1.00), enabling the efficient extraction of granular criteria entities and relevant attributes from 3281 clinical trials. A standardized eligibility criteria knowledge base, compatible with electronic health records, was developed by transforming hypernym concepts into machine-interpretable hyponyms along with their corresponding values. In addition, an interface prototype demonstrated the practicality of leveraging real-world data for optimizing clinical trial protocols and identifying eligible patients. Conclusions: Our customized NLP pipeline successfully generated a standardized eligibility criteria knowledge base by transforming hypernym criteria into machine-readable hyponyms along with their corresponding values. 
A prototype interface integrating real-world patient information allows us to assess the impact of each eligibility criterion on the number of patients eligible for the trial. Leveraging NLP and real-world data in a data-driven approach holds promise for streamlining the overall clinical trial process, optimizing processes, and improving efficiency in patient identification. %M 39073872 %R 10.2196/50800 %U https://ai.jmir.org/2024/1/e50800 %U https://doi.org/10.2196/50800 %U http://www.ncbi.nlm.nih.gov/pubmed/39073872 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e46871 %T Regulatory Frameworks for AI-Enabled Medical Device Software in China: Comparative Analysis and Review of Implications for Global Manufacturer %A Han,Yu %A Ceross,Aaron %A Bergmann,Jeroen %+ University of Oxford, Old Road Campus, Oxford, OX1 2JD, United Kingdom, 44 789314203, yu.han@eng.ox.ac.uk %K NMPA %K medical device software %K device registration %K registration pathway %K artificial intelligence %K machine learning %K medical device %K device development %K China %K regulations %K medical software %D 2024 %7 29.7.2024 %9 Viewpoint %J JMIR AI %G English %X The China State Council released the new generation artificial intelligence (AI) development plan, outlining China's ambitious aspiration to assume global leadership in AI by the year 2030. This initiative underscores the extensive applicability of AI across diverse domains, including manufacturing, law, and medicine. With China establishing itself as a major producer and consumer of medical devices, there has been a notable increase in software registrations. This study aims to study the proliferation of health care–related software development within China. This work presents an overview of the Chinese regulatory framework for medical device software. The analysis covers both software as a medical device and software in a medical device. 
A comparative approach is employed to examine the regulations governing medical devices with AI and machine learning in China, the United States, and Europe. The study highlights the significant proliferation of health care–related software development within China, which has led to an increased demand for comprehensive regulatory guidance, particularly for international manufacturers. The comparative analysis reveals distinct regulatory frameworks and requirements across the three regions. This paper provides a useful outline of the current state of regulations for medical software in China and identifies the regulatory challenges posed by the rapid advancements in AI and machine learning technologies. Understanding these challenges is crucial for international manufacturers and stakeholders aiming to navigate the complex regulatory landscape. %M 39073860 %R 10.2196/46871 %U https://ai.jmir.org/2024/1/e46871 %U https://doi.org/10.2196/46871 %U http://www.ncbi.nlm.nih.gov/pubmed/39073860 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e54577 %T Use of Machine Learning Models to Differentiate Neurodevelopment Conditions Through Digitally Collected Data: Cross-Sectional Questionnaire Study %A Grazioli,Silvia %A Crippa,Alessandro %A Buo,Noemi %A Busti Ceccarelli,Silvia %A Molteni,Massimo %A Nobile,Maria %A Salandi,Antonio %A Trabattoni,Sara %A Caselli,Gabriele %A Colombo,Paola %+ Child Psychopathology Unit, Scientific Institute IRCCS Eugenio Medea, Via Don Luigi Monza, 20, Bosisio Parini, 23842, Italy, 39 031877593, alessandro.crippa@lanostrafamiglia.it %K digital-aided clinical assessment %K machine learning %K random forest %K logistic regression %K computational psychometrics %K telemedicine %K neurodevelopmental conditions %K parent-report questionnaires %K attention-deficit/hyperactivity disorder %K autism spectrum disorder %K ASD %K autism %K autistic %K attention deficit %K hyperactivity %K classification %D 2024 %7 29.7.2024 %9 Original Paper 
%J JMIR Form Res %G English %X Background: Diagnosis of child and adolescent psychopathologies involves a multifaceted approach, integrating clinical observations, behavioral assessments, medical history, cognitive testing, and familial context information. Digital technologies, especially internet-based platforms for administering caregiver-rated questionnaires, are increasingly used in this field, particularly during the screening phase. The ascent of digital platforms for data collection has propelled advanced psychopathology classification methods such as supervised machine learning (ML) to the forefront of both research and clinical environments. This shift, recently called psycho-informatics, has been facilitated by gradually incorporating computational devices into clinical workflows. However, a true integration of telemedicine and the ML approach has yet to be achieved. Objective: Given these premises, exploring the potential of ML applications for analyzing digitally collected data may have significant implications for supporting the clinical practice of diagnosing early psychopathology. The purpose of this study was, therefore, to exploit ML models for the classification of attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorder (ASD) using internet-based parent-reported socio-anamnestic data, aiming at obtaining accurate predictive models for new help-seeking families. Methods: In this retrospective, single-center observational study, socio-anamnestic data were collected from 1688 children and adolescents referred for suspected neurodevelopmental conditions. The data included sociodemographic, clinical, environmental, and developmental factors, collected remotely through the first Italian internet-based screening tool for neurodevelopmental disorders, the Medea Information and Clinical Assessment On-Line (MedicalBIT). 
Random forest (RF), decision tree (DT), and logistic regression (LR) models were developed and evaluated using classification accuracy, sensitivity, specificity, and importance of independent variables. Results: The RF model demonstrated robust accuracy, achieving 84% (95% CI 82-85; P<.001) for ADHD and 86% (95% CI 84-87; P<.001) for ASD classifications. Sensitivities were also high, with 93% for ADHD and 95% for ASD. In contrast, the DT and LR models exhibited lower accuracy (DT 74%, 95% CI 71-77; P<.001 for ADHD; DT 79%, 95% CI 77-82; P<.001 for ASD; LR 61%, 95% CI 57-64; P<.001 for ADHD; LR 63%, 95% CI 60-67; P<.001 for ASD) and sensitivities (DT: 82% for ADHD and 88% for ASD; LR: 62% for ADHD and 68% for ASD). The independent variables considered for classification differed in importance among the 3 models, reflecting the distinct characteristics of the 3 ML approaches. Conclusions: This study highlights the potential of ML models, particularly RF, in enhancing the diagnostic process of child and adolescent psychopathology. Altogether, the current findings underscore the significance of leveraging digital platforms and computational techniques in the diagnostic process. While interpretability remains crucial, the developed approach might provide valuable screening tools for clinicians. 
%M 39073858 %R 10.2196/54577 %U https://formative.jmir.org/2024/1/e54577 %U https://doi.org/10.2196/54577 %U http://www.ncbi.nlm.nih.gov/pubmed/39073858 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e52896 %T Unsupervised Feature Selection to Identify Important ICD-10 and ATC Codes for Machine Learning on a Cohort of Patients With Coronary Heart Disease: Retrospective Study %A Ghasemi,Peyman %A Lee,Joon %K unsupervised feature selection %K ICD-10 %K International Classification of Diseases %K ATC %K Anatomical Therapeutic Chemical %K concrete autoencoder %K Laplacian score %K unsupervised feature selection for multicluster data %K autoencoder-inspired unsupervised feature selection %K principal feature analysis %K machine learning %K artificial intelligence %K case study %K coronary artery disease %K artery disease %K patient cohort %K artery %K mortality prediction %K mortality %K data set %K interpretability %K International Classification of Diseases, Tenth Revision %D 2024 %7 26.7.2024 %9 %J JMIR Med Inform %G English %X Background: The application of machine learning in health care often necessitates the use of hierarchical codes such as the International Classification of Diseases (ICD) and Anatomical Therapeutic Chemical (ATC) systems. These codes classify diseases and medications, respectively, thereby forming extensive data dimensions. Unsupervised feature selection tackles the “curse of dimensionality” and helps to improve the accuracy and performance of supervised learning models by reducing the number of irrelevant or redundant features and avoiding overfitting. Techniques for unsupervised feature selection, such as filter, wrapper, and embedded methods, are implemented to select the most important features with the most intrinsic information. However, they face challenges due to the sheer volume of ICD and ATC codes and the hierarchical structures of these systems. 
Objective: The objective of this study was to compare several unsupervised feature selection methods for ICD and ATC code databases of patients with coronary artery disease in different aspects of performance and complexity and select the best set of features representing these patients. Methods: We compared several unsupervised feature selection methods for 2 ICD and 1 ATC code databases of 51,506 patients with coronary artery disease in Alberta, Canada. Specifically, we used the Laplacian score, unsupervised feature selection for multicluster data, autoencoder-inspired unsupervised feature selection, principal feature analysis, and concrete autoencoders with and without ICD or ATC tree weight adjustment to select the 100 best features from over 9000 ICD and 2000 ATC codes. We assessed the selected features based on their ability to reconstruct the initial feature space and predict 90-day mortality following discharge. We also compared the complexity of the selected features by mean code level in the ICD or ATC tree and the interpretability of the features in the mortality prediction task using Shapley analysis. Results: In feature space reconstruction and mortality prediction, the concrete autoencoder–based methods outperformed other techniques. Particularly, a weight-adjusted concrete autoencoder variant demonstrated improved reconstruction accuracy and significant predictive performance enhancement, confirmed by DeLong and McNemar tests (P<.05). Concrete autoencoders preferred more general codes, and they consistently reconstructed all features accurately. Additionally, features selected by weight-adjusted concrete autoencoders yielded higher Shapley values in mortality prediction than most alternatives. Conclusions: This study scrutinized 5 feature selection methods in ICD and ATC code data sets in an unsupervised context. 
Our findings underscore the superiority of the concrete autoencoder method in selecting salient features that represent the entire data set, offering a potential asset for subsequent machine learning research. We also present a novel weight adjustment approach for the concrete autoencoders specifically tailored for ICD and ATC code data sets to enhance the generalizability and interpretability of the selected features. %R 10.2196/52896 %U https://medinform.jmir.org/2024/1/e52896 %U https://doi.org/10.2196/52896 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 7 %N %P e54872 %T Development and Validation of an Explainable Machine Learning Model for Predicting Myocardial Injury After Noncardiac Surgery in Two Centers in China: Retrospective Study %A Liu,Chang %A Zhang,Kai %A Yang,Xiaodong %A Meng,Bingbing %A Lou,Jingsheng %A Liu,Yanhong %A Cao,Jiangbei %A Liu,Kexuan %A Mi,Weidong %A Li,Hao %K myocardial injury after noncardiac surgery %K older patients %K machine learning %K personalized prediction %K myocardial injury %K risk prediction %K noncardiac surgery %D 2024 %7 26.7.2024 %9 %J JMIR Aging %G English %X Background: Myocardial injury after noncardiac surgery (MINS) is an easily overlooked complication but closely related to postoperative cardiovascular adverse outcomes; therefore, the early diagnosis and prediction are particularly important. Objective: We aimed to develop and validate an explainable machine learning (ML) model for predicting MINS among older patients undergoing noncardiac surgery. Methods: The retrospective cohort study included older patients who had noncardiac surgery from 1 northern center and 1 southern center in China. The data sets from center 1 were divided into a training set and an internal validation set. The data set from center 2 was used as an external validation set. 
Before modeling, the least absolute shrinkage and selection operator and recursive feature elimination methods were used to reduce the dimensionality of the data and select key features from all variables. Prediction models were developed based on the extracted features using several ML algorithms, including category boosting, random forest, logistic regression, naïve Bayes, light gradient boosting machine, extreme gradient boosting, support vector machine, and decision tree. Prediction performance was assessed by the area under the receiver operating characteristic (AUROC) curve as the main evaluation metric to select the best algorithms. The model performance was verified by internal and external validation data sets with the best algorithm and compared to the Revised Cardiac Risk Index. The Shapley Additive Explanations (SHAP) method was applied to calculate values for each feature, representing the contribution to the predicted risk of complication, and generate personalized explanations. Results: A total of 19,463 eligible patients were included; among those, 12,464 patients in center 1 were included as the training set; 4754 patients in center 1 were included as the internal validation set; and 2245 in center 2 were included as the external validation set. The best-performing model for prediction was the CatBoost algorithm, achieving the highest AUROC of 0.805 (95% CI 0.778‐0.831) in the training set, validating with an AUROC of 0.780 in the internal validation set and 0.70 in the external validation set. Additionally, CatBoost demonstrated superior performance compared to the Revised Cardiac Risk Index (AUROC 0.636; P<.001). The SHAP values ranked the variables by importance, with preoperative serum creatinine concentration, red blood cell distribution width, and age as the top 3. 
The results from the SHAP method can predict events with positive values or nonevents with negative values, providing an explicit explanation of individualized risk predictions. Conclusions: The ML models can provide a personalized and fairly accurate risk prediction of MINS, and the explainable perspective can help identify potentially modifiable sources of risk at the patient level. %R 10.2196/54872 %U https://aging.jmir.org/2024/1/e54872 %U https://doi.org/10.2196/54872 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e47645 %T Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions %A Cai,Yu-Qing %A Gong,Da-Xin %A Tang,Li-Ying %A Cai,Yue %A Li,Hui-Jun %A Jing,Tian-Ci %A Gong,Mengchun %A Hu,Wei %A Zhang,Zhen-Wei %A Zhang,Xingang %A Zhang,Guang-Wei %+ Smart Hospital Management Department, The First Hospital of China Medical University, , Shenyang, , China, 86 24 88283350, gwzhang@cmu.edu.cn %K cardiovascular diseases %K risk prediction models %K machine learning %K problem %K solution %D 2024 %7 26.7.2024 %9 Viewpoint %J J Med Internet Res %G English %X In recent years, there has been explosive development in artificial intelligence (AI), which has been widely applied in the health care field. As a typical AI technology, machine learning models have emerged with great potential in predicting cardiovascular diseases by leveraging large amounts of medical data for training and optimization, which are expected to play a crucial role in reducing the incidence and mortality rates of cardiovascular diseases. Although the field has become a research hot spot, there are still many pitfalls that researchers need to pay close attention to. These pitfalls may affect the predictive performance, credibility, reliability, and reproducibility of the studied models, ultimately reducing the value of the research and affecting the prospects for clinical application. 
Therefore, identifying and avoiding these pitfalls is a crucial task before implementing the research. However, there is currently a lack of a comprehensive summary on this topic. This viewpoint aims to analyze the existing problems in terms of data quality, data set characteristics, model design, and statistical methods, as well as clinical implications, and provide possible solutions to these problems, such as gathering objective data, improving training, repeating measurements, increasing sample size, preventing overfitting using statistical methods, using specific AI algorithms to address targeted issues, standardizing outcomes and evaluation criteria, and enhancing fairness and replicability, with the goal of offering reference and assistance to researchers, algorithm developers, policy makers, and clinical practitioners. %M 38869157 %R 10.2196/47645 %U https://www.jmir.org/2024/1/e47645 %U https://doi.org/10.2196/47645 %U http://www.ncbi.nlm.nih.gov/pubmed/38869157 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e55933 %T Impact of Large Language Models on Medical Education and Teaching Adaptations %A Zhui,Li %A Yhap,Nina %A Liping,Liu %A Zhengjie,Wang %A Zhonghao,Xiong %A Xiaoshu,Yuan %A Hong,Cui %A Xuexiu,Liu %A Wei,Ren %K large language models %K medical education %K opportunities %K challenges %K critical thinking %K educator %D 2024 %7 25.7.2024 %9 %J JMIR Med Inform %G English %X This viewpoint article explores the transformative role of large language models (LLMs) in the field of medical education, highlighting their potential to enhance teaching quality, promote personalized learning paths, strengthen clinical skills training, optimize teaching assessment processes, boost the efficiency of medical research, and support continuing medical education. 
However, the use of LLMs entails certain challenges, such as questions regarding the accuracy of information, the risk of overreliance on technology, a lack of emotional recognition capabilities, and concerns related to ethics, privacy, and data security. This article emphasizes that to maximize the potential of LLMs and overcome these challenges, educators must exhibit leadership in medical education, adjust their teaching strategies flexibly, cultivate students’ critical thinking, and emphasize the importance of practical experience, thus ensuring that students can use LLMs correctly and effectively. By adopting such a comprehensive and balanced approach, educators can train health care professionals who are proficient in the use of advanced technologies and who exhibit solid professional ethics and practical skills, thus laying a strong foundation for these professionals to overcome future challenges in the health care sector. %R 10.2196/55933 %U https://medinform.jmir.org/2024/1/e55933 %U https://doi.org/10.2196/55933 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e54885 %T Use of Deep Neural Networks to Predict Obesity With Short Audio Recordings: Development and Usability Study %A Huang,Jingyi %A Guo,Peiqi %A Zhang,Sheng %A Ji,Mengmeng %A An,Ruopeng %+ School of Journalism and Communication, Shanghai University of Sport, 650 Hengren Road, Yangpu District, Shanghai, 200000, China, 86 18017355353, zhsheng1@126.com %K obesity %K obese %K overweight %K voice %K vocal %K vocal cord %K vocal cords %K voice-based %K machine learning %K ML %K artificial intelligence %K AI %K algorithm %K algorithms %K predictive model %K predictive models %K predictive analytics %K predictive system %K practical model %K practical models %K early warning %K early detection %K deep neural network %K deep neural networks %K DNN %K artificial neural network %K artificial neural networks %K deep learning %D 2024 %7 25.7.2024 %9 Original Paper %J JMIR AI %G English %X 
Background: The escalating global prevalence of obesity has necessitated the exploration of novel diagnostic approaches. Recent scientific inquiries have indicated potential alterations in voice characteristics associated with obesity, suggesting the feasibility of using voice as a noninvasive biomarker for obesity detection. Objective: This study aims to use deep neural networks to predict obesity status through the analysis of short audio recordings, investigating the relationship between vocal characteristics and obesity. Methods: A pilot study was conducted with 696 participants, using self-reported BMI to classify individuals into obesity and nonobesity groups. Audio recordings of participants reading a short script were transformed into spectrograms and analyzed using an adapted YOLOv8 model (Ultralytics). The model performance was evaluated using accuracy, recall, precision, and F1-scores. Results: The adapted YOLOv8 model demonstrated a global accuracy of 0.70 and a macro F1-score of 0.65. It was more effective in identifying nonobesity (F1-score of 0.77) than obesity (F1-score of 0.53). This moderate level of accuracy highlights the potential and challenges in using vocal biomarkers for obesity detection. Conclusions: While the study shows promise in the field of voice-based medical diagnostics for obesity, it faces limitations such as reliance on self-reported BMI data and a small, homogenous sample size. These factors, coupled with variability in recording quality, necessitate further research with more robust methodologies and diverse samples to enhance the validity of this novel approach. The findings lay a foundational step for future investigations in using voice as a noninvasive biomarker for obesity detection. 
%M 39052997 %R 10.2196/54885 %U https://ai.jmir.org/2024/1/e54885 %U https://doi.org/10.2196/54885 %U http://www.ncbi.nlm.nih.gov/pubmed/39052997 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e59050 %T ChatGPT for Automated Qualitative Research: Content Analysis %A Bijker,Rimke %A Merkouris,Stephanie S %A Dowling,Nicki A %A Rodda,Simone N %+ Department of Psychology and Neuroscience, Auckland University of Technology, 90 Akoranga Drive, Auckland, 0627, New Zealand, 64 9921 9999 ext 29079, simone.rodda@aut.ac.nz %K ChatGPT %K natural language processing %K qualitative content analysis %K Theoretical Domains Framework %D 2024 %7 25.7.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Data analysis approaches such as qualitative content analysis are notoriously time and labor intensive because of the time to detect, assess, and code a large amount of data. Tools such as ChatGPT may have tremendous potential in automating at least some of the analysis. Objective: The aim of this study was to explore the utility of ChatGPT in conducting qualitative content analysis through the analysis of forum posts from people sharing their experiences on reducing their sugar consumption. Methods: Inductive and deductive content analysis were performed on 537 forum posts to detect mechanisms of behavior change. Thorough prompt engineering provided appropriate instructions for ChatGPT to execute data analysis tasks. Data identification involved extracting change mechanisms from a subset of forum posts. The precision of the extracted data was assessed through comparison with human coding. On the basis of the identified change mechanisms, coding schemes were developed with ChatGPT using data-driven (inductive) and theory-driven (deductive) content analysis approaches. The deductive approach was informed by the Theoretical Domains Framework using both an unconstrained coding scheme and a structured coding matrix. 
In total, 10 coding schemes were created from a subset of data and then applied to the full data set in 10 new conversations, resulting in 100 conversations each for inductive and unconstrained deductive analysis. A total of 10 further conversations coded the full data set into the structured coding matrix. Intercoder agreement was evaluated across and within coding schemes. ChatGPT output was also evaluated by the researchers to assess whether it reflected prompt instructions. Results: The precision of detecting change mechanisms in the data subset ranged from 66% to 88%. Overall κ scores for intercoder agreement ranged from 0.72 to 0.82 across inductive coding schemes and from 0.58 to 0.73 across unconstrained coding schemes and structured coding matrix. Coding into the best-performing coding scheme resulted in category-specific κ scores ranging from 0.67 to 0.95 for the inductive approach and from 0.13 to 0.87 for the deductive approaches. ChatGPT largely followed prompt instructions in producing a description of each coding scheme, although the wording for the inductively developed coding schemes was lengthier than specified. Conclusions: ChatGPT appears fairly reliable in assisting with qualitative analysis. ChatGPT performed better in developing an inductive coding scheme that emerged from the data than adapting an existing framework into an unconstrained coding scheme or coding directly into a structured matrix. The potential for ChatGPT to act as a second coder also appears promising, with almost perfect agreement in at least 1 coding scheme. The findings suggest that ChatGPT could prove useful as a tool to assist in each phase of qualitative content analysis, but multiple iterations are required to determine the reliability of each stage of analysis. 
%M 39052327 %R 10.2196/59050 %U https://www.jmir.org/2024/1/e59050 %U https://doi.org/10.2196/59050 %U http://www.ncbi.nlm.nih.gov/pubmed/39052327 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e57721 %T Comparison of the Quality of Discharge Letters Written by Large Language Models and Junior Clinicians: Single-Blinded Study %A Tung,Joshua Yi Min %A Gill,Sunil Ravinder %A Sng,Gerald Gui Ren %A Lim,Daniel Yan Zheng %A Ke,Yuhe %A Tan,Ting Fang %A Jin,Liyuan %A Elangovan,Kabilan %A Ong,Jasmine Chiat Ling %A Abdullah,Hairil Rizal %A Ting,Daniel Shu Wei %A Chong,Tsung Wen %+ Department of Urology, Singapore General Hospital, 16 College Road, Block 4 Level 1, Singapore, 169854, Singapore, 65 62223322, joshua.tung@gmail.com %K artificial intelligence %K AI %K discharge summaries %K continuity of care %K large language model %K LLM %K junior clinician %K letter writing %K single-blinded %K ChatGPT %K urology %K primary care %K fictional electronic record %K consultation note %K referral letter %K simulated environment %D 2024 %7 24.7.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Discharge letters are a critical component in the continuity of care between specialists and primary care providers. However, these letters are time-consuming to write, underprioritized in comparison to direct clinical care, and are often tasked to junior doctors. Prior studies assessing the quality of discharge summaries written for inpatient hospital admissions show inadequacies in many domains. Large language models such as GPT have the ability to summarize large volumes of unstructured free text such as electronic medical records and have the potential to automate such tasks, providing time savings and consistency in quality. 
Objective: The aim of this study was to assess the performance of GPT-4 in generating discharge letters written from urology specialist outpatient clinics to primary care providers and to compare their quality against letters written by junior clinicians. Methods: Fictional electronic records were written by physicians simulating 5 common urology outpatient cases with long-term follow-up. Records comprised simulated consultation notes, referral letters and replies, and relevant discharge summaries from inpatient admissions. GPT-4 was tasked to write discharge letters for these cases with a specified target audience of primary care providers who would be continuing the patient’s care. Prompts were written for safety, content, and style. Concurrently, junior clinicians were provided with the same case records and instructional prompts. GPT-4 output was assessed for instances of hallucination. A blinded panel of primary care physicians then evaluated the letters using a standardized questionnaire tool. Results: GPT-4 outperformed human counterparts in information provision (mean 4.32, SD 0.95 vs 3.70, SD 1.27; P=.03) and had no instances of hallucination. There were no statistically significant differences in the mean clarity (4.16, SD 0.95 vs 3.68, SD 1.24; P=.12), collegiality (4.36, SD 1.00 vs 3.84, SD 1.22; P=.05), conciseness (3.60, SD 1.12 vs 3.64, SD 1.27; P=.71), follow-up recommendations (4.16, SD 1.03 vs 3.72, SD 1.13; P=.08), and overall satisfaction (3.96, SD 1.14 vs 3.62, SD 1.34; P=.36) between the letters generated by GPT-4 and humans, respectively. Conclusions: Discharge letters written by GPT-4 had equivalent quality to those written by junior clinicians, without any hallucinations. This study provides a proof of concept that large language models can be useful and safe tools in clinical documentation. 
%M 39047282 %R 10.2196/57721 %U https://www.jmir.org/2024/1/e57721 %U https://doi.org/10.2196/57721 %U http://www.ncbi.nlm.nih.gov/pubmed/39047282 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e49142 %T Prediction of In-Hospital Cardiac Arrest in the Intensive Care Unit: Machine Learning–Based Multimodal Approach %A Lee,Hsin-Ying %A Kuo,Po-Chih %A Qian,Frank %A Li,Chien-Hung %A Hu,Jiun-Ruey %A Hsu,Wan-Ting %A Jhou,Hong-Jie %A Chen,Po-Huang %A Lee,Cho-Hao %A Su,Chin-Hua %A Liao,Po-Chun %A Wu,I-Ju %A Lee,Chien-Chang %K cardiac arrest %K machine learning %K intensive care %K mortality %K medical emergency team %K early warning scores %D 2024 %7 23.7.2024 %9 %J JMIR Med Inform %G English %X Background: Early identification of impending in-hospital cardiac arrest (IHCA) improves clinical outcomes but remains elusive for practicing clinicians. Objective: We aimed to develop a multimodal machine learning algorithm based on ensemble techniques to predict the occurrence of IHCA. Methods: Our model was developed using the Medical Information Mart for Intensive Care (MIMIC)–IV database and validated in the Electronic Intensive Care Unit Collaborative Research Database (eICU-CRD). Baseline features consisting of patient demographics, presenting illness, and comorbidities were collected to train a random forest model. Next, vital signs were extracted to train a long short-term memory model. A support vector machine algorithm then stacked the results to form the final prediction model. Results: Of 23,909 patients in the MIMIC-IV database and 10,049 patients in the eICU-CRD database, 452 and 85 patients, respectively, had IHCA. At 13 hours in advance of an IHCA event, our algorithm had already demonstrated an area under the receiver operating characteristic curve of 0.85 (95% CI 0.815‐0.885) in the MIMIC-IV database. 
External validation with the eICU-CRD and National Taiwan University Hospital databases also presented satisfactory results, showing area under the receiver operating characteristic curve values of 0.81 (95% CI 0.763-0.851) and 0.945 (95% CI 0.934-0.956), respectively. Conclusions: Using only vital signs and information available in the electronic medical record, our model demonstrates it is possible to detect a trajectory of clinical deterioration up to 13 hours in advance. This predictive tool, which has undergone external validation, could forewarn and help clinicians identify patients in need of assessment to improve their overall prognosis. %R 10.2196/49142 %U https://medinform.jmir.org/2024/1/e49142 %U https://doi.org/10.2196/49142 %0 Journal Article %@ 2292-9495 %I %V 11 %N %P e51086 %T AI Hesitancy and Acceptability—Perceptions of AI Chatbots for Chronic Health Management and Long COVID Support: Survey Study %A Wu,Philip Fei %A Summers,Charlotte %A Panesar,Arjun %A Kaura,Amit %A Zhang,Li %K AI hesitancy %K chatbot %K long COVID %K diabetes %K chronic disease management %K technology acceptance %K post–COVID-19 condition %K artificial intelligence %D 2024 %7 23.7.2024 %9 %J JMIR Hum Factors %G English %X Background: Artificial intelligence (AI) chatbots have the potential to assist individuals with chronic health conditions by providing tailored information, monitoring symptoms, and offering mental health support. Despite their potential benefits, research on public attitudes toward health care chatbots is still limited. To effectively support individuals with long-term health conditions like long COVID (or post–COVID-19 condition), it is crucial to understand their perspectives and preferences regarding the use of AI chatbots. 
Objective: This study has two main objectives: (1) provide insights into AI chatbot acceptance among people with chronic health conditions, particularly adults older than 55 years and (2) explore the perceptions of using AI chatbots for health self-management and long COVID support. Methods: A web-based survey study was conducted between January and March 2023, specifically targeting individuals with diabetes and other chronic conditions. This particular population was chosen due to their potential awareness and ability to self-manage their condition. The survey aimed to capture data at multiple intervals, taking into consideration the public launch of ChatGPT, which could have potentially impacted public opinions during the project timeline. The survey received 1310 clicks and garnered 900 responses, resulting in a total of 888 usable data points. Results: Although past experience with chatbots (P<.001, 95% CI .110-.302) and online information seeking (P<.001, 95% CI .039-.084) are strong indicators of respondents’ future adoption of health chatbots, they are in general skeptical or unsure about the use of AI chatbots for health care purposes. Less than one-third of the respondents (n=203, 30.1%) indicated that they were likely to use a health chatbot in the next 12 months if available. Most were uncertain about a chatbot’s capability to provide accurate medical advice. However, people seemed more receptive to using voice-based chatbots for mental well-being, health data collection, and analysis. Half of the respondents with long COVID showed interest in using emotionally intelligent chatbots. Conclusions: AI hesitancy is not uniform across all health domains and user groups. Despite persistent AI hesitancy, there are promising opportunities for chatbots to offer support for chronic conditions in areas of lifestyle enhancement and mental well-being, potentially through voice-based user interfaces. 
%R 10.2196/51086 %U https://humanfactors.jmir.org/2024/1/e51086 %U https://doi.org/10.2196/51086 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50130 %T Human-AI Teaming in Critical Care: A Comparative Analysis of Data Scientists’ and Clinicians’ Perspectives on AI Augmentation and Automation %A Bienefeld,Nadine %A Keller,Emanuela %A Grote,Gudela %+ Department of Management, Technology, and Economics, ETH Zurich, , Zurich, Switzerland, 41 44 633 45 95, nbienefeld@ethz.ch %K AI in health care %K human-AI teaming %K sociotechnical systems %K intensive care %K ICU %K AI adoption %K AI implementation %K augmentation %K automation, health care policy and regulatory foresight %K explainable AI %K explainable %K human-AI %K human-computer %K human-machine %K ethical implications of AI in health care %K ethical %K ethic %K ethics %K artificial intelligence %K policy %K foresight %K policies %K recommendation %K recommendations %K policy maker %K policy makers %K Delphi %K sociotechnical %D 2024 %7 22.7.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) holds immense potential for enhancing clinical and administrative health care tasks. However, slow adoption and implementation challenges highlight the need to consider how humans can effectively collaborate with AI within broader socio-technical systems in health care. Objective: In the example of intensive care units (ICUs), we compare data scientists’ and clinicians’ assessments of the optimal utilization of human and AI capabilities by determining suitable levels of human-AI teaming for safely and meaningfully augmenting or automating 6 core tasks. The goal is to provide actionable recommendations for policy makers and health care practitioners regarding AI design and implementation. 
Methods: In this multimethod study, we combine a systematic task analysis across 6 ICUs with an international Delphi survey involving 19 health data scientists from the industry and academia and 61 ICU clinicians (25 physicians and 36 nurses) to define and assess optimal levels of human-AI teaming (level 1=no performance benefits; level 2=AI augments human performance; level 3=humans augment AI performance; level 4=AI performs without human input). Stakeholder groups also considered ethical and social implications. Results: Both stakeholder groups chose level 2 and 3 human-AI teaming for 4 out of 6 core tasks in the ICU. For one task (monitoring), level 4 was the preferred design choice. For the task of patient interactions, both data scientists and clinicians agreed that AI should not be used regardless of technological feasibility due to the importance of the physician-patient and nurse-patient relationship and ethical concerns. Human-AI design choices rely on interpretability, predictability, and control over AI systems. If these conditions are not met and AI performs below human-level reliability, a reduction to level 1 or shifting accountability away from human end users is advised. If AI performs at or beyond human-level reliability and these conditions are not met, shifting to level 4 automation should be considered to ensure safe and efficient human-AI teaming. Conclusions: By considering the sociotechnical system and determining appropriate levels of human-AI teaming, our study showcases the potential for improving the safety and effectiveness of AI usage in ICUs and broader health care settings. Regulatory measures should prioritize interpretability, predictability, and control if clinicians hold full accountability. Ethical and social implications must be carefully evaluated to ensure effective collaboration between humans and AI, particularly considering the most recent advancements in generative AI. 
%M 39038285 %R 10.2196/50130 %U https://www.jmir.org/2024/1/e50130 %U https://doi.org/10.2196/50130 %U http://www.ncbi.nlm.nih.gov/pubmed/39038285 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58158 %T Evaluating and Enhancing Large Language Models’ Performance in Domain-Specific Medicine: Development and Usability Study With DocOA %A Chen,Xi %A Wang,Li %A You,MingKe %A Liu,WeiZhi %A Fu,Yu %A Xu,Jie %A Zhang,Shaoting %A Chen,Gang %A Li,Kang %A Li,Jian %+ Sports Medicine Center, West China Hospital, Sichuan University, No. 37, Guoxue Alley, Wuhou District, Chengdu, 610041, China, 86 18980601388, lijian_sportsmed@163.com %K large language model %K retrieval-augmented generation %K domain-specific benchmark framework %K osteoarthritis management %D 2024 %7 22.7.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The efficacy of large language models (LLMs) in domain-specific medicine, particularly for managing complex diseases such as osteoarthritis (OA), remains largely unexplored. Objective: This study focused on evaluating and enhancing the clinical capabilities and explainability of LLMs in specific domains, using OA management as a case study. Methods: A domain-specific benchmark framework was developed to evaluate LLMs across a spectrum from domain-specific knowledge to clinical applications in real-world clinical scenarios. DocOA, a specialized LLM designed for OA management integrating retrieval-augmented generation and instructional prompts, was developed. It can identify the clinical evidence upon which its answers are based through retrieval-augmented generation, thereby demonstrating the explainability of those answers. The study compared the performance of GPT-3.5, GPT-4, and a specialized assistant, DocOA, using objective and human evaluations. 
Results: Results showed that general LLMs such as GPT-3.5 and GPT-4 were less effective in the specialized domain of OA management, particularly in providing personalized treatment recommendations. However, DocOA showed significant improvements. Conclusions: This study introduces a novel benchmark framework that assesses the domain-specific abilities of LLMs in multiple aspects, highlights the limitations of generalized LLMs in clinical contexts, and demonstrates the potential of tailored approaches for developing domain-specific medical LLMs. %M 38833165 %R 10.2196/58158 %U https://www.jmir.org/2024/1/e58158 %U https://doi.org/10.2196/58158 %U http://www.ncbi.nlm.nih.gov/pubmed/38833165 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 10 %N %P e43070 %T Artificial Intelligence–Based Co-Facilitator (AICF) for Detecting and Monitoring Group Cohesion Outcomes in Web-Based Cancer Support Groups: Single-Arm Trial Study %A Leung,Yvonne W %A Wouterloot,Elise %A Adikari,Achini %A Hong,Jinny %A Asokan,Veenaajaa %A Duan,Lauren %A Lam,Claire %A Kim,Carlina %A Chan,Kai P %A De Silva,Daswin %A Trachtenberg,Lianne %A Rennie,Heather %A Wong,Jiahui %A Esplen,Mary Jane %+ de Souza Institute, University Health Network, de Souza Institute c/o Toronto General Hospital, 200 Elizabeth St RFE 3-440, Toronto, ON, M5G 2C4, Canada, 1 647 299 1360, yw.leung@utoronto.ca %K group cohesion %K LIWC %K online support group %K natural language processing %K NLP %K emotion analysis %K machine learning %K sentiment analysis %K emotion detection %K integrating human knowledge %K emotion lining %K cancer %K oncology %K support group %K artificial intelligence %K AI %K therapy %K online therapist %K emotion %K affect %K speech tagging %K speech tag %K topic modeling %K named entity recognition %K spoken language processing %K focus group %K corpus %K language %K linguistic %D 2024 %7 22.7.2024 %9 Original Paper %J JMIR Cancer %G English %X Background: Commonly offered as supportive care, 
therapist-led online support groups (OSGs) are a cost-effective way to provide support to individuals affected by cancer. One important indicator of a successful OSG session is group cohesion; however, monitoring group cohesion can be challenging due to the lack of nonverbal cues and in-person interactions in text-based OSGs. The Artificial Intelligence–based Co-Facilitator (AICF) was designed to contextually identify therapeutic outcomes from conversations and produce real-time analytics. Objective: The aim of this study was to develop a method to train and evaluate AICF’s capacity to monitor group cohesion. Methods: AICF used a text classification approach to extract the mentions of group cohesion within conversations. A sample of data was annotated by human scorers, which was used as the training data to build the classification model. The annotations were further supported by finding contextually similar group cohesion expressions using word embedding models as well. AICF performance was also compared against the natural language processing software Linguistic Inquiry Word Count (LIWC). Results: AICF was trained on 80,000 messages obtained from Cancer Chat Canada. We tested AICF on 34,048 messages. Human experts scored 6797 (20%) of the messages to evaluate the ability of AICF to classify group cohesion. Results showed that machine learning algorithms combined with human input could detect group cohesion, a clinically meaningful indicator of effective OSGs. After retraining with human input, AICF reached an F1-score of 0.82. AICF performed slightly better at identifying group cohesion compared to LIWC. Conclusions: AICF has the potential to assist therapists by detecting discord in the group amenable to real-time intervention. Overall, AICF presents a unique opportunity to strengthen patient-centered care in web-based settings by attending to individual needs. 
International Registered Report Identifier (IRRID): RR2-10.2196/21453 %M 39037754 %R 10.2196/43070 %U https://cancer.jmir.org/2024/1/e43070 %U https://doi.org/10.2196/43070 %U http://www.ncbi.nlm.nih.gov/pubmed/39037754 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 7 %N %P e54810 %T Identifying Depression Through Machine Learning Analysis of Omics Data: Scoping Review %A Taylor,Brittany %A Hobensack,Mollie %A Niño de Rivera,Stephanie %A Zhao,Yihong %A Masterson Creber,Ruth %A Cato,Kenrick %+ School of Nursing, Columbia University, 560 W 168th St, New York, NY, 10032, United States, 1 2123424172, bt2542@cumc.columbia.edu %K machine learning %K depression %K omics %K review %K mental health %K nurses %D 2024 %7 19.7.2024 %9 Review %J JMIR Nursing %G English %X Background: Depression is one of the most common mental disorders that affects >300 million people worldwide. There is a shortage of providers trained in the provision of mental health care, and the nursing workforce is essential in filling this gap. The diagnosis of depression relies heavily on self-reported symptoms and clinical interviews, which are subject to implicit biases. The omics methods, including genomics, transcriptomics, epigenomics, and microbiomics, are novel methods for identifying the biological underpinnings of depression. Machine learning is used to analyze genomic data that includes large, heterogeneous, and multidimensional data sets. Objective: This scoping review aims to review the existing literature on machine learning methods for omics data analysis to identify individuals with depression, with the goal of providing alternative, objective, data-driven insights into the diagnostic process for depression. Methods: This scoping review was reported following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. Searches were conducted in 3 databases to identify relevant publications. 
A total of 3 independent researchers performed screening, and discrepancies were resolved by consensus. Critical appraisal was performed using the Joanna Briggs Institute Critical Appraisal Checklist for Analytical Cross-Sectional Studies. Results: The screening process identified 15 relevant papers. The omics methods included genomics, transcriptomics, epigenomics, multiomics, and microbiomics, and machine learning methods included random forest, support vector machine, k-nearest neighbor, and artificial neural network. Conclusions: The findings of this scoping review indicate that the omics methods had similar performance in identifying omics variants associated with depression. All machine learning methods performed well based on their performance metrics. When variants in omics data are associated with an increased risk of depression, the important next step is for clinicians, especially nurses, to assess individuals for symptoms of depression and provide a diagnosis and any necessary treatment. 
%M 39028994 %R 10.2196/54810 %U https://nursing.jmir.org/2024/1/e54810 %U https://doi.org/10.2196/54810 %U http://www.ncbi.nlm.nih.gov/pubmed/39028994 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e48600 %T Acceptance of AI in Health Care for Short- and Long-Term Treatments: Pilot Development Study of an Integrated Theoretical Model %A Wichmann,Johannes %A Gesk,Tanja Sophie %A Leyer,Michael %+ Working group Digitalization and Process Management, Department of Business, Philipps-University Marburg, Barfuessertor 2, Marburg, 35037, Germany, 49 64212823712, johannes.wichmann@wiwi.uni-marburg.de %K health information systems %K integrated theoretical model %K artificial intelligence %K health care %K technology acceptance %K long-term treatments %K short-term treatments %K mobile phone %D 2024 %7 18.7.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: As digital technologies and especially artificial intelligence (AI) become increasingly important in health care, it is essential to determine whether and why potential users intend to use related health information systems (HIS). Several theories exist, but they focus mainly on aspects of health care or information systems, in addition to general psychological theories, and hence provide a small number of variables to explain future behavior. Thus, research that provides a larger number of variables by combining several theories from health care, information systems, and psychology is necessary. Objective: This study aims to investigate the intention to use new HIS for decisions concerning short- and long-term medical treatments using an integrated approach with several variables to explain future behavior. Methods: We developed an integrated theoretical model based on theories from health care, information systems, and psychology that allowed us to analyze the duality approach of adaptive and nonadaptive appraisals and their influence on the intention to use HIS. 
We applied the integrated theoretical model to the short-term treatment using AI-based HIS for surgery and the long-term treatment of diabetes tracking using survey data with structural equation modeling. To differentiate between certain levels of AI involvement, we used several scenarios that include treatments by physicians only, physicians with AI support, and AI only to understand how individuals perceive the influence of AI. Results: Our results showed that for short- and long-term treatments, the variables perceived threats, fear (disease), perceived efficacy, attitude (HIS), and perceived norms are important to consider when determining the intention to use AI-based HIS. Furthermore, the results revealed that perceived efficacy and attitude (HIS) are the most important variables to determine intention to use for all treatments and scenarios. In contrast, abilities (HIS) were important for short-term treatments only. For our 9 scenarios, adaptive and nonadaptive appraisals were both important to determine intention to use, depending on whether the treatment is known. Furthermore, we determined R² values that varied between 57.9% and 81.7% for our scenarios, which showed that the explanatory power of our model is medium to good. Conclusions: We contribute to HIS literature by highlighting the importance of integrating disease- and technology-related factors and by providing an integrated theoretical model. As such, we show how adaptive and nonadaptive appraisals should be arranged to report on medical decisions in the future, especially in the short and long terms. Physicians and HIS developers can use our insights to identify promising rationale for HIS adoption concerning short- and long-term treatments and adapt and develop HIS accordingly. Specifically, HIS developers should ensure that future HIS act in terms of HIS functions, as our study shows that efficient HIS lead to a positive attitude toward the HIS and ultimately to a higher intention to use. 
%M 39024565 %R 10.2196/48600 %U https://formative.jmir.org/2024/1/e48600 %U https://doi.org/10.2196/48600 %U http://www.ncbi.nlm.nih.gov/pubmed/39024565 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e54365 %T An Approach to Potentially Increasing Adoption of an Artificial Intelligence–Enabled Electronic Medical Record Encounter in Canadian Primary Care: Protocol for a User-Centered Design %A Francisco,Krizia Mae %A Burns,Catherine M %+ Department of Systems Design Engineering, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L3G1, Canada, 1 5198884567, krizia.mae.francisco@gmail.com %K primary care %K electronic medical record %K EMR %K artificial intelligence %K AI %K contextual design %K user-centered %K design %K electronic health record %K EHR %K Canada %K Canadian %K primary care %K physicians %K burnout %K user %K users %K tools %K provider-centered design %K decision-making %K AI tools %K technology acceptance model %K TAM %D 2024 %7 18.7.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Primary care physicians are at the forefront of the clinical process that can lead to diagnosis, referral, and treatment. With electronic medical records (EMRs) being introduced and, over time, gaining acceptance by primary care users, they have now become a standard part of care. EMRs have the potential to be further optimized with the introduction of artificial intelligence (AI). There has yet to be a widespread exploration of the use of AI in primary health care and how clinicians envision AI use to encourage further uptake. Objective: The primary objective of this research is to understand if the user-centered design approach, rooted in contextual design, can lead to an increased likelihood of adoption of an AI-enabled encounter module embedded in a primary care EMR. In this study, we use human factor models and the technology acceptance model to understand the results. 
Methods: To accomplish this, a partnership has been established with an industry partner, TELUS Health, to use their EMR, the collaborative health record. The overall intention is to understand how to improve the user experience by using user-centered design to inform how AI should be embedded in an EMR encounter. Given this intention, a user-centered approach will be used to accomplish it. The approach of user-centered design requires qualitative interviewing to gain a clear understanding of users’ approaches, intentions, and other key insights to inform the design process. A total of 5 phases have been designed for this study. Results: As of March 2024, a total of 14 primary care clinician participants have been recruited and interviewed. First-cycle coding of all qualitative data results is being conducted to inform redesign considerations. Conclusions: Some limitations need to be acknowledged related to the approach of this study. There is a lack of market maturity of AI-enabled EMR encounters in primary care, requiring research to take place through scenario-based interviews. However, this participant group will still help inform design considerations for this tool. This study is targeted for completion in the late fall of 2024. 
International Registered Report Identifier (IRRID): DERR1-10.2196/54365 %M 39024011 %R 10.2196/54365 %U https://www.researchprotocols.org/2024/1/e54365 %U https://doi.org/10.2196/54365 %U http://www.ncbi.nlm.nih.gov/pubmed/39024011 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e52353 %T A Multimorbidity Analysis of Hospitalized Patients With COVID-19 in Northwest Italy: Longitudinal Study Using Evolutionary Machine Learning and Health Administrative Data %A Benny,Dayana %A Giacobini,Mario %A Catalano,Alberto %A Costa,Giuseppe %A Gnavi,Roberto %A Ricceri,Fulvio %+ Centre for Biostatistics, Epidemiology, and Public Health, Department of Clinical and Biological Sciences, University of Turin, Regione Gonzole 10, Orbassano, Turin, 10043, Italy, 39 0116705440, dayana.benny@unito.it %K machine learning %K evolutionary algorithm %K multimorbidity %K data analysis %K epidemiology %K feature bins %K COVID-19 %K long COVID %K ICD %K ATC %K polypharmacy %K sparse binary data %K feature engineering %K public health %K severity %K epidemiology %K coronavirus %K SARS-CoV-2 %K risk assessments %K risk assessment %K data %K data mining %K big data %K longitudinal study %K longitudinal analysis %K longitudinal analyses %K health data %K Italy %D 2024 %7 18.7.2024 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Multimorbidity is a significant public health concern, characterized by the coexistence and interaction of multiple preexisting medical conditions. This complex condition has been associated with an increased risk of COVID-19. Individuals with multimorbidity who contract COVID-19 often face a significant reduction in life expectancy. The postpandemic period has also highlighted an increase in frailty, emphasizing the importance of integrating existing multimorbidity details into epidemiological risk assessments. 
Managing clinical data that include medical histories presents significant challenges, particularly due to the sparsity of data arising from the rarity of multimorbidity conditions. In addition, enumerating combinations of multimorbidity features introduces the risk of combinatorial explosion. Objective: This study aims to assess the severity of COVID-19 in individuals with multiple medical conditions, considering their demographic characteristics such as age and sex. We propose an evolutionary machine learning model designed to handle sparsity, analyzing preexisting multimorbidity profiles of patients hospitalized with COVID-19 based on their medical history. Our objective is to identify the optimal set of multimorbidity feature combinations strongly associated with COVID-19 severity. We also apply the Apriori algorithm to these evolutionarily derived predictive feature combinations to identify those with high support. Methods: We used data from 3 administrative sources in Piedmont, Italy, involving 12,793 individuals aged 45-74 years who tested positive for COVID-19 between February and May 2020. From their 5-year pre–COVID-19 medical histories, we extracted multimorbidity features, including drug prescriptions, disease diagnoses, sex, and age. Focusing on COVID-19 hospitalization, we segmented the data into 4 cohorts based on age and sex. Addressing data imbalance through random resampling, we compared various machine learning algorithms to identify the optimal classification model for our evolutionary approach. Using 5-fold cross-validation, we evaluated each model’s performance. Our evolutionary algorithm, using a deep learning classifier, generated prediction-based fitness scores to pinpoint multimorbidity combinations associated with COVID-19 hospitalization risk. Finally, the Apriori algorithm was applied to identify frequent combinations with high support. 
Results: We identified multimorbidity predictors associated with COVID-19 hospitalization, indicating more severe COVID-19 outcomes. Frequently occurring morbidity features in the final evolved combinations were age>53, R03BA (glucocorticoid inhalants), and N03AX (other antiepileptics) in cohort 1; A10BA (biguanide or metformin) and N02BE (anilides) in cohort 2; N02AX (other opioids) and M04AA (preparations inhibiting uric acid production) in cohort 3; and G04CA (alpha-adrenoreceptor antagonists) in cohort 4. Conclusions: When combined with other multimorbidity features, even less prevalent medical conditions show associations with the outcome. This study provides insights beyond COVID-19, demonstrating how repurposed administrative data can be adapted and contribute to enhanced risk assessment for vulnerable populations. %M 39024001 %R 10.2196/52353 %U https://publichealth.jmir.org/2024/1/e52353 %U https://doi.org/10.2196/52353 %U http://www.ncbi.nlm.nih.gov/pubmed/39024001 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e56700 %T Enhancing Type 2 Diabetes Treatment Decisions With Interpretable Machine Learning Models for Predicting Hemoglobin A1c Changes: Machine Learning Model Development %A Kurasawa,Hisashi %A Waki,Kayo %A Seki,Tomohisa %A Chiba,Akihiro %A Fujino,Akinori %A Hayashi,Katsuyoshi %A Nakahara,Eri %A Haga,Tsuneyuki %A Noguchi,Takashi %A Ohe,Kazuhiko %+ The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan, 81 358006427, kwaki-tky@m.u-tokyo.ac.jp %K AI %K artificial intelligence %K attention weight %K type 2 diabetes %K blood glucose control %K machine learning %K transformer %D 2024 %7 18.7.2024 %9 Original Paper %J JMIR AI %G English %X Background: Type 2 diabetes (T2D) is a significant global health challenge. 
Physicians need to assess whether future glycemic control will be poor on the current trajectory of usual care and usual-care treatment intensifications so that they can consider taking extra treatment measures to prevent poor outcomes. Predicting poor glycemic control from trends in hemoglobin A1c (HbA1c) levels is difficult due to the influence of seasonal fluctuations and other factors. Objective: We sought to develop a model that accurately predicts poor glycemic control among patients with T2D receiving usual care. Methods: Our machine learning model predicts poor glycemic control (HbA1c≥8%) using the transformer architecture, incorporating an attention mechanism to process irregularly spaced HbA1c time series and quantify temporal relationships of past HbA1c levels at each time point. We assessed the model using HbA1c levels from 7787 patients with T2D seeing specialist physicians at the University of Tokyo Hospital. The training data include instances of poor glycemic control occurring during usual care with usual-care treatment intensifications. We compared prediction accuracy, assessed with the area under the receiver operating characteristic curve, the area under the precision-recall curve, and the accuracy rate, to that of LightGBM. Results: The area under the receiver operating characteristic curve, the area under the precision-recall curve, and the accuracy rate of the proposed model were 0.925 (95% CI 0.923-0.928), 0.864 (95% CI 0.852-0.875), and 0.864 (95% CI 0.86-0.869), respectively. The proposed model achieved high prediction accuracy comparable to or surpassing LightGBM’s performance. The model prioritized the most recent HbA1c levels for predictions. Older HbA1c levels were slightly more influential in predictions for patients with poor glycemic control than for patients with good glycemic control. 
Conclusions: The proposed model accurately predicts poor glycemic control for patients with T2D receiving usual care, including patients receiving usual-care treatment intensifications, allowing physicians to identify cases warranting extraordinary treatment intensifications. If used by a nonspecialist, the model’s indication of likely future poor glycemic control may warrant a referral to a specialist. Future efforts could incorporate diverse and large-scale clinical data for improved accuracy. %M 39024008 %R 10.2196/56700 %U https://ai.jmir.org/2024/1/e56700 %U https://doi.org/10.2196/56700 %U http://www.ncbi.nlm.nih.gov/pubmed/39024008 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e55575 %T Prediction of Mild Cognitive Impairment Status: Pilot Study of Machine Learning Models Based on Longitudinal Data From Fitness Trackers %A Xu,Qidi %A Kim,Yejin %A Chung,Karen %A Schulz,Paul %A Gottlieb,Assaf %+ McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St, Houston, TX, 77030, United States, 1 7135003698, assaf.gottlieb@uth.tmc.edu %K mild cognitive impairment %K Fitbits %K fitness trackers %K sleep %K physical activity %D 2024 %7 18.7.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Early signs of Alzheimer disease (AD) are difficult to detect, causing diagnoses to be significantly delayed to time points when brain damage has already occurred and current experimental treatments have little effect on slowing disease progression. Tracking cognitive decline at early stages is critical for patients to make lifestyle changes and consider new and experimental therapies. Frequently studied biomarkers are invasive and costly and are limited for predicting conversion from normal to mild cognitive impairment (MCI). Objective: This study aimed to use data collected from fitness trackers to predict MCI status. 
Methods: In this pilot study, fitness trackers were worn by 20 participants: 12 patients with MCI and 8 age-matched controls. We collected physical activity, heart rate, and sleep data from each participant for up to 1 month and further developed a machine learning model to predict MCI status. Results: Our machine learning model was able to perfectly separate patients with MCI from controls (area under the curve=1.0). The top predictive features from the model included peak, cardio, and fat burn heart rate zones; resting heart rate; average deep sleep time; and total light activity time. Conclusions: Our results suggest that a longitudinal digital biomarker differentiates between controls and patients with MCI in a cost-effective and noninvasive way and hence may be useful for identifying patients with very early AD who can benefit from clinical trials and new, disease-modifying therapies. %M 39024003 %R 10.2196/55575 %U https://formative.jmir.org/2024/1/e55575 %U https://doi.org/10.2196/55575 %U http://www.ncbi.nlm.nih.gov/pubmed/39024003 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e59794 %T Ethics of the Use of Social Media as Training Data for AI Models Used for Digital Phenotyping %A Jaiswal,Aditi %A Shah,Aekta %A Harjadi,Christopher %A Windgassen,Erik %A Washington,Peter %+ Department of Information and Computer Sciences, University of Hawaii at Manoa, 1680 East-West Road, Honolulu, HI, 96822, United States, 1 8088296359, pyw@hawaii.edu %K social media analytics %K machine learning %K ethics %K research ethics %K consent %K scientific integrity %D 2024 %7 17.7.2024 %9 Commentary %J JMIR Form Res %G English %X Digital phenotyping, or personal sensing, is a field of research that seeks to quantify traits and characteristics of people using digital technologies, usually for health care purposes. 
In this commentary, we discuss emerging ethical issues regarding the use of social media as training data for artificial intelligence (AI) models used for digital phenotyping. In particular, we describe the ethical need for explicit consent from social media users, particularly in cases where sensitive information such as labels related to neurodiversity are scraped. We also advocate for the use of community-based participatory design principles when developing health care AI models using social media data. %M 39018549 %R 10.2196/59794 %U https://formative.jmir.org/2024/1/e59794 %U https://doi.org/10.2196/59794 %U http://www.ncbi.nlm.nih.gov/pubmed/39018549 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e55496 %T Accessible Ecosystem for Clinical Research (Federated Learning for Everyone): Development and Usability Study %A Pirmani,Ashkan %A Oldenhof,Martijn %A Peeters,Liesbet M %A De Brouwer,Edward %A Moreau,Yves %+ ESAT-STADIUS, KU Leuven, Kasteelpark Arenberg 10, Leuven, 3001, Belgium, 32 16 32 86 45, Yves.Moreau@esat.kuleuven.be %K federated learning %K multistakeholder collaboration %K real-world data %K integrity %K reliability %K clinical research %K implementation %K inclusivity %K inclusive %K accessible %K ecosystem %K design effectiveness %D 2024 %7 17.7.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The integrity and reliability of clinical research outcomes rely heavily on access to vast amounts of data. However, the fragmented distribution of these data across multiple institutions, along with ethical and regulatory barriers, presents significant challenges to accessing relevant data. While federated learning offers a promising solution to leverage insights from fragmented data sets, its adoption faces hurdles due to implementation complexities, scalability issues, and inclusivity challenges. 
Objective: This paper introduces Federated Learning for Everyone (FL4E), an accessible framework facilitating multistakeholder collaboration in clinical research. It focuses on simplifying federated learning through an innovative ecosystem-based approach. Methods: The “degree of federation” is a fundamental concept of FL4E, allowing for flexible integration of federated and centralized learning models. This feature provides a customizable solution by enabling users to choose the level of data decentralization based on specific health care settings or project needs, making federated learning more adaptable and efficient. By using an ecosystem-based collaborative learning strategy, FL4E encourages a comprehensive platform for managing real-world data, enhancing collaboration and knowledge sharing among its stakeholders. Results: Evaluating FL4E’s effectiveness using real-world health care data sets has highlighted its ecosystem-oriented and inclusive design. By applying hybrid models to 2 distinct analytical tasks—classification and survival analysis—within real-world settings, we have effectively measured the “degree of federation” across various contexts. These evaluations show that FL4E’s hybrid models not only match the performance of fully federated models but also avoid the substantial overhead usually linked with these models. Achieving this balance greatly enhances collaborative initiatives and broadens the scope of analytical possibilities within the ecosystem. Conclusions: FL4E represents a significant step forward in collaborative clinical research by merging the benefits of centralized and federated learning. Its modular ecosystem-based design and the “degree of federation” feature make it an inclusive, customizable framework suitable for a wide array of clinical research scenarios, promising to revolutionize the field through improved collaboration and data use. Detailed implementation and analyses are available on the associated GitHub repository. 
%M 39018557 %R 10.2196/55496 %U https://formative.jmir.org/2024/1/e55496 %U https://doi.org/10.2196/55496 %U http://www.ncbi.nlm.nih.gov/pubmed/39018557 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 12 %N %P e55094 %T Wearable Data From Subjects Playing Super Mario, Taking University Exams, or Performing Physical Exercise Help Detect Acute Mood Disorder Episodes via Self-Supervised Learning: Prospective, Exploratory, Observational Study %A Corponi,Filippo %A Li,Bryan M %A Anmella,Gerard %A Valenzuela-Pascual,Clàudia %A Mas,Ariadna %A Pacchiarotti,Isabella %A Valentí,Marc %A Grande,Iria %A Benabarre,Antoni %A Garriga,Marina %A Vieta,Eduard %A Young,Allan H %A Lawrie,Stephen M %A Whalley,Heather C %A Hidalgo-Mazzei,Diego %A Vergari,Antonio %+ School of Informatics, University of Edinburgh, Informatics Forum, 10 Crichton St, Newington, Edinburgh, EH89AB, United Kingdom, 44 131 651 5661, filippo.corponi@ed.ac.uk %K mood disorder %K time-series classification %K wearable %K personal sensing %K deep learning %K self-supervised learning %K transformer %D 2024 %7 17.7.2024 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: Personal sensing, leveraging data passively and near-continuously collected with wearables from patients in their ecological environment, is a promising paradigm to monitor mood disorders (MDs), a major determinant of the worldwide disease burden. However, collecting and annotating wearable data is resource intensive. Studies of this kind can thus typically afford to recruit only a few dozen patients. This constitutes one of the major obstacles to applying modern supervised machine learning techniques to MD detection. Objective: In this paper, we overcame this data bottleneck and advanced the detection of acute MD episodes from wearables’ data on the back of recent advances in self-supervised learning (SSL). 
This approach leverages unlabeled data to learn representations during pretraining, subsequently exploited for a supervised task. Methods: We collected open access data sets recorded with the Empatica E4 wristband, spanning personal sensing tasks unrelated to MD monitoring—from emotion recognition in Super Mario players to stress detection in undergraduates—and devised a preprocessing pipeline performing on-/off-body detection, sleep/wake detection, segmentation, and (optionally) feature extraction. With 161 E4-recorded subjects, we introduced E4SelfLearning, the largest-to-date open access collection, and its preprocessing pipeline. We developed a novel E4-tailored transformer (E4mer) architecture, serving as the blueprint for both SSL and fully supervised learning; we assessed whether and under which conditions self-supervised pretraining led to an improvement over fully supervised baselines (ie, the fully supervised E4mer and pre–deep learning algorithms) in detecting acute MD episodes from recording segments taken in 64 patients (n=32, 50%, acute; n=32, 50%, stable). Results: SSL significantly outperformed fully supervised pipelines using either our novel E4mer or extreme gradient boosting (XGBoost): n=3353 (81.23%) against n=3110 (75.35%; E4mer) and n=2973 (72.02%; XGBoost) correctly classified recording segments from a total of 4128 segments. SSL performance was strongly associated with the specific surrogate task used for pretraining, as well as with unlabeled data availability. Conclusions: We showed that SSL, a paradigm where a model is pretrained on unlabeled data with no need for human annotations before deployment on the supervised target task of interest, helps overcome the annotation bottleneck; the choice of the pretraining surrogate task and the size of unlabeled data for pretraining are key determinants of SSL success. 
We introduced E4mer, which can be used for SSL, and shared the E4SelfLearning collection, along with its preprocessing pipeline, which can foster and expedite future research into SSL for personal sensing. %M 39018100 %R 10.2196/55094 %U https://mhealth.jmir.org/2024/1/e55094 %U https://doi.org/10.2196/55094 %U http://www.ncbi.nlm.nih.gov/pubmed/39018100 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56114 %T Evaluating the Potential and Pitfalls of AI-Powered Conversational Agents as Humanlike Virtual Health Carers in the Remote Management of Noncommunicable Diseases: Scoping Review %A Anisha,Sadia Azmin %A Sen,Arkendu %A Bain,Chris %+ Jeffrey Cheah School of Medicine & Health Sciences, Monash University Malaysia, Jalan Lagoon Selatan, Bandar Sunway, 47500, Malaysia, 60 3 551 46325, arkendu.sen@monash.edu %K conversational agents %K artificial intelligence %K noncommunicable disease %K self-management %K remote monitoring %K mobile phone %D 2024 %7 16.7.2024 %9 Review %J J Med Internet Res %G English %X Background: The rising prevalence of noncommunicable diseases (NCDs) worldwide and the high recent mortality rates (74.4%) associated with them, especially in low- and middle-income countries, is causing a substantial global burden of disease, necessitating innovative and sustainable long-term care solutions. Objective: This scoping review aims to investigate the impact of artificial intelligence (AI)–based conversational agents (CAs)—including chatbots, voicebots, and anthropomorphic digital avatars—as human-like health caregivers in the remote management of NCDs as well as identify critical areas for future research and provide insights into how these technologies might be used effectively in health care to personalize NCD management strategies. 
Methods: A broad literature search was conducted in July 2023 in 6 electronic databases—Ovid MEDLINE, Embase, PsycINFO, PubMed, CINAHL, and Web of Science—using the search terms “conversational agents,” “artificial intelligence,” and “noncommunicable diseases,” including their associated synonyms. We also manually searched gray literature using sources such as ProQuest Central, ResearchGate, ACM Digital Library, and Google Scholar. We included empirical studies published in English from January 2010 to July 2023 focusing solely on health care–oriented applications of CAs used for remote management of NCDs. The narrative synthesis approach was used to collate and summarize the relevant information extracted from the included studies. Results: The literature search yielded a total of 43 studies that matched the inclusion criteria. Our review unveiled four significant findings: (1) higher user acceptance and compliance with anthropomorphic and avatar-based CAs for remote care; (2) an existing gap in the development of personalized, empathetic, and contextually aware CAs for effective emotional and social interaction with users, along with limited consideration of ethical concerns such as data privacy and patient safety; (3) inadequate evidence of the efficacy of CAs in NCD self-management despite a moderate to high level of optimism among health care professionals regarding CAs’ potential in remote health care; and (4) CAs primarily being used for supporting nonpharmacological interventions such as behavioral or lifestyle modifications and patient education for the self-management of NCDs. Conclusions: This review makes a unique contribution to the field by not only providing a quantifiable impact analysis but also identifying the areas requiring imminent scholarly attention for the ethical, empathetic, and efficacious implementation of AI in NCD care. This serves as an academic cornerstone for future research in AI-assisted health care for NCD management. 
Trial Registration: Open Science Framework; https://doi.org/10.17605/OSF.IO/GU5PX %M 39012688 %R 10.2196/56114 %U https://www.jmir.org/2024/1/e56114 %U https://doi.org/10.2196/56114 %U http://www.ncbi.nlm.nih.gov/pubmed/39012688 %0 Journal Article %@ 2291-9694 %I %V 12 %N %P e56361 %T Diagnostic Accuracy of Artificial Intelligence in Endoscopy: Umbrella Review %A Zha,Bowen %A Cai,Angshu %A Wang,Guiqi %K endoscopy %K artificial intelligence %K umbrella review %K meta-analyses %K AI %K diagnostic %K researchers %K researcher %K tools %K tool %K assessment %D 2024 %7 15.7.2024 %9 %J JMIR Med Inform %G English %X Background: Some research has already reported the diagnostic value of artificial intelligence (AI) in different endoscopy outcomes. However, the evidence is confusing and of varying quality. Objective: This review aimed to comprehensively evaluate the credibility of the evidence of AI’s diagnostic accuracy in endoscopy. Methods: Before the study began, the protocol was registered on PROSPERO (CRD42023483073). First, 2 researchers searched PubMed, Web of Science, Embase, and Cochrane Library using comprehensive search terms. Then, researchers screened the articles and extracted information. We used A Measurement Tool to Assess Systematic Reviews 2 (AMSTAR2) to evaluate the quality of the articles. When there were multiple studies aiming at the same result, we chose the study with higher-quality evaluations for further analysis. To ensure the reliability of the conclusions, we recalculated each outcome. Finally, the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) was used to evaluate the credibility of the outcomes. Results: A total of 21 studies were included for analysis. Through AMSTAR2, it was found that 8 research methodologies were of moderate quality, while other studies were regarded as having low or critically low quality. The sensitivity and specificity of 17 different outcomes were analyzed. 
There were 4 studies on the esophagus, 4 on the stomach, and 4 on colorectal regions. In addition, 2 studies were associated with capsule endoscopy, 2 were related to laryngoscopy, and 1 was related to ultrasonic endoscopy. In terms of sensitivity, gastroesophageal reflux disease had the highest accuracy rate, reaching 97%, while the invasion depth of colon neoplasia, with 71%, had the lowest accuracy rate. On the other hand, the specificity of colorectal cancer was the highest, reaching 98%, while the gastrointestinal stromal tumor, with only 80%, had the lowest specificity. The GRADE evaluation suggested that the reliability of most outcomes was low or very low. Conclusions: AI proved valuable in endoscopic diagnoses, especially in esophageal and colorectal diseases. These findings provide a theoretical basis for developing and evaluating AI-assisted systems, which are aimed at assisting endoscopists in carrying out examinations, leading to improved patient health outcomes. However, further high-quality research is needed in the future to fully validate AI’s effectiveness. 
%R 10.2196/56361 %U https://medinform.jmir.org/2024/1/e56361 %U https://doi.org/10.2196/56361 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e48156 %T AI as a Medical Device Adverse Event Reporting in Regulatory Databases: Protocol for a Systematic Review %A Kale,Aditya U %A Dattani,Riya %A Tabansi,Ashley %A Hogg,Henry David Jeffry %A Pearson,Russell %A Glocker,Ben %A Golder,Su %A Waring,Justin %A Liu,Xiaoxuan %A Moore,David J %A Denniston,Alastair K %+ Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom, 44 1213713243, a.denniston@bham.ac.uk %K adverse event %K artificial intelligence %K regulatory science %K regulatory database %K safety issue %K feedback %K health care product %K artificial intelligence health technology %K reporting system %K safety %K medical devices %K safety monitoring %K risks %K descriptive analysis %D 2024 %7 11.7.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: The reporting of adverse events (AEs) relating to medical devices is a long-standing area of concern, with suboptimal reporting due to a range of factors including a failure to recognize the association of AEs with medical devices, lack of knowledge of how to report AEs, and a general culture of nonreporting. The introduction of artificial intelligence as a medical device (AIaMD) requires a robust safety monitoring environment that recognizes both generic risks of a medical device and some of the increasingly recognized risks of AIaMD (such as algorithmic bias). There is an urgent need to understand the limitations of current AE reporting systems and explore potential mechanisms for how AEs could be detected, attributed, and reported with a view to improving the early detection of safety signals. 
Objective: The systematic review outlined in this protocol aims to yield insights into the frequency and severity of AEs while characterizing the events using existing regulatory guidance. Methods: Publicly accessible AE databases will be searched to identify AE reports for AIaMD. Scoping searches have identified 3 regulatory territories for which public access to AE reports is provided: the United States, the United Kingdom, and Australia. AEs will be included for analysis if an artificial intelligence (AI) medical device is involved. Software as a medical device without AI is not within the scope of this review. Data extraction will be conducted using a data extraction tool designed for this review and will be done independently by AUK and a second reviewer. Descriptive analysis will be conducted to identify the types of AEs being reported, and their frequency, for different types of AIaMD. AEs will be analyzed and characterized according to existing regulatory guidance. Results: Scoping searches are being conducted with screening to begin in April 2024. Data extraction and synthesis will commence in May 2024, with planned completion by August 2024. The review will highlight the types of AEs being reported for different types of AI medical devices and where the gaps are. It is anticipated that there will be particularly low rates of reporting for indirect harms associated with AIaMD. Conclusions: To our knowledge, this will be the first systematic review of 3 different regulatory sources reporting AEs associated with AIaMD. The review will focus on real-world evidence, which brings certain limitations, compounded by the opacity of regulatory databases generally. The review will outline the characteristics and frequency of AEs reported for AIaMD and help regulators and policy makers to continue developing robust safety monitoring processes. 
International Registered Report Identifier (IRRID): PRR1-10.2196/48156 %M 38990628 %R 10.2196/48156 %U https://www.researchprotocols.org/2024/1/e48156 %U https://doi.org/10.2196/48156 %U http://www.ncbi.nlm.nih.gov/pubmed/38990628 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53396 %T Harnessing Artificial Intelligence to Predict Ovarian Stimulation Outcomes in In Vitro Fertilization: Scoping Review %A AlSaad,Rawan %A Abd-alrazaq,Alaa %A Choucair,Fadi %A Ahmed,Arfan %A Aziz,Sarah %A Sheikh,Javaid %+ AI Center for Precision Health, Weill Cornell Medicine-Qatar, Education City, Street 2700, Doha, Qatar, 974 44928830, rta4003@qatar-med.cornell.edu %K artificial intelligence %K AI %K AI models %K AI model %K in vitro fertilization %K IVF %K ovarian stimulation %K infertility %K fertility %K ovary %K ovaries %K reproductive %K reproduction %K gynecology %K prediction %K predictions %K predictive %K prediction model %K ovarian %K adverse outcome %K fertilization %K pregnancy %D 2024 %7 5.7.2024 %9 Review %J J Med Internet Res %G English %X Background: In the realm of in vitro fertilization (IVF), artificial intelligence (AI) models serve as invaluable tools for clinicians, offering predictive insights into ovarian stimulation outcomes. Predicting and understanding a patient’s response to ovarian stimulation can help in personalizing doses of drugs, preventing adverse outcomes (eg, hyperstimulation), and improving the likelihood of successful fertilization and pregnancy. Given the pivotal role of accurate predictions in IVF procedures, it becomes important to investigate the landscape of AI models that are being used to predict the outcomes of ovarian stimulation. Objective: The objective of this review is to comprehensively examine the literature to explore the characteristics of AI models used for predicting ovarian stimulation outcomes in the context of IVF. 
Methods: A total of 6 electronic databases were searched for peer-reviewed literature published before August 2023, using the concepts of IVF and AI, along with their related terms. Records were independently screened by 2 reviewers against the eligibility criteria. The extracted data were then consolidated and presented through narrative synthesis. Results: Upon reviewing 1348 articles, 30 met the predetermined inclusion criteria. The literature primarily focused on the number of oocytes retrieved as the main predicted outcome. Microscopy images stood out as the primary ground truth reference. The reviewed studies also highlighted that the most frequently adopted stimulation protocol was the gonadotropin-releasing hormone (GnRH) antagonist. In terms of using trigger medication, human chorionic gonadotropin (hCG) was the most commonly selected option. Among the machine learning techniques, the favored choice was the support vector machine. As for the validation of AI algorithms, the hold-out cross-validation method was the most prevalent. The area under the curve was highlighted as the primary evaluation metric. The literature exhibited a wide variation in the number of features used for AI algorithm development, ranging from 2 to 28,054 features. Data were mostly sourced from patient demographics, followed by laboratory data, specifically hormonal levels. Notably, the vast majority of studies were restricted to a single infertility clinic and exclusively relied on nonpublic data sets. Conclusions: These insights highlight an urgent need to diversify data sources and explore varied AI techniques for improved prediction accuracy and generalizability of AI models for the prediction of ovarian stimulation outcomes. Future research should prioritize multiclinic collaborations and consider leveraging public data sets, aiming for more precise AI-driven predictions that ultimately boost patient care and IVF success rates. 
%M 38967964 %R 10.2196/53396 %U https://www.jmir.org/2024/1/e53396 %U https://doi.org/10.2196/53396 %U http://www.ncbi.nlm.nih.gov/pubmed/38967964 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e52045 %T Bayesian Networks for Prescreening in Depression: Algorithm Development and Validation %A Maekawa,Eduardo %A Grua,Eoin Martino %A Nakamura,Carina Akemi %A Scazufca,Marcia %A Araya,Ricardo %A Peters,Tim %A van de Ven,Pepijn %+ Department of Electronic and Computer Engineering, University of Limerick, Plassey Park Road, Limerick, V94 T9PX, Ireland, 353 830150601, eduardo.maekawa@ul.ie %K Bayesian network %K target depressive symptomatology %K probabilistic machine learning %K stochastic gradient descent %K patient screening %K depressive symptom %K machine learning model %K machine learning %K survey %K prediction %K socioeconomic data sets %K utilization %K depression %K mental health %K digital mental health %K artificial intelligence %K AI %K prediction %K prediction modeling %K patient %K mood %K anxiety %K mood disorders %K mood disorder %K eHealth %K mobile health %K mHealth %K telehealth %D 2024 %7 4.7.2024 %9 Original Paper %J JMIR Ment Health %G English %X Background: Identifying individuals with depressive symptomatology (DS) promptly and effectively is of paramount importance for providing timely treatment. Machine learning models have shown promise in this area; however, studies often fall short in demonstrating the practical benefits of using these models and fail to provide tangible real-world applications. Objective: This study aims to establish a novel methodology for identifying individuals likely to exhibit DS, identify the most influential features in a more explainable way via probabilistic measures, and propose tools that can be used in real-world applications. 
Methods: The study used 3 data sets: PROACTIVE, the Brazilian National Health Survey (Pesquisa Nacional de Saúde [PNS]) 2013, and PNS 2019, comprising sociodemographic and health-related features. A Bayesian network was used for feature selection. Selected features were then used to train machine learning models to predict DS, operationalized as a score of ≥10 on the 9-item Patient Health Questionnaire. The study also analyzed the impact of varying sensitivity rates on the reduction of screening interviews compared to a random approach. Results: The methodology allows the users to make an informed trade-off among sensitivity, specificity, and a reduction in the number of interviews. At the thresholds of 0.444, 0.412, and 0.472, determined by maximizing the Youden index, the models achieved sensitivities of 0.717, 0.741, and 0.718, and specificities of 0.644, 0.737, and 0.766 for PROACTIVE, PNS 2013, and PNS 2019, respectively. The area under the receiver operating characteristic curve was 0.736, 0.801, and 0.809 for these 3 data sets, respectively. For the PROACTIVE data set, the most influential features identified were postural balance, shortness of breath, and how old people feel they are. In the PNS 2013 data set, the features were the ability to do usual activities, chest pain, sleep problems, and chronic back problems. The PNS 2019 data set shared 3 of the most influential features with the PNS 2013 data set. However, the difference was the replacement of chronic back problems with verbal abuse. It is important to note that the features contained in the PNS data sets differ from those found in the PROACTIVE data set. An empirical analysis demonstrated that using the proposed model led to a potential reduction in screening interviews of up to 52% while maintaining a sensitivity of 0.80. 
Conclusions: This study developed a novel methodology for identifying individuals with DS, demonstrating the utility of using Bayesian networks to identify the most significant features. Moreover, this approach has the potential to substantially reduce the number of screening interviews while maintaining high sensitivity, thereby facilitating improved early identification and intervention strategies for individuals experiencing DS. %M 38963925 %R 10.2196/52045 %U https://mental.jmir.org/2024/1/e52045 %U https://doi.org/10.2196/52045 %U http://www.ncbi.nlm.nih.gov/pubmed/38963925 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52139 %T Artificial Intelligence–Based Electrocardiographic Biomarker for Outcome Prediction in Patients With Acute Heart Failure: Prospective Cohort Study %A Cho,Youngjin %A Yoon,Minjae %A Kim,Joonghee %A Lee,Ji Hyun %A Oh,Il-Young %A Lee,Chan Joo %A Kang,Seok-Min %A Choi,Dong-Ju %+ Division of Cardiology, Department of Internal Medicine, Seoul National University Bundang Hospital, Seoul National University College of Medicine, 82 Gumi-ro 173 Beon-gil, Bundang-gu, Seongnam, Gyeonggi-do, 13620, Republic of Korea, 82 317877007, djchoi@snubh.org %K acute heart failure %K electrocardiography %K artificial intelligence %K deep learning %D 2024 %7 3.7.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Although several biomarkers exist for patients with heart failure (HF), their use in routine clinical practice is often constrained by high costs and limited availability. Objective: We examined the utility of an artificial intelligence (AI) algorithm that analyzes printed electrocardiograms (ECGs) for outcome prediction in patients with acute HF. Methods: We retrospectively analyzed prospectively collected data of patients with acute HF at two tertiary centers in Korea. 
Baseline ECGs were analyzed using a deep-learning system called Quantitative ECG (QCG), which was trained to detect several urgent clinical conditions, including shock, cardiac arrest, and reduced left ventricular ejection fraction (LVEF). Results: Among the 1254 patients enrolled, in-hospital cardiac death occurred in 53 (4.2%) patients, and the QCG score for critical events (QCG-Critical) was significantly higher in these patients than in survivors (mean 0.57, SD 0.23 vs mean 0.29, SD 0.20; P<.001). The QCG-Critical score was an independent predictor of in-hospital cardiac death after adjustment for age, sex, comorbidities, HF etiology/type, atrial fibrillation, and QRS widening (adjusted odds ratio [OR] 1.68, 95% CI 1.47-1.92 per 0.1 increase; P<.001), and remained a significant predictor after additional adjustments for echocardiographic LVEF and N-terminal prohormone of brain natriuretic peptide level (adjusted OR 1.59, 95% CI 1.36-1.87 per 0.1 increase; P<.001). During long-term follow-up, patients with higher QCG-Critical scores (>0.5) had higher mortality rates than those with low QCG-Critical scores (<0.25) (adjusted hazard ratio 2.69, 95% CI 2.14-3.38; P<.001). Conclusions: Predicting outcomes in patients with acute HF using the QCG-Critical score is feasible, indicating that this AI-based ECG score may be a novel biomarker for these patients. 
Trial Registration: ClinicalTrials.gov NCT01389843; https://clinicaltrials.gov/study/NCT01389843 %M 38959500 %R 10.2196/52139 %U https://www.jmir.org/2024/1/e52139 %U https://doi.org/10.2196/52139 %U http://www.ncbi.nlm.nih.gov/pubmed/38959500 %0 Journal Article %@ 2368-7959 %I %V 11 %N %P e56569 %T The Role of Humanization and Robustness of Large Language Models in Conversational Artificial Intelligence for Individuals With Depression: A Critical Analysis %A Ferrario,Andrea %A Sedlakova,Jana %A Trachsel,Manuel %K generative AI %K large language models %K large language model %K LLM %K LLMs %K machine learning %K ML %K natural language processing %K NLP %K deep learning %K depression %K mental health %K mental illness %K mental disease %K mental diseases %K mental illnesses %K artificial intelligence %K AI %K digital health %K digital technology %K digital intervention %K digital interventions %K ethics %D 2024 %7 2.7.2024 %9 %J JMIR Ment Health %G English %X Large language model (LLM)–powered services are gaining popularity in various applications due to their exceptional performance in many tasks, such as sentiment analysis and answering questions. Recently, research has been exploring their potential use in digital health contexts, particularly in the mental health domain. However, implementing LLM-enhanced conversational artificial intelligence (CAI) presents significant ethical, technical, and clinical challenges. In this viewpoint paper, we discuss 2 challenges that affect the use of LLM-enhanced CAI for individuals with mental health issues, focusing on the use case of patients with depression: the tendency to humanize LLM-enhanced CAI and their lack of contextualized robustness. Our approach is interdisciplinary, relying on considerations from philosophy, psychology, and computer science. 
We argue that the humanization of LLM-enhanced CAI hinges on the reflection of what it means to simulate “human-like” features with LLMs and what role these systems should play in interactions with humans. Further, ensuring the contextualization of the robustness of LLMs requires considering the specificities of language production in individuals with depression, as well as its evolution over time. Finally, we provide a series of recommendations to foster the responsible design and deployment of LLM-enhanced CAI for the therapeutic support of individuals with depression. %R 10.2196/56569 %U https://mental.jmir.org/2024/1/e56569 %U https://doi.org/10.2196/56569 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 7 %N %P e48811 %T Efficacy of an Artificial Intelligence App (Aysa) in Dermatological Diagnosis: Cross-Sectional Analysis %A Marri,Shiva Shankar %A Albadri,Warood %A Hyder,Mohammed Salman %A Janagond,Ajit B %A Inamadar,Arun C %+ Department of Dermatology, Venereology and Leprosy, Shri B M Patil Medical College, Hospital and Research Centre, BLDE (Deemed to be) University, Bangaramma Sajjan Campus, Vijayapura, Karnataka, 586103, India, 91 9448102920, aruninamadar@gmail.com %K artificial intelligence %K AI %K AI-aided diagnosis %K dermatology %K mobile app %K application %K neural network %K machine learning %K dermatological %K skin %K computer-aided diagnosis %K diagnostic %K imaging %K lesion %D 2024 %7 2.7.2024 %9 Original Paper %J JMIR Dermatol %G English %X Background: Dermatology is an ideal specialty for artificial intelligence (AI)–driven image recognition to improve diagnostic accuracy and patient care. Lack of dermatologists in many parts of the world and the high frequency of cutaneous disorders and malignancies highlight the increasing need for AI-aided diagnosis. Although AI-based applications for the identification of dermatological conditions are widely available, research assessing their reliability and accuracy is lacking. 
Objective: The aim of this study was to analyze the efficacy of the Aysa AI app as a preliminary diagnostic tool for various dermatological conditions in a semiurban town in India. Methods: This observational cross-sectional study included patients over the age of 2 years who visited the dermatology clinic. Images of lesions from individuals with various skin disorders were uploaded to the app after obtaining informed consent. The app was used to make a patient profile, identify lesion morphology, plot the location on a human model, and answer questions regarding duration and symptoms. The app presented eight differential diagnoses, which were compared with the clinical diagnosis. The model’s performance was evaluated using sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and F1-score. Comparison of categorical variables was performed with the χ2 test and statistical significance was considered at P<.05. Results: A total of 700 patients were part of the study. A wide variety of skin conditions were grouped into 12 categories. The AI model had a mean top-1 sensitivity of 71% (95% CI 61.5%-74.3%), top-3 sensitivity of 86.1% (95% CI 83.4%-88.6%), and all-8 sensitivity of 95.1% (95% CI 93.3%-96.6%). The top-1 sensitivities for diagnosis of skin infestations, disorders of keratinization, other inflammatory conditions, and bacterial infections were 85.7%, 85.7%, 82.7%, and 81.8%, respectively. In the case of photodermatoses and malignant tumors, the top-1 sensitivities were 33.3% and 10%, respectively. Each category had a strong correlation between the clinical diagnosis and the probable diagnoses (P<.001). Conclusions: The Aysa app showed promising results in identifying most dermatoses. %M 38954807 %R 10.2196/48811 %U https://derma.jmir.org/2024/1/e48811 %U https://doi.org/10.2196/48811 %U http://www.ncbi.nlm.nih.gov/pubmed/38954807 
%0 Journal Article %@ 2291-9694 %I %V 12 %N %P e57674 %T Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation %A Xu,Jie %A Lu,Lu %A Peng,Xinwei %A Pang,Jiali %A Ding,Jinru %A Yang,Lingrui %A Song,Huan %A Li,Kang %A Sun,Xin %A Zhang,Shaoting %K ChatGPT %K LLM %K assessment %K data set %K benchmark %K medicine %D 2024 %7 28.6.2024 %9 %J JMIR Med Inform %G English %X Background: Large language models (LLMs) have achieved great progress in natural language processing tasks and demonstrated the potential for use in clinical applications. Despite their capabilities, LLMs in the medical domain are prone to generating hallucinations (not fully reliable responses). Hallucinations in LLMs’ responses create substantial risks, potentially threatening patients’ physical safety. Thus, to perceive and prevent this safety risk, it is essential to evaluate LLMs in the medical domain and build a systematic evaluation. Objective: We developed a comprehensive evaluation system, MedGPTEval, composed of criteria, medical data sets in Chinese, and publicly available benchmarks. Methods: First, a set of evaluation criteria was designed based on a comprehensive literature review. Second, existing candidate criteria were optimized by using a Delphi method with 5 experts in medicine and engineering. Third, 3 clinical experts designed medical data sets to interact with LLMs. Finally, benchmarking experiments were conducted on the data sets. The responses generated by chatbots based on LLMs were recorded for blind evaluations by 5 licensed medical experts. The evaluation criteria that were obtained covered medical professional capabilities, social comprehensive capabilities, contextual capabilities, and computational robustness, with 16 detailed indicators. The medical data sets include 27 medical dialogues and 7 case reports in Chinese. 
Three chatbots were evaluated: ChatGPT by OpenAI; ERNIE Bot by Baidu, Inc; and Doctor PuJiang (Dr PJ) by Shanghai Artificial Intelligence Laboratory. Results: Dr PJ outperformed ChatGPT and ERNIE Bot in the multiple-turn medical dialogues and case report scenarios. Dr PJ also outperformed ChatGPT in the semantic consistency rate and complete error rate category, indicating better robustness. However, Dr PJ had slightly lower scores in medical professional capabilities compared with ChatGPT in the multiple-turn dialogue scenario. Conclusions: MedGPTEval provides comprehensive criteria to evaluate chatbots by LLMs in the medical domain, open-source data sets, and benchmarks assessing 3 LLMs. Experimental results demonstrate that Dr PJ outperforms ChatGPT and ERNIE Bot in social and professional contexts. Therefore, such an assessment system can be easily adopted by researchers in this community to augment an open-source data set. %R 10.2196/57674 %U https://medinform.jmir.org/2024/1/e57674 %U https://doi.org/10.2196/57674 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50295 %T Resilient Artificial Intelligence in Health: Synthesis and Research Agenda Toward Next-Generation Trustworthy Clinical Decision Support %A Sáez,Carlos %A Ferri,Pablo %A García-Gómez,Juan M %+ Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia, 46022, Spain, 34 963877000 ext 85247, carsaesi@upv.es %K artificial intelligence %K clinical decision support %K resilience %K clinical medicine %K machine learning %K data quality %K fairness %K trustworthy AI %K regulation %K AI regulation %K AI Act %K EHDS %K European Health Data Space %K emergency medical dispatch %K clinical decision support systems %D 2024 %7 28.6.2024 %9 Viewpoint %J J Med Internet Res %G English %X Artificial intelligence (AI)–based clinical decision support systems are gaining momentum by 
relying on a greater volume and variety of secondary use data. However, the uncertainty, variability, and biases in real-world data environments still pose significant challenges to the development of health AI, its routine clinical use, and its regulatory frameworks. Health AI should be resilient against real-world environments throughout its lifecycle, including the training and prediction phases and maintenance during production, and health AI regulations should evolve accordingly. Data quality issues, variability over time or across sites, information uncertainty, human-computer interaction, and fundamental rights assurance are among the most relevant challenges. If health AI is not designed resiliently with regard to these real-world data effects, potentially biased data-driven medical decisions can risk the safety and fundamental rights of millions of people. In this viewpoint, we review the challenges, requirements, and methods for resilient AI in health and provide a research framework to improve the trustworthiness of next-generation AI-based clinical decision support. 
%M 38941134 %R 10.2196/50295 %U https://www.jmir.org/2024/1/e50295 %U https://doi.org/10.2196/50295 %U http://www.ncbi.nlm.nih.gov/pubmed/38941134 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e58491 %T AI: Bridging Ancient Wisdom and Modern Innovation in Traditional Chinese Medicine %A Lu,Linken %A Lu,Tangsheng %A Tian,Chunyu %A Zhang,Xiujun %+ School of Psychology and Mental Health, North China University of Science and Technology, 21 Bohai Avenue, Caofeidian New Town, Tangshan, Hebei Province, 063210, China, 86 0315 8805970, zhxj@ncst.edu.cn %K traditional Chinese medicine %K TCM %K artificial intelligence %K AI %K diagnosis %D 2024 %7 28.6.2024 %9 Viewpoint %J JMIR Med Inform %G English %X The pursuit of groundbreaking health care innovations has led to the convergence of artificial intelligence (AI) and traditional Chinese medicine (TCM), thus marking a new frontier that demonstrates the promise of combining the advantages of ancient healing practices with cutting-edge advancements in modern technology. TCM, which is a holistic medical system with >2000 years of empirical support, uses unique diagnostic methods such as inspection, auscultation and olfaction, inquiry, and palpation. AI is the simulation of human intelligence processes by machines, especially via computer systems. TCM is experience oriented, holistic, and subjective, and its combination with AI has beneficial effects, which presumably arises from the perspectives of diagnostic accuracy, treatment efficacy, and prognostic veracity. The role of AI in TCM is highlighted by its use in diagnostics, with machine learning enhancing the precision of treatment through complex pattern recognition. This is exemplified by the greater accuracy of TCM syndrome differentiation via tongue images that are analyzed by AI. 
However, integrating AI into TCM also presents multifaceted challenges, such as data quality and ethical issues; thus, a unified strategy, such as the use of standardized data sets, is required to improve AI understanding and application of TCM principles. The evolution of TCM through the integration of AI is a key factor for elucidating new horizons in health care. As research continues to evolve, it is imperative that technologists and TCM practitioners collaborate to drive innovative solutions that push the boundaries of medical science and honor the profound legacy of TCM. We can chart a future course wherein AI-augmented TCM practices contribute to more systematic, effective, and accessible health care systems for all individuals. %M 38941141 %R 10.2196/58491 %U https://medinform.jmir.org/2024/1/e58491 %U https://doi.org/10.2196/58491 %U http://www.ncbi.nlm.nih.gov/pubmed/38941141 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54571 %T Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4 %A Lahat,Adi %A Sharif,Kassem %A Zoabi,Narmin %A Shneor Patt,Yonatan %A Sharif,Yousra %A Fisher,Lior %A Shani,Uria %A Arow,Mohamad %A Levin,Roni %A Klang,Eyal %+ Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Tel Hashomer, Ramat Gan, 5262100, Israel, 972 5302060, zokadi@gmail.com %K ChatGPT %K chat-GPT %K chatbot %K chatbots %K chat-bot %K chat-bots %K natural language processing %K NLP %K artificial intelligence %K AI %K machine learning %K ML %K algorithm %K algorithms %K predictive model %K predictive models %K predictive analytics %K predictive system %K practical model %K practical models %K internal medicine %K ethics %K ethical %K ethical dilemma %K ethical dilemmas %K bioethics %K emergency medicine %K EM medicine %K ED physician %K emergency physician %K emergency doctor %D 2024 %7 27.6.2024 %9 Original Paper %J J Med Internet Res 
%G English %X Background: Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement. Objective: This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors’ and residents’ ratings, and specific question types. Methods: A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications. Results: Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5’s accuracy, beneficial, and completeness dimensions. Conclusions: ChatGPT’s potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. 
While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments. %M 38935937 %R 10.2196/54571 %U https://www.jmir.org/2024/1/e54571 %U https://doi.org/10.2196/54571 %U http://www.ncbi.nlm.nih.gov/pubmed/38935937 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e55855 %T Using Automated Machine Learning to Predict Necessary Upcoming Therapy Changes in Patients With Psoriasis Vulgaris and Psoriatic Arthritis and Uncover New Influences on Disease Progression: Retrospective Study %A Schaffert,Daniel %A Bibi,Igor %A Blauth,Mara %A Lull,Christian %A von Ahnen,Jan Alwin %A Gross,Georg %A Schulze-Hagen,Theresa %A Knitza,Johannes %A Kuhn,Sebastian %A Benecke,Johannes %A Schmieder,Astrid %A Leipe,Jan %A Olsavszky,Victor %+ Department of Dermatology, Venereology and Allergology, University Medical Center and Medical Faculty Mannheim, University of Heidelberg, and Center of Excellence in Dermatology, Theodor-Kutzer-Ufer 1-3, Mannheim, 68167, Germany, 49 621 383 2280, victor.olsavszky@medma.uni-heidelberg.de %K psoriasis vulgaris %K psoriatic arthritis %K automated machine learning %K therapy change %K Psoriasis Area and Severity Index %K PASI score change %K Bath Ankylosing Spondylitis Disease Activity Index %K BASDAI classification %K mobile phone %D 2024 %7 27.6.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Psoriasis vulgaris (PsV) and psoriatic arthritis (PsA) are complex, multifactorial diseases significantly impacting health and quality of life. Predicting treatment response and disease progression is crucial for optimizing therapeutic interventions, yet challenging. Automated machine learning (AutoML) technology shows promise for rapidly creating accurate predictive models based on patient features and treatment data. 
Objective: This study aims to develop highly accurate machine learning (ML) models using AutoML to address key clinical questions for PsV and PsA patients, including predicting therapy changes, identifying reasons for therapy changes, and factors influencing skin lesion progression or an abnormal Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) score. Methods: Clinical study data from 309 PsV and PsA patients were extensively prepared and analyzed using AutoML to build and select the most accurate predictive models for each variable of interest. Results: Therapy change at 24 weeks follow-up was modeled using the extreme gradient boosted trees classifier with early stopping (area under the receiver operating characteristic curve [AUC] of 0.9078 and logarithmic loss [LogLoss] of 0.3955 for the holdout partition). Key influencing factors included the initial systemic therapeutic agent, the Classification Criteria for Psoriatic Arthritis score at baseline, and changes in quality of life. An average blender incorporating three models (gradient boosted trees classifier, ExtraTrees classifier, and Eureqa generalized additive model classifier) with an AUC of 0.8750 and LogLoss of 0.4603 was used to predict therapy changes for 2 hypothetical patients, highlighting the significance of these factors. Treatments such as methotrexate or specific biologicals showed a lower propensity for change. An average blender of a random forest classifier, an extreme gradient boosted trees classifier, and a Eureqa classifier (AUC of 0.9241 and LogLoss of 0.4498) was used to estimate PASI (Psoriasis Area and Severity Index) change after 24 weeks. Primary predictors included the initial PASI score, change in pruritus levels, and change in therapy. A lower initial PASI score and consistently low pruritus were associated with better outcomes. 
BASDAI classification at onset was analyzed using an average blender of a Eureqa generalized additive model classifier, an extreme gradient boosted trees classifier with early stopping, and a dropout additive regression trees classifier with an AUC of 0.8274 and LogLoss of 0.5037. Influential factors included initial pain, disease activity, and Hospital Anxiety and Depression Scale scores for depression and anxiety. Increased pain, disease activity, and psychological distress generally led to higher BASDAI scores. Conclusions: The practical implications of these models for clinical decision-making in PsV and PsA can guide early investigation and treatment, contributing to improved patient outcomes. %M 38738977 %R 10.2196/55855 %U https://formative.jmir.org/2024/1/e55855 %U https://doi.org/10.2196/55855 %U http://www.ncbi.nlm.nih.gov/pubmed/38738977 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e59267 %T Evaluating ChatGPT-4’s Accuracy in Identifying Final Diagnoses Within Differential Diagnoses Compared With Those of Physicians: Experimental Study for Diagnostic Cases %A Hirosawa,Takanobu %A Harada,Yukinori %A Mizuta,Kazuya %A Sakamoto,Tetsu %A Tokumasu,Kazuki %A Shimizu,Taro %+ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Mibu-cho, Shimotsuga, Tochigi, 321-0293, Japan, 81 282861111, hirosawa@dokkyomed.ac.jp %K decision support system %K diagnostic errors %K diagnostic excellence %K diagnosis %K large language model %K LLM %K natural language processing %K GPT-4 %K ChatGPT %K diagnoses %K physicians %K artificial intelligence %K AI %K chatbots %K medical diagnosis %K assessment %K decision-making support %K application %K applications %K app %K apps %D 2024 %7 26.6.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The potential of artificial intelligence (AI) chatbots, particularly ChatGPT with GPT-4 (OpenAI), in assisting with medical diagnosis is an emerging research area. 
However, it is not yet clear how well AI chatbots can evaluate whether the final diagnosis is included in differential diagnosis lists. Objective: This study aims to assess the capability of GPT-4 in identifying the final diagnosis from differential-diagnosis lists and to compare its performance with that of physicians for case report series. Methods: We used a database of differential-diagnosis lists from case reports in the American Journal of Case Reports, corresponding to final diagnoses. These lists were generated by 3 AI systems: GPT-4, Google Bard (currently Google Gemini), and Large Language Models by Meta AI 2 (LLaMA2). The primary outcome was focused on whether GPT-4’s evaluations identified the final diagnosis within these lists. None of these AIs received additional medical training or reinforcement. For comparison, 2 independent physicians also evaluated the lists, with any inconsistencies resolved by another physician. Results: The 3 AIs generated a total of 1176 differential diagnosis lists from 392 case descriptions. GPT-4’s evaluations concurred with those of the physicians in 966 out of 1176 lists (82.1%). The Cohen κ coefficient was 0.63 (95% CI 0.56-0.69), indicating a fair to good agreement between GPT-4 and the physicians’ evaluations. Conclusions: GPT-4 demonstrated a fair to good agreement in identifying the final diagnosis from differential-diagnosis lists, comparable to physicians for case report series. Its ability to compare differential diagnosis lists with final diagnoses suggests its potential to aid clinical decision-making support through diagnostic feedback. While GPT-4 showed a fair to good agreement for evaluation, its application in real-world scenarios and further validation in diverse clinical environments are essential to fully understand its utility in the diagnostic process. 
%M 38924784 %R 10.2196/59267 %U https://formative.jmir.org/2024/1/e59267 %U https://doi.org/10.2196/59267 %U http://www.ncbi.nlm.nih.gov/pubmed/38924784 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52001 %T Assessing the Reproducibility of the Structured Abstracts Generated by ChatGPT and Bard Compared to Human-Written Abstracts in the Field of Spine Surgery: Comparative Analysis %A Kim,Hong Jin %A Yang,Jae Hyuk %A Chang,Dong-Gune %A Lenke,Lawrence G %A Pizones,Javier %A Castelein,René %A Watanabe,Kota %A Trobisch,Per D %A Mundis Jr,Gregory M %A Suh,Seung Woo %A Suk,Se-Il %+ Department of Orthopedic Surgery, Inje University Sanggye Paik Hospital, College of Medicine, Inje University, 1342, Dongil-Ro, Nowon-Gu, Seoul, 01757, Republic of Korea, 82 2 950 1284, dgchangmd@gmail.com %K artificial intelligence %K AI %K ChatGPT %K Bard %K scientific abstract %K orthopedic surgery %K spine %K journal guidelines %K plagiarism %K ethics %K spine surgery %K surgery %K language model %K chatbot %K formatting guidelines %K abstract %D 2024 %7 26.6.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Due to recent advances in artificial intelligence (AI), language model applications can generate logical text output that is difficult to distinguish from human writing. ChatGPT (OpenAI) and Bard (subsequently rebranded as “Gemini”; Google AI) were developed using distinct approaches, but little has been studied about the difference in their capability to generate the abstract. The use of AI to write scientific abstracts in the field of spine surgery is the center of much debate and controversy. Objective: The objective of this study is to assess the reproducibility of the structured abstracts generated by ChatGPT and Bard compared to human-written abstracts in the field of spine surgery. 
Methods: In total, 60 abstracts dealing with spine sections were randomly selected from 7 reputable journals and used as ChatGPT and Bard input statements to generate abstracts based on supplied paper titles. A total of 174 abstracts, divided into human-written abstracts, ChatGPT-generated abstracts, and Bard-generated abstracts, were evaluated for compliance with the structured format of journal guidelines and consistency of content. The likelihood of plagiarism and AI output was assessed using the iThenticate and ZeroGPT programs, respectively. A total of 8 reviewers in the spinal field evaluated 30 randomly extracted abstracts to determine whether they were produced by AI or human authors. Results: The proportion of abstracts that met journal formatting guidelines was greater among ChatGPT abstracts (34/60, 56.6%) compared with those generated by Bard (6/54, 11.1%; P<.001). However, a higher proportion of Bard abstracts (49/54, 90.7%) had word counts that met journal guidelines compared with ChatGPT abstracts (30/60, 50%; P<.001). The similarity index was significantly lower among ChatGPT-generated abstracts (20.7%) compared with Bard-generated abstracts (32.1%; P<.001). The AI-detection program predicted that 21.7% (13/60) of the human group, 63.3% (38/60) of the ChatGPT group, and 87% (47/54) of the Bard group were possibly generated by AI, with an area under the curve value of 0.863 (P<.001). The mean detection rate by human reviewers was 53.8% (SD 11.2%), achieving a sensitivity of 56.3% and a specificity of 48.4%. A total of 56.3% (63/112) of the actual human-written abstracts and 55.9% (62/128) of AI-generated abstracts were recognized as human-written and AI-generated by human reviewers, respectively. Conclusions: Both ChatGPT and Bard can be used to help write abstracts, but most AI-generated abstracts are currently considered unethical due to high plagiarism and AI-detection rates. 
ChatGPT-generated abstracts appear to be superior to Bard-generated abstracts in meeting journal formatting guidelines. Because humans are unable to accurately distinguish abstracts written by humans from those produced by AI programs, it is crucial to exercise special caution and examine the ethical boundaries of using AI programs, including ChatGPT and Bard. %M 38924787 %R 10.2196/52001 %U https://www.jmir.org/2024/1/e52001 %U https://doi.org/10.2196/52001 %U http://www.ncbi.nlm.nih.gov/pubmed/38924787 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54607 %T Multimodal ChatGPT-4V for Electrocardiogram Interpretation: Promise and Limitations %A Zhu,Lingxuan %A Mou,Weiming %A Wu,Keren %A Lai,Yancheng %A Lin,Anqi %A Yang,Tao %A Zhang,Jian %A Luo,Peng %+ Department of Oncology, Zhujiang Hospital, Southern Medical University, 253 Industrial Avenue, Guangzhou, 510282, China, 86 020 61643888, luopeng@smu.edu.cn %K ChatGPT %K ECG %K electrocardiogram %K multimodal %K artificial intelligence %K AI %K large language model %K diagnostic %K quantitative analysis %K clinical %K clinicians %K ECG interpretation %K cardiovascular care %K cardiovascular %D 2024 %7 26.6.2024 %9 Research Letter %J J Med Internet Res %G English %X This study evaluated the capabilities of the newly released ChatGPT-4V, a large language model with visual recognition abilities, in interpreting electrocardiogram waveforms and answering related multiple-choice questions for assisting with cardiovascular care. 
%M 38764297 %R 10.2196/54607 %U https://www.jmir.org/2024/1/e54607 %U https://doi.org/10.2196/54607 %U http://www.ncbi.nlm.nih.gov/pubmed/38764297 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e56241 %T Clinical Simulation in the Regulation of Software as a Medical Device: An eDelphi Study %A O'Driscoll,Fiona %A O'Brien,Niki %A Guo,Chaohui %A Prime,Matthew %A Darzi,Ara %A Ghafur,Saira %+ Institute of Global Health Innovation, Imperial College London, Room 1035, Queen Elizabeth Queen Mother Wing, St Mary's Campus, South Wharf Road, London, W2 1NY, United Kingdom, 44 020 7594 1419, saira.ghafur13@imperial.ac.uk %K digital health technology %K software as a medical device %K clinical simulation %K Delphi study %K eDelphi study %K artificial intelligence %K digital health %D 2024 %7 25.6.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Accelerated digitalization in the health sector requires the development of appropriate evaluation methods to ensure that digital health technologies (DHTs) are safe and effective. Software as a medical device (SaMD) is a commonly used DHT by clinicians to provide care to patients. Traditional research methods for evaluating health care products, such as randomized clinical trials, may not be suitable for DHTs, such as SaMD. However, evidence to show their safety and efficacy is needed by regulators before they can be used in practice. Clinical simulation can be used by researchers to test SaMD in an agile and low-cost way; yet, there is limited research on criteria to assess the robustness of simulations and, subsequently, their relevance for a regulatory decision. Objective: The objective of this study was to gain consensus on the criteria that should be used to assess clinical simulation from a regulatory perspective when it is used to generate evidence for SaMD. Methods: An eDelphi study approach was chosen to develop a set of criteria to assess clinical simulation when used to evaluate SaMD. 
Participants were recruited through purposive and snowball sampling based on their experience and knowledge in relevant sectors. They were guided through an initial scoping questionnaire with key themes identified from the literature to obtain a comprehensive list of criteria. Participants voted upon these criteria in 2 Delphi rounds, with criteria being excluded if consensus was not met. Participants were invited to add qualitative comments during rounds and qualitative analysis was performed on the comments gathered during the first round. Consensus was predefined by 2 criteria: if <10% of the panelists deemed the criteria as “not important” or “not important at all” and >60% “important” or “very important.” Results: In total, 33 international experts in the digital health field, including academics, regulators, policy makers, and industry representatives, completed both Delphi rounds, and 43 criteria gained consensus from the participants. The research team grouped these criteria into 7 domains—background and context, overall study design, study population, delivery of the simulation, fidelity, software and artificial intelligence, and study analysis. These 7 domains were formulated into the simulation for regulation of SaMD framework. There were key areas of concern identified by participants regarding the framework criteria, such as the importance of how simulation fidelity is achieved and reported and the avoidance of bias throughout all stages. Conclusions: This study proposes the simulation for regulation of SaMD framework, developed through an eDelphi consensus process, to evaluate clinical simulation when used to assess SaMD. Future research should prioritize the development of safe and effective SaMD, while implementing and refining the framework criteria to adapt to new challenges. 
%M 38917454 %R 10.2196/56241 %U https://formative.jmir.org/2024/1/e56241 %U https://doi.org/10.2196/56241 %U http://www.ncbi.nlm.nih.gov/pubmed/38917454 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e52316 %T Leveraging Social Media to Predict COVID-19–Induced Disruptions to Mental Well-Being Among University Students: Modeling Study %A Das Swain,Vedant %A Ye,Jingjing %A Ramesh,Siva Karthik %A Mondal,Abhirup %A Abowd,Gregory D %A De Choudhury,Munmun %+ Khoury College of Computer Sciences, Northeastern University, #202, West Village Residence Complex H, 440 Huntington Ave, Boston, MA, 02115, United States, 1 (404) 894 2000, vedantswain@gmail.com %K social media %K mental health %K linguistic markers %K digital phenotyping %K COVID-19 %K disaster well-being %K well-being %K machine learning %K temporal trends %K disruption %D 2024 %7 25.6.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Large-scale crisis events such as COVID-19 often have secondary impacts on individuals’ mental well-being. University students are particularly vulnerable to such impacts. Traditional survey-based methods to identify those in need of support do not scale over large populations and they do not provide timely insights. We pursue an alternative approach through social media data and machine learning. Our models aim to complement surveys and provide early, precise, and objective predictions of students disrupted by COVID-19. Objective: This study aims to demonstrate the feasibility of language on private social media as an indicator of crisis-induced disruption to mental well-being. Methods: We modeled 4124 Facebook posts provided by 43 undergraduate students, spanning over 2 years. We extracted temporal trends in the psycholinguistic attributes of their posts and comments. These trends were used as features to predict how COVID-19 disrupted their mental well-being. 
Results: The social media–enabled model had an F1-score of 0.79, which was a 39% improvement over a model trained on the self-reported mental state of the participant. The features we used showed promise in predicting other mental states such as anxiety, depression, social isolation, and suicidal behavior (F1-scores varied between 0.85 and 0.93). We also found that selecting the windows of time 7 months after the COVID-19–induced lockdown presented better results, thereby paving the way for data minimization. Conclusions: We predicted COVID-19–induced disruptions to mental well-being by developing a machine learning model that leveraged language on private social media. The language in these posts described psycholinguistic trends in students’ online behavior. These longitudinal trends helped predict mental well-being disruption better than models trained on correlated mental health questionnaires. Our work inspires further research into the potential applications of early, precise, and automatic warnings for individuals concerned about their mental health in times of crisis. 
%M 38916951 %R 10.2196/52316 %U https://formative.jmir.org/2024/1/e52316 %U https://doi.org/10.2196/52316 %U http://www.ncbi.nlm.nih.gov/pubmed/38916951 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e54798 %T Augmenting Telepostpartum Care With Vision-Based Detection of Breastfeeding-Related Conditions: Algorithm Development and Validation %A De Souza,Jessica %A Viswanath,Varun Kumar %A Echterhoff,Jessica Maria %A Chamberlain,Kristina %A Wang,Edward Jay %+ Department of Electrical and Computer Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, United States, 1 (858) 534 7013, jdesouza@ucsd.edu %K remote consultations %K artificial intelligence %K AI for health care %K deep learning %K detection model %K breastfeeding %K telehealth %K perinatal health %K image analysis %K women’s health %K mobile phone %D 2024 %7 24.6.2024 %9 Original Paper %J JMIR AI %G English %X Background: Breastfeeding benefits both the mother and infant and is a topic of attention in public health. After childbirth, untreated medical conditions or lack of support lead many mothers to discontinue breastfeeding. For instance, nipple damage and mastitis affect 80% and 20% of US mothers, respectively. Lactation consultants (LCs) help mothers with breastfeeding, providing in-person, remote, and hybrid lactation support. LCs guide, encourage, and find ways for mothers to have a better experience breastfeeding. Current telehealth services help mothers seek LCs for breastfeeding support, where images help them identify and address many issues. Due to the disproportional ratio of LCs and mothers in need, these professionals are often overloaded and burned out. Objective: This study aims to investigate the effectiveness of 5 distinct convolutional neural networks in detecting healthy lactating breasts and 6 breastfeeding-related issues by only using red, green, and blue images. 
Our goal was to assess the applicability of this algorithm as an auxiliary resource for LCs to identify painful breast conditions quickly, better manage their patients through triage, respond promptly to patient needs, and enhance the overall experience and care for breastfeeding mothers. Methods: We evaluated the potential for 5 classification models to detect breastfeeding-related conditions using 1078 breast and nipple images gathered from web-based and physical educational resources. We used the convolutional neural networks Resnet50, Visual Geometry Group model with 16 layers (VGG16), InceptionV3, EfficientNetV2, and DenseNet169 to classify the images across 7 classes: healthy, abscess, mastitis, nipple blebs, dermatosis, engorgement, and nipple damage by improper feeding or misuse of breast pumps. We also evaluated the models’ ability to distinguish between healthy and unhealthy images. We present an analysis of the classification challenges, identifying image traits that may confound the detection model. Results: The best model achieves an average area under the receiver operating characteristic curve of 0.93 for all conditions after data augmentation for multiclass classification. For binary classification, we achieved, with the best model, an average area under the curve of 0.96 for all conditions after data augmentation. Several factors contributed to the misclassification of images, including similar visual features in the conditions that precede other conditions (such as the mastitis spectrum disorder), partially covered breasts or nipples, and images depicting multiple conditions in the same breast. Conclusions: This vision-based automated detection technique offers an opportunity to enhance postpartum care for mothers and can potentially help alleviate the workload of LCs by expediting decision-making processes. 
%R 10.2196/54798 %U https://ai.jmir.org/2024/1/e54798 %U https://doi.org/10.2196/54798 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 7 %N %P e55793 %T A Scalable and Extensible Logical Data Model of Electronic Health Record Audit Logs for Temporal Data Mining (RNteract): Model Conceptualization and Formulation %A Tiase,Victoria L %A Sward,Katherine A %A Facelli,Julio C %+ Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, UT, 84108, United States, 1 801 585 3945, victoria.tiase@utah.edu %K burnout %K professional %K nursing %K nurse %K electronic health record %K EHR %K data modeling %K data set %K temporal machine learning %K machine learning %K ML %K artificial intelligence %K AI %K algorithm %K predictive model %K predictive analytics %K practical model %D 2024 %7 24.6.2024 %9 Original Paper %J JMIR Nursing %G English %X Background: Increased workload, including workload related to electronic health record (EHR) documentation, is reported as a main contributor to nurse burnout and adversely affects patient safety and nurse satisfaction. Traditional methods for workload analysis are either administrative measures (such as the nurse-patient ratio) that do not represent actual nursing care or are subjective and limited to snapshots of care (eg, time-motion studies). Observing care and testing workflow changes in real time can be obstructive to clinical care. An examination of EHR interactions using EHR audit logs could provide a scalable, unobtrusive way to quantify the nursing workload, at least to the extent that nursing work is represented in EHR documentation. EHR audit logs are extremely complex; however, simple analytical methods cannot discover complex temporal patterns, requiring use of state-of-the-art temporal data-mining approaches. 
To effectively use these approaches, it is necessary to structure the raw audit logs into a consistent and scalable logical data model that can be consumed by machine learning (ML) algorithms. Objective: We aimed to conceptualize a logical data model for nurse-EHR interactions that would support the future development of temporal ML models based on EHR audit log data. Methods: We conducted a preliminary review of EHR audit logs to understand the types of nursing-specific data captured. Using concepts derived from the literature and our previous experience studying temporal patterns in biomedical data, we formulated a logical data model that can describe nurse-EHR interactions, the nurse-intrinsic and situational characteristics that may influence those interactions, and outcomes of relevance to the nursing workload in a scalable and extensible manner. Results: We describe the data structure and concepts from EHR audit log data associated with nursing workload as a logical data model named RNteract. We conceptually demonstrate how using this logical data model could support temporal unsupervised ML and state-of-the-art artificial intelligence (AI) methods for predictive modeling. Conclusions: The RNteract logical data model appears capable of supporting a variety of AI-based systems and should be generalizable to any type of EHR system or health care setting. Quantitatively identifying and analyzing temporal patterns of nurse-EHR interactions is foundational for developing interventions that support the nursing documentation workload and address nurse burnout. 
%M 38913994 %R 10.2196/55793 %U https://nursing.jmir.org/2024/1/e55793 %U https://doi.org/10.2196/55793 %U http://www.ncbi.nlm.nih.gov/pubmed/38913994 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e49613 %T Dermoscopy Differential Diagnosis Explorer (D3X) Ontology to Aggregate and Link Dermoscopic Patterns to Differential Diagnoses: Development and Usability Study %A Lin,Rebecca Z %A Amith,Muhammad Tuan %A Wang,Cynthia X %A Strickley,John %A Tao,Cui %+ Department of Artificial Intelligence and Informatics, Mayo Clinic, 4500 San Pablo Road, Jacksonville, FL, 32224, United States, 1 9049530255, Tao.Cui@mayo.edu %K medical informatics %K biomedical ontology %K ontology %K ontologies %K vocabulary %K OWL %K web ontology language %K skin %K semiotic %K web app %K web application %K visual %K visualization %K dermoscopic %K diagnosis %K diagnoses %K diagnostic %K information storage %K information retrieval %K skin lesion %K skin diseases %K dermoscopy differential diagnosis explorer %K dermatology %K dermoscopy %K differential diagnosis %K information storage and retrieval %D 2024 %7 21.6.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Dermoscopy is a growing field that uses microscopy to allow dermatologists and primary care physicians to identify skin lesions. For a given skin lesion, a wide variety of differential diagnoses exist, which may be challenging for inexperienced users to name and understand. Objective: In this study, we describe the creation of the dermoscopy differential diagnosis explorer (D3X), an ontology linking dermoscopic patterns to differential diagnoses. Methods: Existing ontologies that were incorporated into D3X include the elements of visuals ontology and dermoscopy elements of visuals ontology, which connect visual features to dermoscopic patterns. A list of differential diagnoses for each pattern was generated from the literature and in consultation with domain experts. 
Open-source images were incorporated from DermNet, Dermoscopedia, and open-access research papers. Results: D3X was encoded in the OWL 2 web ontology language and includes 3041 logical axioms, 1519 classes, 103 object properties, and 20 data properties. We compared D3X with publicly available ontologies in the dermatology domain using a semiotic theory–driven metric to measure the innate qualities of D3X with others. The results indicate that D3X is adequately comparable with other ontologies of the dermatology domain. Conclusions: The D3X ontology is a resource that can link and integrate dermoscopic differential diagnoses and supplementary information with existing ontology-based resources. Future directions include developing a web application based on D3X for dermoscopy education and clinical practice. %M 38904996 %R 10.2196/49613 %U https://medinform.jmir.org/2024/1/e49613 %U https://doi.org/10.2196/49613 %U http://www.ncbi.nlm.nih.gov/pubmed/38904996 %0 Journal Article %@ 2369-2529 %I JMIR Publications %V 11 %N %P e48129 %T The Value of a Virtual Assistant to Improve Engagement in Computerized Cognitive Training at Home: Exploratory Study %A Zsoldos,Isabella %A Trân,Eléonore %A Fournier,Hippolyte %A Tarpin-Bernard,Franck %A Fruitet,Joan %A Fouillen,Mélodie %A Bailly,Gérard %A Elisei,Frédéric %A Bouchot,Béatrice %A Constant,Patrick %A Ringeval,Fabien %A Koenig,Olivier %A Chainay,Hanna %+ Laboratoire d’Étude des Mécanismes Cognitifs, Université Lumière Lyon 2, 5 Avenue Pierre Mendès France, Lyon, 69500, France, 33 478774335, isabella.zsoldos@hotmail.fr %K cognitive training %K cognitive decline %K cognitive disorders %K mild cognitive impairment %K Alzheimer disease %K digital therapies %K virtual health assistant %K conversational agent %K artificial intelligence %K social interaction %K THERADIA %D 2024 %7 20.6.2024 %9 Original Paper %J JMIR Rehabil Assist Technol %G English %X Background: Impaired cognitive function is observed in many pathologies, including 
neurodegenerative diseases such as Alzheimer disease. At present, the pharmaceutical treatments available to counter cognitive decline have only modest effects, with significant side effects. A nonpharmacological treatment that has received considerable attention is computerized cognitive training (CCT), which aims to maintain or improve cognitive functioning through repeated practice in standardized exercises. CCT allows for more regular and thorough training of cognitive functions directly at home, which represents a significant opportunity to prevent and fight cognitive decline. However, the presence of assistance during training seems to be an important parameter to improve patients’ motivation and adherence to treatment. To compensate for the absence of a therapist during at-home CCT, a relevant option could be to include a virtual assistant to accompany patients throughout their training. Objective: The objective of this exploratory study was to evaluate the interest of including a virtual assistant to accompany patients during CCT. We investigated the relationship between various individual factors (eg, age, psycho-affective functioning, personality, personal motivations, and cognitive skills) and the appreciation and usefulness of a virtual assistant during CCT. This study is part of the THERADIA (Thérapies Digitales Augmentées par l’Intelligence Artificielle) project, which aims to develop an empathetic virtual assistant. Methods: A total of 104 participants were recruited, including 52 (50%) young adults (mean age 21.2, range 18 to 27, SD 2.9 years) and 52 (50%) older adults (mean age 67.9, range 60 to 79, SD 5.1 years). All participants were invited to the laboratory to answer several questionnaires and perform 1 CCT session, which consisted of 4 cognitive exercises supervised by a virtual assistant animated by a human pilot via the Wizard of Oz method. The participants evaluated the virtual assistant and CCT at the end of the session. 
Results: Analyses were performed using the Bayesian framework. The results suggest that the virtual assistant was appreciated and perceived as useful during CCT in both age groups. However, older adults rated the assistant and CCT more positively overall than young adults. Certain characteristics of users, especially their current affective state (ie, arousal, intrinsic relevance, goal conduciveness, and anxiety state), appeared to be related to their evaluation of the session. Conclusions: This study provides, for the first time, insight into how young and older adults perceive a virtual assistant during CCT. The results suggest that such an assistant could have a beneficial influence on users’ motivation, provided that it can handle different situations, particularly their emotional state. The next step of our project will be to evaluate our device with patients experiencing mild cognitive impairment and to test its effectiveness in long-term cognitive training. %M 38901017 %R 10.2196/48129 %U https://rehab.jmir.org/2024/1/e48129 %U https://doi.org/10.2196/48129 %U http://www.ncbi.nlm.nih.gov/pubmed/38901017 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e50209 %T Retrieval-Based Diagnostic Decision Support: Mixed Methods Study %A Abdullahi,Tassallah %A Mercurio,Laura %A Singh,Ritambhara %A Eickhoff,Carsten %+ School of Medicine, University of Tübingen, Schaffhausenstr, 77, Tübingen, 72072, Germany, 49 7071 29 843, carsten.eickhoff@uni-tuebingen.de %K clinical decision support %K rare diseases %K ensemble learning %K retrieval-augmented learning %K machine learning %K electronic health records %K natural language processing %K retrieval augmented generation %K RAG %K electronic health record %K EHR %K data sparsity %K information retrieval %D 2024 %7 19.6.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Diagnostic errors pose significant health risks and contribute to patient mortality. 
With the growing accessibility of electronic health records, machine learning models offer a promising avenue for enhancing diagnosis quality. Current research has primarily focused on a limited set of diseases with ample training data, neglecting diagnostic scenarios with limited data availability. Objective: This study aims to develop an information retrieval (IR)–based framework that accommodates data sparsity to facilitate broader diagnostic decision support. Methods: We introduced an IR-based diagnostic decision support framework called CliniqIR. It uses clinical text records, the Unified Medical Language System Metathesaurus, and 33 million PubMed abstracts to classify a broad spectrum of diagnoses independent of training data availability. CliniqIR is designed to be compatible with any IR framework. Therefore, we implemented it using both dense and sparse retrieval approaches. We compared CliniqIR’s performance to that of pretrained clinical transformer models such as Clinical Bidirectional Encoder Representations from Transformers (ClinicalBERT) in supervised and zero-shot settings. Subsequently, we combined the strength of supervised fine-tuned ClinicalBERT and CliniqIR to build an ensemble framework that delivers state-of-the-art diagnostic predictions. Results: On a complex diagnosis data set (DC3) without any training data, CliniqIR models returned the correct diagnosis within their top 3 predictions. On the Medical Information Mart for Intensive Care III data set, CliniqIR models surpassed ClinicalBERT in predicting diagnoses with <5 training samples by an average difference in mean reciprocal rank of 0.10. In a zero-shot setting where models received no disease-specific training, CliniqIR still outperformed the pretrained transformer models with a greater mean reciprocal rank of at least 0.10. 
Furthermore, in most conditions, our ensemble framework surpassed the performance of its individual components, demonstrating its enhanced ability to make precise diagnostic predictions. Conclusions: Our experiments highlight the importance of IR in leveraging unstructured knowledge resources to identify infrequently encountered diagnoses. In addition, our ensemble framework benefits from combining the complementary strengths of the supervised and retrieval-based models to diagnose a broad spectrum of diseases. %M 38896468 %R 10.2196/50209 %U https://medinform.jmir.org/2024/1/e50209 %U https://doi.org/10.2196/50209 %U http://www.ncbi.nlm.nih.gov/pubmed/38896468 %0 Journal Article %@ 2373-6658 %I JMIR Publications %V 8 %N %P e55321 %T Perspectives on Artificial Intelligence in Nursing in Asia %A Lukkahatai,Nada %A Han,Gyumin %+ School of Nursing, Johns Hopkins University, 525 N Wolfe Street, Baltimore, MD, 21205, United States, 1 4106145297, nada.lukkahatai@jhu.edu %K machine learning %K ML %K artificial intelligence %K AI %K algorithm %K predictive model %K predictive analytics %K predictive system %K practical model %K deep learning %K ChatGPT %K chatbot %K nursing %K nurse %K nursing education %K personalized education %K Asia %D 2024 %7 19.6.2024 %9 Viewpoint %J Asian Pac Isl Nurs J %G English %X Artificial intelligence (AI) is reshaping health care, including nursing, across Asia, presenting opportunities to improve patient care and outcomes. This viewpoint presents our perspective and interpretation of the current AI landscape, acknowledging its evolution driven by enhanced processing capabilities, extensive data sets, and refined algorithms. Notable applications in countries such as Singapore, South Korea, Japan, and China showcase the integration of AI-powered technologies such as chatbots, virtual assistants, data mining, and automated risk assessment systems. 
This paper further explores the transformative impact of AI on nursing education, emphasizing personalized learning, adaptive approaches, and AI-enriched simulation tools, and discusses the opportunities and challenges of these developments. We argue for the harmonious coexistence of traditional nursing values with AI innovations, marking a significant stride toward a promising health care future in Asia. %M 38896473 %R 10.2196/55321 %U https://apinj.jmir.org/2024/1/e55321 %U https://doi.org/10.2196/55321 %U http://www.ncbi.nlm.nih.gov/pubmed/38896473 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e53203 %T The Machine Speaks: Conversational AI and the Importance of Effort to Relationships of Meaning %A Hartford,Anna %A Stein,Dan J %+ Neuroscience Institute, University of Cape Town, Groote Schuur Hospital, Observatory, Cape Town, 7935, South Africa, 27 214042174, annahartford@gmail.com %K artificial intelligence %K AI %K conversational AIs %K generative AI %K intimacy %K human-machine interaction %K interpersonal relationships %K effort %K psychotherapy %K conversation %D 2024 %7 18.6.2024 %9 Viewpoint %J JMIR Ment Health %G English %X The focus of debates about conversational artificial intelligence (CAI) has largely been on social and ethical concerns that arise when we speak to machines—what is gained and what is lost when we replace our human interlocutors, including our human therapists, with AI. In this viewpoint, we focus instead on a distinct and growing phenomenon: letting machines speak for us. What is at stake when we replace our own efforts at interpersonal engagement with CAI? The purpose of these technologies is, in part, to remove effort, but effort has enormous value, and in some cases, even intrinsic value. This is true in many realms, but especially in interpersonal relationships. To make an effort for someone, irrespective of what that effort amounts to, often conveys value and meaning in itself. 
We elaborate on the meaning, worth, and significance that may be lost when we relinquish effort in our interpersonal engagements as well as on the opportunities for self-understanding and growth that we may forsake. %M 38889401 %R 10.2196/53203 %U https://mental.jmir.org/2024/1/e53203 %U https://doi.org/10.2196/53203 %U http://www.ncbi.nlm.nih.gov/pubmed/38889401 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e56064 %T A Prediction Model for Identifying Seasonal Influenza Vaccination Uptake Among Children in Wuxi, China: Prospective Observational Study %A Wang,Qiang %A Yang,Liuqing %A Xiu,Shixin %A Shen,Yuan %A Jin,Hui %A Lin,Leesa %+ Department of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, United Kingdom, 44 617 632 6142, Leesa.Lin@lshtm.ac.uk %K influenza %K vaccination %K children %K prediction model %K China %K vaccine %K behaviors %K health care professional %K intervention %K sociodemographics %K vaccine hesitancy %K clinic %K Bayesian network %K logistic regression %K accuracy %K Cohen κ %K prediction %K public health %K immunization %K digital age %D 2024 %7 17.6.2024 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Predicting vaccination behaviors accurately could provide insights for health care professionals to develop targeted interventions. Objective: The aim of this study was to develop predictive models for influenza vaccination behavior among children in China. Methods: We obtained data from a prospective observational study in Wuxi, eastern China. The predicted outcome was individual-level vaccine uptake and covariates included sociodemographics of the child and parent, parental vaccine hesitancy, perceptions of convenience to the clinic, satisfaction with clinic services, and willingness to vaccinate. 
Bayesian networks, logistic regression, least absolute shrinkage and selection operator (LASSO) regression, support vector machine (SVM), naive Bayes (NB), random forest (RF), and decision tree classifiers were used to construct prediction models. Various performance metrics, including area under the receiver operating characteristic curve (AUC), were used to evaluate the predictive performance of the different models. Receiver operating characteristic curves and calibration plots were used to assess model performance. Results: A total of 2383 participants were included in the study; 83.2% of these children (n=1982) were <5 years old and 6.6% (n=158) had previously received an influenza vaccine. More than half (1356/2383, 56.9%) the parents indicated a willingness to vaccinate their child against influenza. Among the 2383 children, 26.3% (n=627) received influenza vaccination during the 2020-2021 season. Within the training set, the RF model showed the best performance across all metrics. In the validation set, the logistic regression model and NB model had the highest AUC values; the SVM model had the highest precision; the NB model had the highest recall; and the logistic regression model had the highest accuracy, F1 score, and Cohen κ value. The LASSO and logistic regression models were well-calibrated. Conclusions: The developed prediction model can be used to quantify the uptake of seasonal influenza vaccination for children in China. The stepwise logistic regression model may be better suited for prediction purposes. 
%M 38885032 %R 10.2196/56064 %U https://publichealth.jmir.org/2024/1/e56064 %U https://doi.org/10.2196/56064 %U http://www.ncbi.nlm.nih.gov/pubmed/38885032 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53297 %T Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study %A Masanneck,Lars %A Schmidt,Linea %A Seifert,Antonia %A Kölsche,Tristan %A Huntemann,Niklas %A Jansen,Robin %A Mehsin,Mohammed %A Bernhard,Michael %A Meuth,Sven G %A Böhm,Lennert %A Pawlitzki,Marc %+ Department of Neurology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Moorenstraße 5, Düsseldorf, 40225, Germany, 49 0211 81 17880, lars.masanneck@med.uni-duesseldorf.de %K emergency medicine %K triage %K artificial intelligence %K large language models %K ChatGPT %K untrained doctors %K doctor %K doctors %K comparative study %K digital health %K personnel %K staff %K cohort %K Germany %K German %D 2024 %7 14.6.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Large language models (LLMs) have demonstrated impressive performances in various medical domains, prompting an exploration of their potential utility within the high-demand setting of emergency department (ED) triage. This study evaluated the triage proficiency of different LLMs and ChatGPT, an LLM-based chatbot, compared to professionally trained ED staff and untrained personnel. We further explored whether LLM responses could guide untrained staff in effective triage. Objective: This study aimed to assess the efficacy of LLMs and the associated product ChatGPT in ED triage compared to personnel of varying training status and to investigate if the models’ responses can enhance the triage proficiency of untrained personnel. 
Methods: A total of 124 anonymized case vignettes were triaged by untrained doctors; different versions of currently available LLMs; ChatGPT; and professionally trained raters, who subsequently agreed on a consensus set according to the Manchester Triage System (MTS). The prototypical vignettes were adapted from cases at a tertiary ED in Germany. The main outcome was the level of agreement between raters’ MTS level assignments, measured via quadratic-weighted Cohen κ. The extent of over- and undertriage was also determined. Notably, instances of ChatGPT were prompted using zero-shot approaches without extensive background information on the MTS. The tested LLMs included raw GPT-4, Llama 3 70B, Gemini 1.5, and Mixtral 8x7b. Results: GPT-4–based ChatGPT and untrained doctors showed substantial agreement with the consensus triage of professional raters (κ=mean 0.67, SD 0.037 and κ=mean 0.68, SD 0.056, respectively), significantly exceeding the performance of GPT-3.5–based ChatGPT (κ=mean 0.54, SD 0.024; P<.001). When untrained doctors used this LLM for second-opinion triage, there was a slight but statistically insignificant performance increase (κ=mean 0.70, SD 0.047; P=.97). Other tested LLMs performed similarly to or worse than GPT-4–based ChatGPT or showed odd triaging behavior with the parameters used. LLMs and ChatGPT models tended toward overtriage, whereas untrained doctors undertriaged. Conclusions: While LLMs and the LLM-based product ChatGPT do not yet match professionally trained raters, their best models’ triage proficiency equals that of untrained ED doctors. In their current form, LLMs and ChatGPT thus did not demonstrate gold-standard performance in ED triage and, in the setting of this study, failed to significantly improve untrained doctors’ triage when used as decision support. Notable performance enhancements in newer LLM versions over older ones hint at future improvements with further technological development and specific training. 
%M 38875696 %R 10.2196/53297 %U https://www.jmir.org/2024/1/e53297 %U https://doi.org/10.2196/53297 %U http://www.ncbi.nlm.nih.gov/pubmed/38875696 %0 Journal Article %@ 2369-3762 %I %V 10 %N %P e54987 %T Evolution of Chatbots in Nursing Education: Narrative Review %A Zhang,Fang %A Liu,Xiaoliu %A Wu,Wenyan %A Zhu,Shiben %K nursing education %K chatbots %K artificial intelligence %K narrative review %K ChatGPT %D 2024 %7 13.6.2024 %9 %J JMIR Med Educ %G English %X Background: The integration of chatbots in nursing education is a rapidly evolving area with potential transformative impacts. This narrative review aims to synthesize and analyze the existing literature on chatbots in nursing education. Objective: This study aims to comprehensively examine the temporal trends, international distribution, study designs, and implications of chatbots in nursing education. Methods: A comprehensive search was conducted across 3 databases (PubMed, Web of Science, and Embase) following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram. Results: A total of 40 articles met the eligibility criteria, with a notable increase in publications in 2023 (n=28, 70%). Temporal analysis revealed a marked surge in publications from 2021 to 2023, emphasizing the growing scholarly interest. Geographically, Taiwan province made substantial contributions (n=8, 20%), followed by the United States (n=6, 15%) and South Korea (n=4, 10%). Study designs varied, with reviews (n=8, 20%) and editorials (n=7, 18%) being predominant, showcasing the richness of research in this domain. Conclusions: Integrating chatbots into nursing education presents a promising yet relatively unexplored avenue. This review highlights the urgent need for original research, emphasizing the importance of ethical considerations. 
%R 10.2196/54987 %U https://mededu.jmir.org/2024/1/e54987 %U https://doi.org/10.2196/54987 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 7 %N %P e52105 %T Navigating the Pedagogical Landscape: Exploring the Implications of AI and Chatbots in Nursing Education %A Srinivasan,Muthuvenkatachalam %A Venugopal,Ambili %A Venkatesan,Latha %A Kumar,Rajesh %+ College of Nursing, All India Institute of Medical Sciences, Guntur Dt, Mangalagiri, 522503, India, 91 9410366146, muthu.venky@gmail.com %K AI %K artificial intelligence %K ChatGPT %K chatbots %K nursing education %K education %K chatbot %K nursing %K ethical %K ethics %K ethical consideration %K accessible %K learning %K efficiency %K student %K student engagement %K student learning %D 2024 %7 13.6.2024 %9 Viewpoint %J JMIR Nursing %G English %X This viewpoint paper explores the pedagogical implications of artificial intelligence (AI) and AI-based chatbots such as ChatGPT in nursing education, examining their potential uses, benefits, challenges, and ethical considerations. AI and chatbots offer transformative opportunities for nursing education, such as personalized learning, simulation and practice, accessible learning, and improved efficiency. They have the potential to increase student engagement and motivation, enhance learning outcomes, and augment teacher support. However, the integration of these technologies also raises ethical considerations, such as privacy, confidentiality, and bias. The viewpoint paper provides a comprehensive overview of the current state of AI and chatbots in nursing education, offering insights into best practices and guidelines for their integration. By examining the impact of AI and ChatGPT on student learning, engagement, and teacher effectiveness and efficiency, this review aims to contribute to the ongoing discussion on the use of AI and chatbots in nursing education and provide recommendations for future research and development in the field. 
%M 38870516 %R 10.2196/52105 %U https://nursing.jmir.org/2024/1/e52105 %U https://doi.org/10.2196/52105 %U http://www.ncbi.nlm.nih.gov/pubmed/38870516 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e53574 %T Understanding COVID-19 Impacts on the Health Workforce: AI-Assisted Open-Source Media Content Analysis %A Pienkowska,Anita %A Ravaut,Mathieu %A Mammadova,Maleyka %A Ang,Chin-Siang %A Wang,Hanyu %A Ong,Qi Chwen %A Bojic,Iva %A Qin,Vicky Mengqi %A Sumsuzzman,Dewan Md %A Ajuebor,Onyema %A Boniol,Mathieu %A Bustamante,Juana Paola %A Campbell,James %A Cometto,Giorgio %A Fitzpatrick,Siobhan %A Kane,Catherine %A Joty,Shafiq %A Car,Josip %+ Lee Kong Chian School of Medicine, Nanyang Technological University, 11 Mandalay Rd, Clinical Sciences Building, Singapore, 308232, Singapore, 65 6513 8572, iva.bojic@ntu.edu.sg %K World Health Organization %K WHO %K public surveillance %K natural language processing %K NLP %K artificial intelligence %K AI %K COVID-19 %K SARS-COV-2 %K COVID-19 pandemic %K human-generated analysis %K decision-making %K strategic policy %K health workforce %K news article %K media content analysis %K news coverage %K health care worker %K mental health %K death risk %K intervention %K efficiency %K public health %K surveillance %K innovation %K innovative method %D 2024 %7 13.6.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: To investigate the impacts of the COVID-19 pandemic on the health workforce, we aimed to develop a framework that synergizes natural language processing (NLP) techniques and human-generated analysis to reduce, organize, classify, and analyze a vast volume of publicly available news articles to complement scientific literature and support strategic policy dialogue, advocacy, and decision-making. 
Objective: This study aimed to explore the possibility of systematically scanning intelligence from media that are usually not captured or best gathered through structured academic channels and inform on the impacts of the COVID-19 pandemic on the health workforce, contributing factors to the pervasiveness of the impacts, and policy responses, as depicted in publicly available news articles. Our focus was to investigate the impacts of the COVID-19 pandemic and, concurrently, assess the feasibility of gathering health workforce insights from open sources rapidly. Methods: We conducted an NLP-assisted media content analysis of open-source news coverage on the COVID-19 pandemic published between January 2020 and June 2022. A data set of 3,299,158 English news articles on the COVID-19 pandemic was extracted from the World Health Organization Epidemic Intelligence through Open Sources (EIOS) system. The data preparation phase included developing rules-based classification, fine-tuning an NLP summarization model, and further data processing. Following relevancy evaluation, a deductive-inductive approach was used for the analysis of the summarizations. This included data extraction, inductive coding, and theme grouping. Results: After processing and classifying the initial data set comprising 3,299,158 news articles and reports, a data set of 5131 articles with 3,007,693 words was devised. The NLP summarization model allowed for a reduction in the length of each article resulting in 496,209 words that facilitated agile analysis performed by humans. Media content analysis yielded results in 3 sections: areas of COVID-19 impacts and their pervasiveness, contributing factors to COVID-19–related impacts, and responses to the impacts. The results suggest that insufficient remuneration and compensation packages have been key disruptors for the health workforce during the COVID-19 pandemic, leading to industrial actions and mental health burdens. 
Shortages of personal protective equipment and occupational risks have increased infection and death risks, particularly at the pandemic’s onset. Workload and staff shortages became a growing disruption as the pandemic progressed. Conclusions: This study demonstrates the capacity of artificial intelligence–assisted media content analysis applied to open-source news articles and reports concerning the health workforce. Adequate remuneration packages and personal protective equipment supplies should be prioritized as preventive measures to reduce the initial impact of future pandemics on the health workforce. Interventions aimed at lessening the emotional toll and workload need to be formulated as a part of reactive measures, enhancing the efficiency and maintainability of health delivery during a pandemic. %M 38869940 %R 10.2196/53574 %U https://formative.jmir.org/2024/1/e53574 %U https://doi.org/10.2196/53574 %U http://www.ncbi.nlm.nih.gov/pubmed/38869940 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e50939 %T Assessing the Utility, Impact, and Adoption Challenges of an Artificial Intelligence–Enabled Prescription Advisory Tool for Type 2 Diabetes Management: Qualitative Study %A Yoon,Sungwon %A Goh,Hendra %A Lee,Phong Ching %A Tan,Hong Chang %A Teh,Ming Ming %A Lim,Dawn Shao Ting %A Kwee,Ann %A Suresh,Chandran %A Carmody,David %A Swee,Du Soon %A Tan,Sarah Ying Tse %A Wong,Andy Jun-Wei %A Choo,Charlotte Hui-Min %A Wee,Zongwen %A Bee,Yong Mong %+ Health Services and Systems Research, Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore, 65 66013198, sungwon.yoon@duke-nus.edu.sg %K clinical decision support system %K artificial intelligence %K endocrinology %K diabetes management %K human factors %D 2024 %7 13.6.2024 %9 Original Paper %J JMIR Hum Factors %G English %X Background: The clinical management of type 2 diabetes mellitus (T2DM) presents a significant challenge due to the constantly evolving clinical practice guidelines 
and growing array of drug classes available. Evidence suggests that artificial intelligence (AI)–enabled clinical decision support systems (CDSSs) have proven to be effective in assisting clinicians with informed decision-making. Despite the merits of AI-driven CDSSs, a significant research gap exists concerning the early-stage implementation and adoption of AI-enabled CDSSs in T2DM management. Objective: This study aimed to explore the perspectives of clinicians on the use and impact of the AI-enabled Prescription Advisory (APA) tool, developed using a multi-institution diabetes registry and implemented in specialist endocrinology clinics, and the challenges to its adoption and application. Methods: We conducted focus group discussions using a semistructured interview guide with purposively selected endocrinologists from a tertiary hospital. The focus group discussions were audio-recorded and transcribed verbatim. Data were thematically analyzed. Results: A total of 13 clinicians participated in 4 focus group discussions. Our findings suggest that the APA tool offered several useful features to assist clinicians in effectively managing T2DM. Specifically, clinicians viewed the AI-generated medication alterations as a good knowledge resource in supporting the clinician’s decision-making on drug modifications at the point of care, particularly for patients with comorbidities. The complication risk prediction was seen as positively impacting patient care by facilitating early doctor-patient communication and initiating prompt clinical responses. However, the interpretability of the risk scores, concerns about overreliance and automation bias, and issues surrounding accountability and liability hindered the adoption of the APA tool in clinical practice. 
Conclusions: Although the APA tool holds great potential as a valuable resource for improving patient care, further efforts are required to address clinicians’ concerns and improve the tool’s acceptance and applicability in relevant contexts. %M 38869934 %R 10.2196/50939 %U https://humanfactors.jmir.org/2024/1/e50939 %U https://doi.org/10.2196/50939 %U http://www.ncbi.nlm.nih.gov/pubmed/38869934 %0 Journal Article %@ 2563-6316 %I %V 5 %N %P e45973 %T Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis %A Dong,Tim %A Sinha,Shubhra %A Zhai,Ben %A Fudulu,Daniel %A Chan,Jeremy %A Narayan,Pradeep %A Judge,Andy %A Caputo,Massimo %A Dimagli,Arnaldo %A Benedetto,Umberto %A Angelini,Gianni D %K cardiac surgery %K artificial intelligence %K risk prediction %K machine learning %K operative mortality %K data set drift %K performance drift %K national data set %K adult %K data %K cardiac %K surgery %K cardiology %K heart %K risk %K prediction %K United Kingdom %K mortality %K performance %K model %D 2024 %7 12.6.2024 %9 %J JMIRx Med %G English %X Background: The Society of Thoracic Surgeons and European System for Cardiac Operative Risk Evaluation (EuroSCORE) II risk scores are the most commonly used risk prediction models for in-hospital mortality after adult cardiac surgery. However, they are prone to miscalibration over time and poor generalization across data sets; thus, their use remains controversial. Despite increased interest, a gap in understanding the effect of data set drift on the performance of machine learning (ML) over time remains a barrier to its wider use in clinical practice. Data set drift occurs when an ML system underperforms because of a mismatch between the data it was developed from and the data on which it is deployed. Objective: In this study, we analyzed the extent of performance drift using models built on a large UK cardiac surgery database. 
The objectives were to (1) rank and assess the extent of performance drift in cardiac surgery risk ML models over time and (2) investigate any potential influence of data set drift and variable importance drift on performance drift. Methods: We conducted a retrospective analysis of prospectively, routinely gathered data on adult patients undergoing cardiac surgery in the United Kingdom between 2012 and 2019. We temporally split the data 70:30 into a training and validation set and a holdout set. Five novel ML mortality prediction models were developed and assessed, along with EuroSCORE II, for relationships between and within variable importance drift, performance drift, and actual data set drift. Performance was assessed using a consensus metric. Results: A total of 227,087 adults underwent cardiac surgery during the study period, with a mortality rate of 2.76% (n=6258). There was strong evidence of a decrease in overall performance across all models (P<.0001). Extreme gradient boosting (clinical effectiveness metric [CEM] 0.728, 95% CI 0.728-0.729) and random forest (CEM 0.727, 95% CI 0.727-0.728) were the overall best-performing models, both temporally and nontemporally. EuroSCORE II performed the worst across all comparisons. Sharp changes in variable importance and data set drift from October to December 2017, from June to July 2018, and from December 2018 to February 2019 mirrored the effects of performance decrease across models. Conclusions: All models show a decrease in at least 3 of the 5 individual metrics. CEM and variable importance drift detection demonstrate the limitation of logistic regression methods used for cardiac surgery risk prediction and the effects of data set drift. Future work will be required to determine the interplay between ML models and whether ensemble models could improve on their respective performance advantages. 
%R 10.2196/45973 %U https://xmed.jmir.org/2024/1/e45973 %U https://doi.org/10.2196/45973 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 10 %N %P e53354 %T Real-World Survival Comparisons Between Radiotherapy and Surgery for Metachronous Second Primary Lung Cancer and Predictions of Lung Cancer–Specific Outcomes Using Machine Learning: Population-Based Study %A Zheng,Yue %A Zhao,Ailin %A Yang,Yuqi %A Wang,Laduona %A Hu,Yifei %A Luo,Ren %A Wu,Yijun %+ Division of Thoracic Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, Guoxue Lane 37, Chengdu, 610041, China, 86 17888841669, wuyj01029@wchscu.cn %K metachronous second primary lung cancer %K radiotherapy %K surgical resection %K propensity score matching analysis %K machine learning %D 2024 %7 12.6.2024 %9 Original Paper %J JMIR Cancer %G English %X Background: Metachronous second primary lung cancer (MSPLC) is not that rare but is seldom studied. Objective: We aim to compare real-world survival outcomes between different surgery strategies and radiotherapy for MSPLC. Methods: This retrospective study analyzed data collected from patients with MSPLC between 1988 and 2012 in the Surveillance, Epidemiology, and End Results (SEER) database. Propensity score matching (PSM) analyses and machine learning were performed to compare variables between patients with MSPLC. Survival curves were plotted using the Kaplan-Meier method and were compared using log-rank tests. Results: A total of 2451 MSPLC patients were categorized into the following treatment groups: 864 (35.3%) received radiotherapy, 759 (31%) underwent surgery, 89 (3.6%) had surgery plus radiotherapy, and 739 (30.2%) had neither treatment. After PSM, 470 pairs each for radiotherapy and surgery were generated. The surgery group had significantly better survival than the radiotherapy group (P<.001) and the untreated group (563 pairs; P<.001). 
Further analysis revealed that both wedge resection (85 pairs; P=.004) and lobectomy (71 pairs; P=.002) outperformed radiotherapy in overall survival for MSPLC patients. Machine learning models (extreme gradient boosting, random forest classifier, adaptive boosting) demonstrated high predictive performance based on area under the curve (AUC) values. Least absolute shrinkage and selection operator (LASSO) regression analysis identified 9 significant variables impacting cancer-specific survival, emphasizing surgery’s consistent influence across 1 to 10 years. These variables encompassed age at diagnosis, sex, year of diagnosis, radiotherapy of initial primary lung cancer (IPLC), primary site, histology, surgery, chemotherapy, and radiotherapy of MSPLC. Competing risk analysis highlighted lower mortality for female MSPLC patients (hazard ratio [HR]=0.79, 95% CI 0.71-0.87) and recent IPLC diagnoses (HR=0.79, 95% CI 0.73-0.85), while radiotherapy for IPLC increased mortality (HR=1.31, 95% CI 1.16-1.50). Surgery alone had the lowest cancer-specific mortality (HR=0.83, 95% CI 0.81-0.85), with sublevel resection having the lowest mortality rate among the surgical approaches (HR=0.26, 95% CI 0.21-0.31). The findings provide valuable insights into the factors that influence cumulative cancer-specific mortality. Conclusions: Surgical resections such as wedge resection and lobectomy confer better survival than radiation therapy for MSPLC, but radiation can be a valid alternative for the treatment of MSPLC. 
%M 38865182 %R 10.2196/53354 %U https://cancer.jmir.org/2024/1/e53354 %U https://doi.org/10.2196/53354 %U http://www.ncbi.nlm.nih.gov/pubmed/38865182 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e56529 %T Considering the Role of Human Empathy in AI-Driven Therapy %A Rubin,Matan %A Arnon,Hadar %A Huppert,Jonathan D %A Perry,Anat %+ Psychology Department, Hebrew University of Jerusalem, Mt Scopus, Jerusalem, 91905, Israel, 972 2 588 3027, anat.perry@mail.huji.ac.il %K empathy %K empathetic %K empathic %K artificial empathy %K AI %K artificial intelligence %K mental health %K machine learning %K algorithm %K algorithms %K predictive model %K predictive models %K predictive analytics %K predictive system %K practical model %K practical models %K model %K models %K therapy %K mental illness %K mental illnesses %K mental disease %K mental diseases %K mood disorder %K mood disorders %K emotion %K emotions %K e-mental health %K digital mental health %K internet-based therapy %D 2024 %7 11.6.2024 %9 Viewpoint %J JMIR Ment Health %G English %X Recent breakthroughs in artificial intelligence (AI) language models have elevated the vision of using conversational AI support for mental health, with a growing body of literature indicating varying degrees of efficacy. In this paper, we ask when, in therapy, it will be easier to replace humans and, conversely, in what instances, human connection will still be more valued. We suggest that empathy lies at the heart of the answer to this question. First, we define different aspects of empathy and outline the potential empathic capabilities of humans versus AI. Next, we consider what determines when these aspects are needed most in therapy, both from the perspective of therapeutic methodology and from the perspective of patient objectives. 
Ultimately, our goal is to prompt further investigation and dialogue, urging both practitioners and scholars engaged in AI-mediated therapy to keep these questions and considerations in mind when investigating AI implementation in mental health. %M 38861302 %R 10.2196/56529 %U https://mental.jmir.org/2024/1/e56529 %U https://doi.org/10.2196/56529 %U http://www.ncbi.nlm.nih.gov/pubmed/38861302 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e57678 %T Explainable AI Method for Tinnitus Diagnosis via Neighbor-Augmented Knowledge Graph and Traditional Chinese Medicine: Development and Validation Study %A Yin,Ziming %A Kuang,Zhongling %A Zhang,Haopeng %A Guo,Yu %A Li,Ting %A Wu,Zhengkun %A Wang,Lihua %+ Department of Otolaryngology, Shanghai Municipal Hospital of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, 274 Zhijiang Middle Road, Jing'an District, Shanghai, 200071, China, 86 18116013561, lihuahanhan@126.com %K knowledge graph %K syndrome differentiation %K tinnitus %K traditional Chinese medicine %K explainable %K ear %K audiology %K TCM %K algorithm %K diagnosis %K AI %K artificial intelligence %D 2024 %7 10.6.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Tinnitus diagnosis poses a challenge in otolaryngology owing to an extremely complex pathogenesis, lack of effective objectification methods, and factor-affected diagnosis. There is currently a lack of explainable auxiliary diagnostic tools for tinnitus in clinical practice. Objective: This study aims to develop a diagnostic model using an explainable artificial intelligence (AI) method to address the issue of low accuracy in tinnitus diagnosis. Methods: In this study, a knowledge graph–based tinnitus diagnostic method was developed by combining clinical medical knowledge with electronic medical records. 
Electronic medical record data from 1267 patients were integrated with traditional Chinese clinical medical knowledge to construct a tinnitus knowledge graph. Subsequently, weights were introduced, which measured patient similarity in the knowledge graph based on mutual information values. Finally, a collaborative neighbor algorithm was proposed, which scored patient similarity to obtain the recommended diagnosis. We conducted 2 group experiments and 1 case derivation to explore the effectiveness of our models and compared the models with state-of-the-art graph algorithms and other explainable machine learning models. Results: The experimental results indicate that the method achieved 99.4% accuracy, 98.5% sensitivity, 99.6% specificity, 98.7% precision, 98.6% F1-score, and 99% area under the receiver operating characteristic curve for the inference of 5 tinnitus subtypes among 253 test patients. Additionally, it demonstrated good interpretability. The topological structure of knowledge graphs provides transparency that can explain the reasons for the similarity between patients. Conclusions: This method provides doctors with a reliable and explainable diagnostic tool that is expected to improve tinnitus diagnosis accuracy. 
%M 38857077 %R 10.2196/57678 %U https://medinform.jmir.org/2024/1/e57678 %U https://doi.org/10.2196/57678 %U http://www.ncbi.nlm.nih.gov/pubmed/38857077 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e56165 %T Clinical Accuracy, Relevance, Clarity, and Emotional Sensitivity of Large Language Models to Surgical Patient Questions: Cross-Sectional Study %A Dagli,Mert Marcel %A Oettl,Felix Conrad %A Gujral,Jaskeerat %A Malhotra,Kashish %A Ghenbot,Yohannes %A Yoon,Jang W %A Ozturk,Ali K %A Welch,William C %+ Department of Neurosurgery, University of Pennsylvania Perelman School of Medicine, 801 Spruce Street, Philadelphia, PA, 19106, United States, 1 2672306493, marcel.dagli@pennmedicine.upenn.edu %K artificial intelligence %K AI %K natural language processing %K NLP %K large language model %K LLM %K generative AI %K cross-sectional study %K health information %K patient education %K clinical accuracy %K emotional sensitivity %K surgical patient %K surgery %K surgical %D 2024 %7 7.6.2024 %9 Research Letter %J JMIR Form Res %G English %X This cross-sectional study evaluates the clinical accuracy, relevance, clarity, and emotional sensitivity of responses to inquiries from patients undergoing surgery provided by large language models (LLMs), highlighting their potential as adjunct tools in patient communication and education. Our findings demonstrated high performance of LLMs across accuracy, relevance, clarity, and emotional sensitivity, with Anthropic’s Claude 2 outperforming OpenAI’s ChatGPT and Google’s Bard, suggesting LLMs’ potential to serve as complementary tools for enhanced information delivery and patient-surgeon interaction. 
%M 38848553 %R 10.2196/56165 %U https://formative.jmir.org/2024/1/e56165 %U https://doi.org/10.2196/56165 %U http://www.ncbi.nlm.nih.gov/pubmed/38848553 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50274 %T Trust but Verify: Lessons Learned for the Application of AI to Case-Based Clinical Decision-Making From Postmarketing Drug Safety Assessment at the US Food and Drug Administration %A Ball,Robert %A Talal,Andrew H %A Dang,Oanh %A Muñoz,Monica %A Markatou,Marianthi %+ Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, United States, 1 301 796 2380, robert.ball@fda.hhs.gov %K drug safety %K artificial intelligence %K machine learning %K natural language processing %K causal inference %K case-based reasoning %K clinical decision support %D 2024 %7 6.6.2024 %9 Viewpoint %J J Med Internet Res %G English %X Adverse drug reactions are a common cause of morbidity in health care. The US Food and Drug Administration (FDA) evaluates individual case safety reports of adverse events (AEs) after submission to the FDA Adverse Event Reporting System as part of its surveillance activities. Over the past decade, the FDA has explored the application of artificial intelligence (AI) to evaluate these reports to improve the efficiency and scientific rigor of the process. However, a gap remains between AI algorithm development and deployment. This viewpoint aims to describe the lessons learned from our experience and research needed to address both general issues in case-based reasoning using AI and specific needs for individual case safety report assessment. 
Beginning with the recognition that the trustworthiness of the AI algorithm is the main determinant of its acceptance by human experts, we apply the Diffusion of Innovations theory to help explain why certain algorithms for evaluating AEs at the FDA were accepted by safety reviewers and others were not. This analysis reveals that the process by which clinicians decide from case reports whether a drug is likely to cause an AE is not well defined beyond general principles. This makes the development of high performing, transparent, and explainable AI algorithms challenging, leading to a lack of trust by the safety reviewers. Even accounting for the introduction of large language models, the pharmacovigilance community needs an improved understanding of causal inference and of the cognitive framework for determining the causal relationship between a drug and an AE. We describe specific future research directions that underpin facilitating implementation and trust in AI for drug safety applications, including improved methods for measuring and controlling of algorithmic uncertainty, computational reproducibility, and clear articulation of a cognitive framework for causal inference in case-based reasoning. 
%M 38842929 %R 10.2196/50274 %U https://www.jmir.org/2024/1/e50274 %U https://doi.org/10.2196/50274 %U http://www.ncbi.nlm.nih.gov/pubmed/38842929 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e52251 %T Development of a Digital Patient Assistant for the Management of Cyclic Vomiting Syndrome: Patient-Centric Design Study %A Narang,Gaurav %A Chen,Yaozhu J %A Wedel,Nicole %A Wu,Melody %A Luo,Michelle %A Atreja,Ashish %+ Rx.Health, 21 Penn Plaza, 368 9th Avenue, New York, NY, 10001, United States, 1 6469699939, gnarang@commure.com %K cyclic vomiting syndrome %K vomiting %K vomit %K emetic %K emesis %K gut %K GI %K gastrointestinal %K internal medicine %K prototype %K prototypes %K iterative %K self-management %K disease management %K gut-brain interaction %K gut-brain %K artificial intelligence %K digital patient assistant %K assistant %K assistants %K design thinking %K design %K patient-centric %K patient centred %K patient centered %K patient-centric approach %K System Usability Scale %K symptom tracking %K digital health solution %K user experience %K usability %K symptom %K symptoms %K tracking %K monitoring %K participatory %K co-design digital health technology %K patient assistance %K patient experience %K mobile phone %D 2024 %7 6.6.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Cyclic vomiting syndrome (CVS) is an enigmatic and debilitating disorder of gut-brain interaction that is characterized by recurrent episodes of severe vomiting and nausea. It significantly impairs patients’ quality of life and can lead to frequent medical visits and substantial health care costs. The diagnosis for CVS is often protracted and complex, primarily due to its exclusionary diagnosis nature and the lack of specific biomarkers. This typically leads to a considerable delay in accurate diagnosis, contributing to increased patient morbidity. 
Additionally, the absence of approved therapies for CVS worsens patient hardship and reflects the urgent need for innovative, patient-centric solutions to improve CVS management. Objective: We aim to develop a digital patient assistant (DPA) for patients with CVS to address their unique needs, and iteratively enhance the technical features and user experience on the initial DPA versions. Methods: The development of the DPA for CVS used a design thinking approach, prioritizing user needs. A literature review and Patient Advisory Board shaped the initial prototype, focusing on diagnostic support and symptom tracking. Iterative development, informed by the design thinking approach and feedback from patients with CVS and caregivers through interviews and smartphone testing, led to significant enhancements in user interaction and artificial intelligence integration. The final DPA’s effectiveness was validated using the System Usability Scale and feedback questions, ensuring it met the specific needs of the CVS community. Results: The DPA developed for CVS integrates an introductory bot, daily and weekly check-in bots, and a knowledge hub, all accessible via a patient dashboard. This multicomponent solution effectively addresses key unmet needs in CVS management: efficient tracking of symptoms and impacts, access to comprehensive disease information, and a digital health platform for disease management. Significant improvements, based on user feedback, include the implementation of artificial intelligence features like intent recognition and data syncing, enhancing the bot interaction and reducing the burden on patients. The inclusion of the knowledge hub provides educational resources, contributing to better disease understanding and management. The DPA achieved a System Usability Scale score of 80 out of 100, indicating high ease of use and relevance. 
Patient feedback highlighted the DPA’s potential in disease management and suggested further applications, such as integration into health care provider recommendations for patients with suspected or confirmed CVS. This positive response underscores the DPA’s role in enhancing patient engagement and disease management through a patient-centered digital solution. Conclusions: The development of this DPA for patients with CVS, via an iterative design thinking approach, offers a patient-centric solution for disease management. The DPA development framework may also serve to guide future patient digital support and research scenarios. %M 38842924 %R 10.2196/52251 %U https://formative.jmir.org/2024/1/e52251 %U https://doi.org/10.2196/52251 %U http://www.ncbi.nlm.nih.gov/pubmed/38842924 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e52349 %T Impact of Responsible AI on the Occurrence and Resolution of Ethical Issues: Protocol for a Scoping Review %A Boege,Selina %A Milne-Ives,Madison %A Meinert,Edward %+ Translational and Clinical Research Institute, Newcastle University, Campus for Ageing and Vitality, Westgate Road, Newcastle-upon-Tyne, NE4 6BE, United Kingdom, 44 01912336161, edward.meinert@newcastle.ac.uk %K artificial intelligence %K AI %K responsible artificial intelligence %K RAI %K ethical artificial intelligence %K trustworthy artificial intelligence %K explainable artificial intelligence %K XAI %K digital ethics %D 2024 %7 5.6.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Responsible artificial intelligence (RAI) emphasizes the use of ethical frameworks implementing accountability, responsibility, and transparency to address concerns in the deployment and use of artificial intelligence (AI) technologies, including privacy, autonomy, self-determination, bias, and transparency. Standards are under development to guide the support and implementation of AI given these considerations. 
Objective: The purpose of this review is to provide an overview of current research evidence and knowledge gaps regarding the implementation of RAI principles and the occurrence and resolution of ethical issues within AI systems. Methods: A scoping review following Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) guidelines was proposed. PubMed, ERIC, Scopus, IEEE Xplore, EBSCO, Web of Science, ACM Digital Library, and ProQuest (Arts and Humanities) will be systematically searched for articles published since 2013 that examine RAI principles and ethical concerns within AI. Eligibility assessment will be conducted independently, and coded data will be analyzed along themes and stratified across discipline-specific literature. Results: The results will be included in the full scoping review, which is expected to start in June 2024 and to be completed for submission for publication by the end of 2024. Conclusions: This scoping review will summarize the state of evidence and provide an overview of its impact, as well as strengths, weaknesses, and gaps in research implementing RAI principles. The review may also reveal discipline-specific concerns, priorities, and proposed solutions to the concerns. It will thereby identify priority areas that should be the focus of future regulatory options, connecting theoretical aspects of ethical requirements with practical solutions. International Registered Report Identifier (IRRID): PRR1-10.2196/52349 %M 38838329 %R 10.2196/52349 %U https://www.researchprotocols.org/2024/1/e52349 %U https://doi.org/10.2196/52349 %U http://www.ncbi.nlm.nih.gov/pubmed/38838329 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50344 %T Does an App a Day Keep the Doctor Away? 
AI Symptom Checker Applications, Entrenched Bias, and Professional Responsibility %A Zawati,Ma'n H %A Lang,Michael %+ Centre of Genomics and Policy, McGill University, 5200-740 Dr Penfield Avenue, Montreal, QC, H3A 0G1, Canada, 1 5143988155, man.zawati@mcgill.ca %K artificial intelligence %K applications %K mobile health %K mHealth %K bias %K biases %K professional obligations %K professional obligation %K app %K apps %K application %K symptom checker %K symptom checkers %K diagnose %K diagnosis %K self-diagnose %K self-diagnosis %K ethic %K ethics %K ethical %K regulation %K regulations %K legal %K law %K laws %K safety %K mobile phone %D 2024 %7 5.6.2024 %9 Viewpoint %J J Med Internet Res %G English %X The growing prominence of artificial intelligence (AI) in mobile health (mHealth) has given rise to a distinct subset of apps that provide users with diagnostic information using their inputted health status and symptom information—AI-powered symptom checker apps (AISympCheck). While these apps may potentially increase access to health care, they raise consequential ethical and legal questions. This paper will highlight notable concerns with AI usage in the health care system, further entrenchment of preexisting biases in the health care system and issues with professional accountability. To provide an in-depth analysis of the issues of bias and complications of professional obligations and liability, we focus on 2 mHealth apps as examples—Babylon and Ada. We selected these 2 apps as they were both widely distributed during the COVID-19 pandemic and make prominent claims about their use of AI for the purpose of assessing user symptoms. First, bias entrenchment often originates from the data used to train AI systems, causing the AI to replicate these inequalities through a “garbage in, garbage out” phenomenon. Users of these apps are also unlikely to be demographically representative of the larger population, leading to distorted results. 
Second, professional accountability poses a substantial challenge given the vast diversity and lack of regulation surrounding the reliability of AISympCheck apps. It is unclear whether these apps should be subject to safety reviews, who is responsible for app-mediated misdiagnosis, and whether these apps ought to be recommended by physicians. With the rapidly increasing number of apps, there remains little guidance available for health professionals. Professional bodies and advocacy organizations have a particularly important role to play in addressing these ethical and legal gaps. Implementing technical safeguards within these apps could mitigate bias, AIs could be trained with primarily neutral data, and apps could be subject to a system of regulation to allow users to make informed decisions. In our view, it is critical that these legal concerns are considered throughout the design and implementation of these potentially disruptive technologies. Entrenched bias and professional responsibility, while operating in different ways, are ultimately exacerbated by the unregulated nature of mHealth. %M 38838309 %R 10.2196/50344 %U https://www.jmir.org/2024/1/e50344 %U https://doi.org/10.2196/50344 %U http://www.ncbi.nlm.nih.gov/pubmed/38838309 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e53918 %T Chinese Oncologists’ Perspectives on Integrating AI into Clinical Practice: Cross-Sectional Survey Study %A Li,Ming %A Xiong,XiaoMin %A Xu,Bo %A Dickson,Conan %+ Department of Health Policy Management, Bloomberg School of Public Health, Johns Hopkins University, 615 North Wolfe Street, Baltimore, MD, 21205, United States, 1 410 955 3543, cdickso1@jh.edu %K artificial intelligence %K AI %K machine learning %K oncologist %K concern %K clinical practice %D 2024 %7 5.6.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The rapid development of artificial intelligence (AI) has brought significant interest to its potential applications in oncology. 
Although AI-powered tools are already being implemented in some Chinese hospitals, their integration into clinical practice raises several concerns for Chinese oncologists. Objective: This study aims to explore the concerns of Chinese oncologists regarding the integration of AI into clinical practice and to identify the factors influencing these concerns. Methods: A total of 228 Chinese oncologists participated in a cross-sectional web-based survey from April to June 2023 in mainland China. The survey gauged their worries about AI with multiple-choice questions and evaluated their views on the statements “The impact of AI on the doctor-patient relationship” and “AI will replace doctors.” The data were analyzed using descriptive statistics, and variate analyses were used to find correlations between the oncologists’ backgrounds and their concerns. Results: The study revealed that the most prominent concerns were the potential for AI to mislead diagnosis and treatment (163/228, 71.5%); an overreliance on AI (162/228, 71%); data and algorithm bias (123/228, 54%); issues with data security and patient privacy (123/228, 54%); and a lag in the adaptation of laws, regulations, and policies in keeping up with AI’s development (115/228, 50.4%). Oncologists with a bachelor’s degree expressed heightened concerns related to data and algorithm bias (34/49, 69%; P=.03) and the lagging nature of legal, regulatory, and policy issues (32/49, 65%; P=.046). Regarding AI’s impact on doctor-patient relationships, 53.1% (121/228) saw a positive impact, whereas 35.5% (81/228) found it difficult to judge, 9.2% (21/228) feared increased disputes, and 2.2% (5/228) believed that there is no impact. Although sex differences were not significant (P=.08), perceptions varied: male oncologists tended to be more positive than female oncologists (74/135, 54.8% vs 47/93, 50%). 
Oncologists with a bachelor’s degree (26/49, 53%; P=.03) and experienced clinicians (≥21 years; 28/56, 50%; P=.054) found it the hardest to judge. Those with IT experience were significantly more positive (25/35, 71%) than those without (96/193, 49.7%; P=.02). Opinions regarding the possibility of AI replacing doctors were diverse, with 23.2% (53/228) strongly disagreeing, 14% (32/228) disagreeing, 29.8% (68/228) being neutral, 16.2% (37/228) agreeing, and 16.7% (38/228) strongly agreeing. There were no significant correlations with demographic and professional factors (all P>.05). Conclusions: Addressing oncologists’ concerns about AI requires collaborative efforts from policy makers, developers, health care professionals, and legal experts. Emphasizing transparency, human-centered design, bias mitigation, and education about AI’s potential and limitations is crucial. Through close collaboration and a multidisciplinary strategy, AI can be effectively integrated into oncology, balancing benefits with ethical considerations and enhancing patient care. 
%M 38838307 %R 10.2196/53918 %U https://formative.jmir.org/2024/1/e53918 %U https://doi.org/10.2196/53918 %U http://www.ncbi.nlm.nih.gov/pubmed/38838307 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e55798 %T Using the Natural Language Processing System Medical Named Entity Recognition-Japanese to Analyze Pharmaceutical Care Records: Natural Language Processing Analysis %A Ohno,Yukiko %A Kato,Riri %A Ishikawa,Haruki %A Nishiyama,Tomohiro %A Isawa,Minae %A Mochizuki,Mayumi %A Aramaki,Eiji %A Aomori,Tohru %+ Faculty of Pharmacy, Takasaki University of Health and Welfare, 37-1 Nakaorui-machi, Takasaki-shi, Gunma, 370-0033, Japan, 81 273521290, aomori-t@takasaki-u.ac.jp %K natural language processing %K NLP %K named entity recognition %K pharmaceutical care records %K machine learning %K cefazolin sodium %K electronic medical record %K EMR %K extraction %K Japanese %D 2024 %7 4.6.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Large language models have propelled recent advances in artificial intelligence technology, facilitating the extraction of medical information from unstructured data such as medical records. Although named entity recognition (NER) is used to extract data from physicians’ records, it has yet to be widely applied to pharmaceutical care records. Objective: In this study, we aimed to investigate the feasibility of automatic extraction of the information regarding patients’ diseases and symptoms from pharmaceutical care records. The verification was performed using Medical Named Entity Recognition-Japanese (MedNER-J), a Japanese disease-extraction system designed for physicians’ records. Methods: MedNER-J was applied to subjective, objective, assessment, and plan data from the care records of 49 patients who received cefazolin sodium injection at Keio University Hospital between April 2018 and March 2019. The performance of MedNER-J was evaluated in terms of precision, recall, and F1-score. 
Results: The F1-scores of NER for subjective, objective, assessment, and plan data were 0.46, 0.70, 0.76, and 0.35, respectively. In NER and positive-negative classification, the F1-scores were 0.28, 0.39, 0.64, and 0.077, respectively. The F1-scores of NER for objective (0.70) and assessment data (0.76) were higher than those for subjective and plan data, which supported the superiority of NER performance for objective and assessment data. This might be because objective and assessment data contained many technical terms, similar to the training data for MedNER-J. Meanwhile, the F1-score of NER and positive-negative classification was high for assessment data alone (F1-score=0.64), which was attributed to the similarity of its description format and contents to those of the training data. Conclusions: MedNER-J successfully read pharmaceutical care records and showed the best performance for assessment data. However, challenges remain in analyzing records other than assessment data. Therefore, it will be necessary to reinforce the training data for subjective data in order to apply the system to pharmaceutical care records. %M 38833694 %R 10.2196/55798 %U https://formative.jmir.org/2024/1/e55798 %U https://doi.org/10.2196/55798 %U http://www.ncbi.nlm.nih.gov/pubmed/38833694 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e49562 %T Identifying X (Formerly Twitter) Posts Relevant to Dementia and COVID-19: Machine Learning Approach %A Azizi,Mehrnoosh %A Jamali,Ali Akbar %A Spiteri,Raymond J %+ Department of Computer Science, University of Saskatchewan, S425 Thorvaldson Building, 110 Science Place, Saskatoon, SK, S7N5C9, Canada, 1 306 966 2909, spiteri@cs.usask.ca %K machine learning %K dementia %K Alzheimer disease %K COVID-19 %K X (Twitter) %K natural language processing %D 2024 %7 4.6.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: During the pandemic, patients with dementia were identified as a vulnerable population. 
X (formerly Twitter) became an important source of information for people seeking updates on COVID-19, and, therefore, identifying posts (formerly tweets) relevant to dementia can be an important support for patients with dementia and their caregivers. However, mining and coding relevant posts can be daunting due to the sheer volume and high percentage of irrelevant posts. Objective: The objective of this study was to automate the identification of posts relevant to dementia and COVID-19 using natural language processing and machine learning (ML) algorithms. Methods: We used a combination of natural language processing and ML algorithms with manually annotated posts to identify posts relevant to dementia and COVID-19. We used 3 data sets containing more than 100,000 posts and assessed the capability of various algorithms in correctly identifying relevant posts. Results: Our results showed that (pretrained) transfer learning algorithms outperformed traditional ML algorithms in identifying posts relevant to dementia and COVID-19. Among the algorithms tested, the transfer learning algorithm A Lite Bidirectional Encoder Representations from Transformers (ALBERT) achieved an accuracy of 82.92% and an area under the curve of 83.53%. ALBERT substantially outperformed the other algorithms tested, further emphasizing the superior performance of transfer learning algorithms in the classification of posts. Conclusions: Transfer learning algorithms such as ALBERT are highly effective in identifying topic-specific posts, even when trained with limited or adjacent data, highlighting their superiority over other ML algorithms and applicability to other studies involving analysis of social media posts. Such an automated approach reduces the workload of manual coding of posts and facilitates their analysis for researchers and policy makers to support patients with dementia and their caregivers and other vulnerable populations. 
%M 38833288 %R 10.2196/49562 %U https://formative.jmir.org/2024/1/e49562 %U https://doi.org/10.2196/49562 %U http://www.ncbi.nlm.nih.gov/pubmed/38833288 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52637 %T Evaluation of the Clinical Efficacy and Trust in AI-Assisted Embryo Ranking: Survey-Based Prospective Study %A Kim,Hyung Min %A Kang,Hyoeun %A Lee,Chaeyoon %A Park,Jong Hyuk %A Chung,Mi Kyung %A Kim,Miran %A Kim,Na Young %A Lee,Hye Jun %+ Kai Health, 217 Teheran-ro #306, Yeoksam-dong, Gangnam-gu, Seoul, 06142, Republic of Korea, 82 1052299697, hyejunlee@gmail.com %K assisted reproductive technology %K in vitro fertilization %K artificial intelligence %K intraobserver and interobserver agreements %K embryos %K embryologists %D 2024 %7 3.6.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Current embryo assessment methods for in vitro fertilization depend on subjective morphological assessments. Recently, artificial intelligence (AI) has emerged as a promising tool for embryo assessment; however, its clinical efficacy and trustworthiness remain unproven. Simulation studies may provide additional evidence, provided that they are meticulously designed to mitigate bias and variance. Objective: The primary objective of this study was to evaluate the benefits of an AI model for predicting clinical pregnancy through well-designed simulations. The secondary objective was to identify the characteristics of and potential bias in the subgroups of embryologists with varying degrees of experience. Methods: This simulation study involved a questionnaire-based survey conducted on 61 embryologists with varying levels of experience from 12 in vitro fertilization clinics. 
The survey was conducted via Google Forms (Google Inc) in three phases: (1) phase 1, an initial assessment (December 23, 2022, to January 22, 2023); (2) phase 2, a validation assessment (March 6, 2023, to April 5, 2023); and (3) phase 3, an AI-guided assessment (March 6, 2023, to April 5, 2023). Inter- and intraobserver assessments and the accuracy of embryo selection from 360 day-5 embryos before and after AI guidance were analyzed for all embryologists and subgroups of senior and junior embryologists. Results: With AI guidance, the interobserver agreement increased from 0.355 to 0.527 and from 0.440 to 0.524 for junior and senior embryologists, respectively, thus reaching similar levels of agreement. In a test of accurate embryo selection with 90 questions, the numbers of correct responses by the embryologists only, embryologists with AI guidance, and AI only were 34 (38%), 45 (50%), and 59 (66%), respectively. Without AI, the average score (accuracy) of the junior group was 33.516 (37%), while that of the senior group was 35.967 (40%), with P<.001 in the t test. With AI guidance, the average score (accuracy) of the junior group increased to 46.581 (52%), reaching a level similar to the senior group’s 44.833 (50%), with P=.34. Junior embryologists had a higher level of trust in the AI score. Conclusions: This study demonstrates the potential benefits of AI in selecting embryos with high chances of pregnancy, particularly for embryologists with 5 years or less of experience, possibly due to their trust in AI. Thus, using AI as an auxiliary tool in clinical practice has the potential to improve embryo assessment and increase the probability of a successful pregnancy. 
%M 38830209 %R 10.2196/52637 %U https://www.jmir.org/2024/1/e52637 %U https://doi.org/10.2196/52637 %U http://www.ncbi.nlm.nih.gov/pubmed/38830209 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e45561 %T The Development of a Text Messaging Platform to Enhance a Youth Diabetes Prevention Program: Observational Process Study %A Sapre,Manali %A Elaiho,Cordelia R %A Brar Prayaga,Rena %A Prayaga,Ram %A Constable,Jeremy %A Vangeepuram,Nita %+ Department of General Pediatrics, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place Box 1198, New York, NY, 10029, United States, 1 917 478 2106, nita.vangeepuram@mssm.edu %K community-based participatory research %K youth %K diabetes prevention %K peer education %K mobile health technology %K SMS text messaging %K mobile phone %K artificial intelligence %K AI %D 2024 %7 29.5.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Approximately 1 in 5 adolescents in the United States has prediabetes, and racially and ethnically minoritized youths are disproportionately impacted. Unfortunately, there are few effective youth diabetes prevention programs, and in-person interventions are challenging because of barriers to access and engagement. Objective: We aimed to develop and assess the preliminary feasibility and acceptability of a youth-informed SMS text messaging platform to provide additional support and motivation to adolescents with prediabetes participating in a diabetes prevention workshop in East Harlem, New York City, New York, United States. We collaborated with our youth action board and a technology partner (mPulse Mobile) to develop and pilot-test the novel interactive platform. 
Methods: The technology subcommittee of our community action board (comprising youths and young adults) used the results from focus groups that we had previously conducted with youths from our community to develop 5 message types focused on healthy eating and active living: goal setting, behavior tracking, individually tailored guidance, motivational messages, and photo diary. We used an iterative process to develop and pilot the program with our internal study team, including youths from our community action board and mPulse Mobile developers. We then conducted a pilot of the 12-week SMS text messaging program with 13 youths with prediabetes. Results: Participants (aged 15-21 years; 10/13, 77% female; 3/13, 23% Black and 10/13, 77% Hispanic or Latinx) received an average of 2 automated messages per day. The system correctly sent 84% (2231/2656) of the messages at the time intended; the remaining 16% (425/2656) of the messages were either sent at the incorrect time, or the system did not recognize a participant response to provide the appropriate reply. The level of engagement with the program ranged from 1 (little to no response) to 5 (highly responsive) based on how frequently participants responded to the interactive (2-way) messages. Highly responsive participants (6/13, 46%) responded >75% (1154/1538) of the time to interactive messages sent over 12 weeks, and 69% (9/13) of the participants were still engaged with the program at week 12. During a focus group conducted after program completion, the participants remarked that the message frequency was appropriate, and those who had participated in our in-person workshops reflected that the messages were reminiscent of the workshop content. Participants rated goal setting, behavior tracking, and tailored messages most highly and informed planned adaptations to the platform. 
Participants described the program as: “interactive, informative, enjoyable, very convenient, reliable, motivational, productive, and reflective.” Conclusions: We partnered with youths in the initial content development and pilot testing of a novel SMS text messaging platform to support diabetes prevention. This study is unique in the triple partnership we formed among researchers, technology experts, and diverse youths to develop a mobile health platform to address diabetes-related disparities. %M 38809599 %R 10.2196/45561 %U https://formative.jmir.org/2024/1/e45561 %U https://doi.org/10.2196/45561 %U http://www.ncbi.nlm.nih.gov/pubmed/38809599 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55676 %T An Extensible Evaluation Framework Applied to Clinical Text Deidentification Natural Language Processing Tools: Multisystem and Multicorpus Study %A Heider,Paul M %A Meystre,Stéphane M %+ Biomedical Informatics Center, Medical University of South Carolina, 22 WestEdge Street, Suite 200, Charleston, SC, 29403, United States, 1 843 792 3385, heiderp@musc.edu %K natural language processing %K evaluation methodology %K deidentification %K privacy protection %K de-identification %K secondary use %K patient privacy %D 2024 %7 28.5.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Clinical natural language processing (NLP) researchers need access to directly comparable evaluation results for applications such as text deidentification across a range of corpus types and the means to easily test new systems or corpora within the same framework. Current systems, reported metrics, and the personally identifiable information (PII) categories evaluated are not easily comparable. Objective: This study presents an open-source and extensible end-to-end framework for comparing clinical NLP system performance across corpora even when the annotation categories do not align. 
Methods: As a use case for this framework, we use 6 off-the-shelf text deidentification systems (ie, CliniDeID, deid from PhysioNet, MITRE Identity Scrubber Toolkit [MIST], NeuroNER, National Library of Medicine [NLM] Scrubber, and Philter) across 3 standard clinical text corpora for the task (2 of which are publicly available) and 1 private corpus (all in English), with annotation categories that are not directly analogous. The framework is built on shell scripts that can be extended to include new systems, corpora, and performance metrics. We present this open tool, multiple means for aligning PII categories during evaluation, and our initial timing and performance metric findings. Code for running this framework with all settings needed to run all pairs is available via Codeberg and GitHub. Results: From this case study, we found large differences in processing speed between systems. The fastest system (ie, MIST) processed an average of 24.57 (SD 26.23) notes per second, while the slowest (ie, CliniDeID) processed an average of 1.00 notes per second. No system uniformly outperformed the others at identifying PII across corpora and categories. Instead, a rich tapestry of performance trade-offs emerged for PII categories. CliniDeID and Philter prioritize recall over precision (with an average recall 6.9 and 11.2 points higher, respectively, for partially matching spans of text matching any PII category), while the other 4 systems consistently have higher precision (with MIST’s precision scoring 20.2 points higher, NLM Scrubber scoring 4.4 points higher, NeuroNER scoring 7.2 points higher, and deid scoring 17.1 points higher). The macroaverage recall across corpora for identifying names, one of the more sensitive PII categories, included deid (48.8%) and MIST (66.9%) at the low end and NeuroNER (84.1%), NLM Scrubber (88.1%), and CliniDeID (95.9%) at the high end. 
A variety of metrics across categories and corpora are reported with a wider variety (eg, F2-score) available via the tool. Conclusions: NLP systems in general and deidentification systems and corpora in our use case tend to be evaluated in stand-alone research articles that only include a limited set of comparators. We hold that a single evaluation pipeline across multiple systems and corpora allows for more nuanced comparisons. Our open pipeline should reduce barriers to evaluation and system advancement. %M 38805692 %R 10.2196/55676 %U https://www.jmir.org/2024/1/e55676 %U https://doi.org/10.2196/55676 %U http://www.ncbi.nlm.nih.gov/pubmed/38805692 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50853 %T The Effect of Artificial Intelligence on Patient-Physician Trust: Cross-Sectional Vignette Study %A Zondag,Anna G M %A Rozestraten,Raoul %A Grimmelikhuijsen,Stephan G %A Jongsma,Karin R %A van Solinge,Wouter W %A Bots,Michiel L %A Vernooij,Robin W M %A Haitjema,Saskia %+ Central Diagnostic Laboratory, University Medical Center Utrecht, Utrecht University, Heidelberglaan 100, Utrecht, 3584 CX, Netherlands, 31 631117922, a.g.m.zondag@umcutrecht.nl %K patient-physician relationship %K trust %K clinical decision support %K artificial intelligence %K digital health %K decision support system %D 2024 %7 28.5.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Clinical decision support systems (CDSSs) based on routine care data, using artificial intelligence (AI), are increasingly being developed. Previous studies focused largely on the technical aspects of using AI, but the acceptability of these technologies by patients remains unclear. Objective: We aimed to investigate whether patient-physician trust is affected when medical decision-making is supported by a CDSS. Methods: We conducted a vignette study among the patient panel (N=860) of the University Medical Center Utrecht, the Netherlands. 
Patients were randomly assigned into 4 groups—either the intervention or control groups of the high-risk or low-risk cases. In both the high-risk and low-risk case groups, a physician made a treatment decision with (intervention groups) or without (control groups) the support of a CDSS. Using a questionnaire with a 7-point Likert scale, with 1 indicating “strongly disagree” and 7 indicating “strongly agree,” we collected data on patient-physician trust in 3 dimensions: competence, integrity, and benevolence. We assessed differences in patient-physician trust between the control and intervention groups per case using Mann-Whitney U tests and potential effect modification by the participant’s sex, age, education level, general trust in health care, and general trust in technology using multivariate analyses of (co)variance. Results: In total, 398 patients participated. In the high-risk case, median perceived competence and integrity were lower in the intervention group compared to the control group but not statistically significant (5.8 vs 5.6; P=.16 and 6.3 vs 6.0; P=.06, respectively). However, the effect of a CDSS application on the perceived competence of the physician depended on the participant’s sex (P=.03). Although no between-group differences were found in men, in women, the perception of the physician’s competence and integrity was significantly lower in the intervention compared to the control group (P=.009 and P=.01, respectively). In the low-risk case, no differences in trust between the groups were found. However, increased trust in technology positively influenced the perceived benevolence and integrity in the low-risk case (P=.009 and P=.04, respectively). Conclusions: We found that, in general, patient-physician trust was high. However, our findings indicate a potentially negative effect of AI applications on the patient-physician relationship, especially among women and in high-risk situations. 
Trust in technology, in general, might increase the likelihood of embracing the use of CDSSs by treating professionals. %M 38805702 %R 10.2196/50853 %U https://www.jmir.org/2024/1/e50853 %U https://doi.org/10.2196/50853 %U http://www.ncbi.nlm.nih.gov/pubmed/38805702 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 7 %N %P e54496 %T Using AI-Based Technologies to Help Nurses Detect Behavioral Disorders: Narrative Literature Review %A Fernandes,Sofia %A von Gunten,Armin %A Verloo,Henk %+ School of Health Sciences, University of Applied Sciences and Arts Western Switzerland (HES-SO), Chemin de l’Agasse 5, Sion, 1950, Switzerland, 41 00415860861, sofia.fernandes@hevs.ch %K artificial intelligence %K behavioral and psychological symptoms of dementia %K neuropsychiatric symptoms %K early detection %K management %K narrative literature review %D 2024 %7 28.5.2024 %9 Review %J JMIR Nursing %G English %X Background: The behavioral and psychological symptoms of dementia (BPSD) are common among people with dementia and have multiple negative consequences. Artificial intelligence–based technologies (AITs) have the potential to help nurses in the early prodromal detection of BPSD. Despite significant recent interest in the topic and the increasing number of available appropriate devices, little information is available on using AITs to help nurses striving to detect BPSD early. Objective: The aim of this study is to identify the number and characteristics of existing publications on introducing AITs to support nursing interventions to detect and manage BPSD early. Methods: A literature review of publications in the PubMed database referring to AITs and dementia was conducted in September 2023. A detailed analysis sought to identify the characteristics of these publications. The results were reported using a narrative approach. Results: A total of 25 publications from 14 countries were identified, with most describing prospective observational studies. 
We identified 3 categories of publications on the use of AITs: (1) predicting behaviors and the stages and progression of dementia, (2) screening and assessing clinical symptoms, and (3) managing dementia and BPSD. Most of the publications referred to managing dementia and BPSD. Conclusions: Despite growing interest, most AITs currently in use are designed to support psychosocial approaches to treating and caring for existing clinical signs of BPSD. AITs thus remain undertested and underused for the early and real-time detection of BPSD. They could, nevertheless, provide nurses with accurate, reliable systems for assessing, monitoring, planning, and supporting safe therapeutic interventions. %M 38805252 %R 10.2196/54496 %U https://nursing.jmir.org/2024/1/e54496 %U https://doi.org/10.2196/54496 %U http://www.ncbi.nlm.nih.gov/pubmed/38805252 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e57292 %T Evaluation of Artificial Intelligence Algorithms for Diabetic Retinopathy Detection: Protocol for a Systematic Review and Meta-Analysis %A Sesgundo III,Jaime Angeles %A Maeng,David Collin %A Tukay,Jumelle Aubrey %A Ascano,Maria Patricia %A Suba-Cohen,Justine %A Sampang,Virginia %+ Office of Medical Research, University of Nevada, Reno School of Medicine, 1664 N. Virginia St, Reno, NV, 89557, United States, 1 7757841110, jsesgundo@med.unr.edu %K artificial intelligence %K diabetic retinopathy %K deep learning %K ophthalmology %K accuracy %K imaging %K AI %K DR %K complication %K retinopathy %K Optha %K AI algorithms %K detection %K management %K ophthalmologists %K early detection %K screening %K meta-analysis %K diabetes mellitus %K DM %K diabetes %K systematic review %D 2024 %7 27.5.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Diabetic retinopathy (DR) is one of the most common complications of diabetes mellitus. The global burden is immense, with a worldwide prevalence of 8.5%.
Recent advancements in artificial intelligence (AI) have demonstrated the potential to transform the landscape of ophthalmology with earlier detection and management of DR. Objective: This study seeks to provide an update and evaluate the accuracy and current diagnostic ability of AI in detecting DR versus ophthalmologists. Additionally, this review will highlight the potential of AI integration to enhance DR screening, management, and disease progression. Methods: A systematic review of the current landscape of AI’s role in DR will be undertaken, guided by the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) model. Relevant peer-reviewed papers published in English will be identified by searching 4 international databases: PubMed, Embase, CINAHL, and the Cochrane Central Register of Controlled Trials. Eligible studies will include randomized controlled trials, observational studies, and cohort studies published on or after 2022 that evaluate AI’s performance in retinal imaging detection of DR in diverse adult populations. Studies that focus on specific comorbid conditions, nonimage-based applications of AI, or those lacking a direct comparison group or clear methodology will be excluded. Selected papers will be independently assessed for bias by 2 review authors (JS and DM) using the Quality Assessment of Diagnostic Accuracy Studies tool for systematic reviews. Upon systematic review completion, if it is determined that there are sufficient data, a meta-analysis will be performed. Data synthesis will use a quantitative model. Statistical software such as RevMan and STATA will be used to produce a random-effects meta-regression model to pool data from selected studies. Results: Using selected search queries across multiple databases, we accumulated 3494 studies regarding our topic of interest, of which 1588 were duplicates, leaving 1906 unique research papers to review and analyze. 
Conclusions: This systematic review and meta-analysis protocol outlines a comprehensive evaluation of AI for DR detection. This active study is anticipated to assess the current accuracy of AI methods in detecting DR. International Registered Report Identifier (IRRID): DERR1-10.2196/57292 %M 38801771 %R 10.2196/57292 %U https://www.researchprotocols.org/2024/1/e57292 %U https://doi.org/10.2196/57292 %U http://www.ncbi.nlm.nih.gov/pubmed/38801771 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54095 %T Advances in the Application of AI Robots in Critical Care: Scoping Review %A Li,Yun %A Wang,Min %A Wang,Lu %A Cao,Yuan %A Liu,Yuyan %A Zhao,Yan %A Yuan,Rui %A Yang,Mengmeng %A Lu,Siqian %A Sun,Zhichao %A Zhou,Feihu %A Qian,Zhirong %A Kang,Hongjun %+ The First Medical Centre, Chinese PLA General Hospital, 28 Fuxing Road, Haidian District,, Beijing, 100853, China, 86 13811989878, doctorkang301@163.com %K critical care medicine %K artificial intelligence %K AI %K robotics %K intensive care unit %K ICU %D 2024 %7 27.5.2024 %9 Review %J J Med Internet Res %G English %X Background: In recent epochs, the field of critical medicine has experienced significant advancements due to the integration of artificial intelligence (AI). Specifically, AI robots have evolved from theoretical concepts to being actively implemented in clinical trials and applications. The intensive care unit (ICU), known for its reliance on a vast amount of medical information, presents a promising avenue for the deployment of robotic AI, anticipated to bring substantial improvements to patient care. Objective: This review aims to comprehensively summarize the current state of AI robots in the field of critical care by searching for previous studies, developments, and applications of AI robots related to ICU wards. 
In addition, it seeks to address the ethical challenges arising from their use, including concerns related to safety, patient privacy, responsibility delineation, and cost-benefit analysis. Methods: Following the scoping review framework proposed by Arksey and O’Malley and the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, we conducted a scoping review to delineate the breadth of research on AI robots in the ICU and reported the findings. The literature search was carried out on May 1, 2023, across 3 databases: PubMed, Embase, and the IEEE Xplore Digital Library. Eligible publications were initially screened based on their titles and abstracts. Publications that passed the preliminary screening underwent a comprehensive review. Various research characteristics were extracted, summarized, and analyzed from the final publications. Results: Of the 5908 publications screened, 77 (1.3%) underwent a full review. These studies collectively spanned 21 ICU robotics projects, encompassing their system development and testing, clinical trials, and approval processes. Based on an expert-reviewed classification framework, these were categorized into 5 main types: therapeutic assistance robots, nursing assistance robots, rehabilitation assistance robots, telepresence robots, and logistics and disinfection robots. Most of these are already widely deployed and commercialized in ICUs, although a select few remain under testing. All robotic systems and tools are engineered to deliver more personalized, convenient, and intelligent medical services to patients in the ICU, concurrently aiming to reduce the substantial workload on ICU medical staff and promote therapeutic and care procedures.
This review further explored the prevailing challenges, particularly focusing on ethical and safety concerns, proposing viable solutions or methodologies, and illustrating the prospective capabilities and potential of AI-driven robotic technologies in the ICU environment. Ultimately, we foresee a pivotal role for robots in a future scenario of a fully automated continuum from admission to discharge within the ICU. Conclusions: This review highlights the potential of AI robots to transform ICU care by improving patient treatment, support, and rehabilitation processes. However, it also recognizes the ethical complexities and operational challenges that come with their implementation, offering possible solutions for future development and optimization. %M 38801765 %R 10.2196/54095 %U https://www.jmir.org/2024/1/e54095 %U https://doi.org/10.2196/54095 %U http://www.ncbi.nlm.nih.gov/pubmed/38801765 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e55399 %T The Impact of Performance Expectancy, Workload, Risk, and Satisfaction on Trust in ChatGPT: Cross-Sectional Survey Analysis %A Choudhury,Avishek %A Shamszare,Hamid %+ Industrial and Management Systems Engineering, Benjamin M. 
Statler College of Engineering and Mineral Resources, West Virginia University, 321 Engineering Sciences Building, 1306 Evansdale Drive, Morgantown, WV, 26506, United States, 1 3042939431, avishek.choudhury@mail.wvu.edu %K ChatGPT %K chatbots %K health care %K health care decision-making %K health-related decision-making %K health care management %K decision-making %K user perception %K usability %K usable %K usableness %K usefulness %K artificial intelligence %K algorithms %K predictive models %K predictive analytics %K predictive system %K practical models %K deep learning %K cross-sectional survey %D 2024 %7 27.5.2024 %9 Original Paper %J JMIR Hum Factors %G English %X Background: ChatGPT (OpenAI) is a powerful tool for a wide range of tasks, from entertainment and creativity to health care queries. There are potential risks and benefits associated with this technology. In the discourse concerning the deployment of ChatGPT and similar large language models, it is sensible to recommend their use primarily for tasks a human user can execute accurately. As we transition into the subsequent phase of ChatGPT deployment, establishing realistic performance expectations and understanding users’ perceptions of risk associated with its use are crucial in determining the successful integration of this artificial intelligence (AI) technology. Objective: The aim of the study is to explore how perceived workload, satisfaction, performance expectancy, and risk-benefit perception influence users’ trust in ChatGPT. Methods: A semistructured, web-based survey was conducted with 607 adults in the United States who actively use ChatGPT. The survey questions were adapted from constructs used in various models and theories such as the technology acceptance model, the theory of planned behavior, the unified theory of acceptance and use of technology, and research on trust and security in digital environments. 
To test our hypotheses and structural model, we used the partial least squares structural equation modeling method, a widely used approach for multivariate analysis. Results: A total of 607 people responded to our survey. A significant portion of the participants held at least a high school diploma (n=204, 33.6%), and the majority had a bachelor’s degree (n=262, 43.1%). The primary motivations for participants to use ChatGPT were for acquiring information (n=219, 36.1%), amusement (n=203, 33.4%), and addressing problems (n=135, 22.2%). Some participants used it for health-related inquiries (n=44, 7.2%), while a few others (n=6, 1%) used it for miscellaneous activities such as brainstorming, grammar verification, and blog content creation. Our model explained 64.6% of the variance in trust. Our analysis indicated a significant relationship between (1) workload and satisfaction, (2) trust and satisfaction, (3) performance expectations and trust, and (4) risk-benefit perception and trust. Conclusions: The findings underscore the importance of ensuring user-friendly design and functionality in AI-based applications to reduce workload and enhance user satisfaction, thereby increasing user trust. Future research should further explore the relationship between risk-benefit perception and trust in the context of AI chatbots. 
%M 38801658 %R 10.2196/55399 %U https://humanfactors.jmir.org/2024/1/e55399 %U https://doi.org/10.2196/55399 %U http://www.ncbi.nlm.nih.gov/pubmed/38801658 %0 Journal Article %@ 2291-9694 %I %V 12 %N %P e56909 %T Generalization of a Deep Learning Model for Continuous Glucose Monitoring–Based Hypoglycemia Prediction: Algorithm Development and Validation Study %A Shao,Jian %A Pan,Ying %A Kou,Wei-Bin %A Feng,Huyi %A Zhao,Yu %A Zhou,Kaixin %A Zhong,Shao %K hypoglycemia prediction %K hypoglycemia %K hypoglycemic %K blood sugar %K prediction %K predictive %K deep learning %K generalization %K machine learning %K glucose %K diabetes %K continuous glucose monitoring %K type 1 diabetes %K type 2 diabetes %K LSTM %K long short-term memory %D 2024 %7 24.5.2024 %9 %J JMIR Med Inform %G English %X Background: Predicting hypoglycemia while maintaining a low false alarm rate is a challenge for the wide adoption of continuous glucose monitoring (CGM) devices in diabetes management. One small study suggested that a deep learning model based on the long short-term memory (LSTM) network had better performance in hypoglycemia prediction than traditional machine learning algorithms in European patients with type 1 diabetes. However, given that many well-recognized deep learning models perform poorly outside the training setting, it remains unclear whether the LSTM model could be generalized to different populations or patients with other diabetes subtypes. Objective: The aim of this study was to validate LSTM hypoglycemia prediction models in more diverse populations and across a wide spectrum of patients with different subtypes of diabetes. Methods: We assembled two large data sets of patients with type 1 and type 2 diabetes. The primary data set including CGM data from 192 Chinese patients with diabetes was used to develop the LSTM, support vector machine (SVM), and random forest (RF) models for hypoglycemia prediction with a prediction horizon of 30 minutes. 
Hypoglycemia was categorized into mild (glucose 54-70 mg/dL) and severe (glucose<54 mg/dL) levels. The validation data set of 427 patients of European-American ancestry in the United States was used to validate the models and examine their generalizability. The predictive performance of the models was evaluated according to the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Results: For the difficult-to-predict mild hypoglycemia events, the LSTM model consistently achieved AUC values greater than 97% in the primary data set, with a less than 3% AUC reduction in the validation data set, indicating that the model was robust and generalizable across populations. AUC values above 93% were also achieved when the LSTM model was applied to both type 1 and type 2 diabetes in the validation data set, further strengthening the generalizability of the model. Under different satisfactory levels of sensitivity for mild and severe hypoglycemia prediction, the LSTM model achieved higher specificity than the SVM and RF models, thereby reducing false alarms. Conclusions: Our results demonstrate that the LSTM model is robust for hypoglycemia prediction and is generalizable across populations or diabetes subtypes. Given its additional advantage of false-alarm reduction, the LSTM model is a strong candidate to be widely implemented in future CGM devices for hypoglycemia prediction.
%R 10.2196/56909 %U https://medinform.jmir.org/2024/1/e56909 %U https://doi.org/10.2196/56909 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e57001 %T Assessing and Optimizing Large Language Models on Spondyloarthritis Multi-Choice Question Answering: Protocol for Enhancement and Assessment %A Wang,Anan %A Wu,Yunong %A Ji,Xiaojian %A Wang,Xiangyang %A Hu,Jiawen %A Zhang,Fazhan %A Zhang,Zhanchao %A Pu,Dong %A Tang,Lulu %A Ma,Shikui %A Liu,Qiang %A Dong,Jing %A He,Kunlun %A Li,Kunpeng %A Teng,Da %A Li,Tao %+ Department of Medical Innovation Research, Chinese PLA General Hospital, No.28 Fuxing Road, Wanshou Road, Haidian District, Beijing, China, 86 13810398393, litao301hospital@163.com %K spondyloarthritis %K benchmark %K large language model %K artificial intelligence %K AI %K AI chatbot %K AI-assistant diagnosis %D 2024 %7 24.5.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Spondyloarthritis (SpA), a chronic inflammatory disorder, predominantly impacts the sacroiliac joints and spine, significantly escalating the risk of disability. SpA’s complexity, as evidenced by its diverse clinical presentations and symptoms that often mimic other diseases, presents substantial challenges in its accurate diagnosis and differentiation. This complexity becomes even more pronounced in nonspecialist health care environments due to limited resources, resulting in delayed referrals, increased misdiagnosis rates, and exacerbated disability outcomes for patients with SpA. The emergence of large language models (LLMs) in medical diagnostics introduces a revolutionary potential to overcome these diagnostic hurdles. Despite recent advancements in artificial intelligence and LLMs demonstrating effectiveness in diagnosing and treating various diseases, their application in SpA remains underdeveloped. Currently, there is a notable absence of SpA-specific LLMs and an established benchmark for assessing the performance of such models in this particular field. 
Objective: Our objective is to develop a foundational medical model, creating a comprehensive evaluation benchmark tailored to the essential medical knowledge of SpA and its unique diagnostic and treatment protocols. After pretraining, the model will be further enhanced through supervised fine-tuning. It is projected to significantly aid physicians in SpA diagnosis and treatment, especially in settings with limited access to specialized care. Furthermore, this initiative is poised to promote early and accurate SpA detection at the primary care level, thereby diminishing the risks associated with delayed or incorrect diagnoses. Methods: A rigorous benchmark, comprising 222 meticulously formulated multiple-choice questions on SpA, will be developed. These questions will be extensively revised to ensure their suitability for accurately evaluating LLMs’ performance in real-world diagnostic and therapeutic scenarios. Our methodology involves selecting and refining top foundational models using public data sets. The best-performing model in our benchmark will undergo further training. Subsequently, more than 80,000 real-world inpatient and outpatient cases from hospitals will enhance LLM training, incorporating techniques such as supervised fine-tuning and low-rank adaptation. We will rigorously assess the models’ generated responses for accuracy and evaluate their reasoning processes using the metrics of fluency, relevance, completeness, and medical proficiency. Results: Development of the model is progressing, with significant enhancements anticipated by early 2024. The benchmark, along with the results of evaluations, is expected to be released in the second quarter of 2024. Conclusions: Our trained model aims to capitalize on the capabilities of LLMs in analyzing complex clinical data, thereby enabling precise detection, diagnosis, and treatment of SpA.
This innovation is anticipated to play a vital role in diminishing the disabilities arising from delayed or incorrect SpA diagnoses. By promoting this model across diverse health care settings, we anticipate a significant improvement in SpA management, culminating in enhanced patient outcomes and a reduced overall burden of the disease. International Registered Report Identifier (IRRID): DERR1-10.2196/57001 %M 38788208 %R 10.2196/57001 %U https://www.researchprotocols.org/2024/1/e57001 %U https://doi.org/10.2196/57001 %U http://www.ncbi.nlm.nih.gov/pubmed/38788208 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e50446 %T Evaluating a New Digital App–Based Program for Heart Health: Feasibility and Acceptability Pilot Study %A Lockwood,Kimberly G %A Kulkarni,Priya R %A Paruthi,Jason %A Buch,Lauren S %A Chaffard,Mathieu %A Schitter,Eva C %A Branch,OraLee H %A Graham,Sarah A %+ Lark Health, 809 Cuesta Dr, Suite B #1033, Mountain View, CA, 94040, United States, 1 5033801340, kimberly.lockwood@lark.com %K digital health %K cardiovascular disease %K artificial intelligence %K AI %K acceptability and feasibility %K pilot study %K lifestyle coaching %K mobile phone %D 2024 %7 24.5.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Cardiovascular disease (CVD) is the leading cause of death in the United States, affecting a significant proportion of adults. Digital health lifestyle change programs have emerged as a promising method of CVD prevention, offering benefits such as on-demand support, lower cost, and increased scalability. Prior research has shown the effectiveness of digital health interventions in reducing negative CVD outcomes. This pilot study focuses on the Lark Heart Health program, a fully digital artificial intelligence (AI)–powered smartphone app, providing synchronous CVD risk counseling, educational content, and personalized coaching. 
Objective: This pilot study evaluated the feasibility and acceptability of a fully digital AI-powered lifestyle change program called Lark Heart Health. Primary analyses assessed (1) participant satisfaction, (2) engagement with the program, and (3) the submission of health screeners. Secondary analyses were conducted to evaluate weight loss outcomes, given that a major focus of the Heart Health program is weight management. Methods: This study enrolled 509 participants in the 90-day real-world single-arm pilot study of the Heart Health app. Participants engaged with the app by participating in coaching conversations, logging meals, tracking weight, and completing educational lessons. The study outcomes included participant satisfaction, app engagement, the completion of screeners, and weight loss. Results: On average, Heart Health study participants were aged 60.9 (SD 10.3; range 40-75) years, with average BMI indicating class I obesity. Of the 509 participants, 489 (96.1%) stayed enrolled until the end of the study (dropout rate: 3.9%). Study retention, based on providing a weight measurement during month 3, was 80% (407/509; 95% CI 76.2%-83.4%). Participant satisfaction scores indicated high satisfaction with the overall app experience, with an average score of ≥4 out of 5 for all satisfaction indicators. Participants also showed high engagement with the app, with 83.4% (408/489; 95% CI 80.1%-86.7%) of the sample engaging in ≥5 coaching conversations in month 3. The results indicated that participants were successfully able to submit health screeners within the app, with 90% (440/489; 95% CI 87%-92.5%) submitting all 3 screeners measured in the study. Finally, secondary analyses showed that participants lost weight during the program, with analyses showing an average weight nadir of 3.8% (SD 2.9%; 95% CI 3.5%-4.1%). 
Conclusions: The study results indicate that participants in this study were satisfied with their experience using the Heart Health app, highly engaged with the app features, and willing and able to complete health screening surveys in the app. These acceptability and feasibility results provide a key first step in the process of evidence generation for a new AI-powered digital program for heart health. Future work can expand these results to test outcomes with a commercial version of the Heart Health app in a diverse real-world sample. %M 38787598 %R 10.2196/50446 %U https://formative.jmir.org/2024/1/e50446 %U https://doi.org/10.2196/50446 %U http://www.ncbi.nlm.nih.gov/pubmed/38787598 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 7 %N %P e56474 %T The Cooperation Between Nurses and a New Digital Colleague “AI-Driven Lifestyle Monitoring” in Long-Term Care for Older Adults: Viewpoint %A Groeneveld,Sjors %A Bin Noon,Gaya %A den Ouden,Marjolein E M %A van Os-Medendorp,Harmieke %A van Gemert-Pijnen,J E W C %A Verdaasdonk,Rudolf M %A Morita,Plinio Pelegrini %+ Research Group Technology, Health & Care, Saxion University of Applied Sciences, Postbox 70.000, Enschede, 7500 KB, Netherlands, 31 88 019 8888, s.w.m.groeneveld@saxion.nl %K artificial intelligence %K data %K algorithm %K nurse %K nurses %K health care professional %K health care professionals %K health professional %K health professionals %K health technology %K digital health %K smart home %K smart homes %K health monitoring %K health promotion %K aging in place %K assisted living %K ambient assisted living %K aging %K gerontology %K geriatric %K geriatrics %K older adults %K independent living %K machine learning %D 2024 %7 23.5.2024 %9 Viewpoint %J JMIR Nursing %G English %X Technology has a major impact on the way nurses work. Data-driven technologies, such as artificial intelligence (AI), have particularly strong potential to support nurses in their work. 
However, their use also introduces ambiguities. An example of such a technology is AI-driven lifestyle monitoring in long-term care for older adults, based on data collected from ambient sensors in an older adult’s home. Designing and implementing this technology in such an intimate setting requires collaboration with nurses experienced in long-term and older adult care. This viewpoint paper emphasizes the need to incorporate nurses and the nursing perspective into every stage of designing, using, and implementing AI-driven lifestyle monitoring in long-term care settings. It is argued that the technology will not replace nurses, but rather act as a new digital colleague, complementing the humane qualities of nurses and seamlessly integrating into nursing workflows. Several advantages of such a collaboration between nurses and technology are highlighted, as are potential risks such as decreased patient empowerment, depersonalization, lack of transparency, and loss of human contact. Finally, practical suggestions are offered to move forward with integrating the digital colleague. %M 38781012 %R 10.2196/56474 %U https://nursing.jmir.org/2024/1/e56474 %U https://doi.org/10.2196/56474 %U http://www.ncbi.nlm.nih.gov/pubmed/38781012 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54705 %T AI Quality Standards in Health Care: Rapid Umbrella Review %A Kuziemsky,Craig E %A Chrimes,Dillon %A Minshall,Simon %A Mannerow,Michael %A Lau,Francis %+ MacEwan University, 10700 104 Avenue, 7-257, Edmonton, AB, T5J4S2, Canada, 1 7806333290, kuziemskyc@macewan.ca %K artificial intelligence %K health care artificial intelligence %K health care AI %K rapid review %K umbrella review %K quality standard %D 2024 %7 22.5.2024 %9 Review %J J Med Internet Res %G English %X Background: In recent years, there has been an upwelling of artificial intelligence (AI) studies in the health care literature. 
During this period, there has been an increasing number of proposed standards to evaluate the quality of health care AI studies. Objective: This rapid umbrella review examines the use of AI quality standards in a sample of health care AI systematic review articles published over a 36-month period. Methods: We used a modified version of the Joanna Briggs Institute umbrella review method. Our rapid approach was informed by the practical guide by Tricco and colleagues for conducting rapid reviews. Our search was focused on the MEDLINE database supplemented with Google Scholar. The inclusion criteria were English-language systematic reviews regardless of review type, with mention of AI and health in the abstract, published during a 36-month period. For the synthesis, we summarized the AI quality standards used and issues noted in these reviews drawing on a set of published health care AI standards, harmonized the terms used, and offered guidance to improve the quality of future health care AI studies. Results: We selected 33 review articles published between 2020 and 2022 in our synthesis. The reviews covered a wide range of objectives, topics, settings, designs, and results. Over 60 AI approaches across different domains were identified with varying levels of detail spanning different AI life cycle stages, making comparisons difficult. Health care AI quality standards were applied in only 39% (13/33) of the reviews and in 14% (25/178) of the original studies from the reviews examined, mostly to appraise their methodological or reporting quality. Only a handful mentioned the transparency, explainability, trustworthiness, ethics, and privacy aspects. A total of 23 AI quality standard–related issues were identified in the reviews. There was a recognized need to standardize the planning, conduct, and reporting of health care AI studies and address their broader societal, ethical, and regulatory implications. 
Conclusions: Despite the growing number of AI standards to assess the quality of health care AI studies, they are seldom applied in practice. With increasing desire to adopt AI in different health topics, domains, and settings, practitioners and researchers must stay abreast of and adapt to the evolving landscape of health care AI quality standards and apply these standards to improve the quality of their AI studies. %M 38776538 %R 10.2196/54705 %U https://www.jmir.org/2024/1/e54705 %U https://doi.org/10.2196/54705 %U http://www.ncbi.nlm.nih.gov/pubmed/38776538 %0 Journal Article %@ 2817-092X %I JMIR Publications %V 3 %N %P e51822 %T Direct Clinical Applications of Natural Language Processing in Common Neurological Disorders: Scoping Review %A Lefkovitz,Ilana %A Walsh,Samantha %A Blank,Leah J %A Jetté,Nathalie %A Kummer,Benjamin R %+ Department of Neurology, Icahn School of Medicine at Mount Sinai, One Gustave Levy Place, Box 1137, New York, NY, 10029, United States, 1 212 241 5050, benjamin.kummer@mountsinai.org %K natural language processing %K NLP %K unstructured %K text %K machine learning %K deep learning %K neurology %K headache disorders %K migraine %K Parkinson disease %K cerebrovascular disease %K stroke %K transient ischemic attack %K epilepsy %K multiple sclerosis %K cardiovascular %K artificial intelligence %K Parkinson %K neurological %K neurological disorder %K scoping review %K diagnosis %K treatment %K prediction %D 2024 %7 22.5.2024 %9 Review %J JMIR Neurotech %G English %X Background: Natural language processing (NLP), a branch of artificial intelligence that analyzes unstructured language, is being increasingly used in health care. However, the extent to which NLP has been formally studied in neurological disorders remains unclear. Objective: We sought to characterize studies that applied NLP to the diagnosis, prediction, or treatment of common neurological disorders. 
Methods: This review followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) standards. The search was conducted using MEDLINE and Embase on May 11, 2022. Studies of NLP use in migraine, Parkinson disease, Alzheimer disease, stroke and transient ischemic attack, epilepsy, or multiple sclerosis were included. We excluded conference abstracts, review papers, as well as studies involving heterogeneous clinical populations or indirect clinical uses of NLP. Study characteristics were extracted and analyzed using descriptive statistics. We did not aggregate measurements of performance in our review due to the high variability in study outcomes, which is the main limitation of the study. Results: In total, 916 studies were identified, of which 41 (4.5%) met all eligibility criteria and were included in the final review. Of the 41 included studies, the most frequently represented disorders were stroke and transient ischemic attack (n=20, 49%), followed by epilepsy (n=10, 24%), Alzheimer disease (n=6, 15%), and multiple sclerosis (n=5, 12%). We found no studies of NLP use in migraine or Parkinson disease that met our eligibility criteria. The main objective of NLP was diagnosis (n=20, 49%), followed by disease phenotyping (n=17, 41%), prognostication (n=9, 22%), and treatment (n=4, 10%). In total, 18 (44%) studies used only machine learning approaches, 6 (15%) used only rule-based methods, and 17 (41%) used both. Conclusions: We found that NLP was most commonly applied for diagnosis, implying a potential role for NLP in augmenting diagnostic accuracy in settings with limited access to neurological expertise. We also found several gaps in neurological NLP research, with few to no studies addressing certain disorders, which may suggest additional areas of inquiry. 
Trial Registration: Prospective Register of Systematic Reviews (PROSPERO) CRD42021228703; https://www.crd.york.ac.uk/PROSPERO/display_record.php?RecordID=228703 %R 10.2196/51822 %U https://neuro.jmir.org/2024/1/e51822 %U https://doi.org/10.2196/51822 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53164 %T Hallucination Rates and Reference Accuracy of ChatGPT and Bard for Systematic Reviews: Comparative Analysis %A Chelli,Mikaël %A Descamps,Jules %A Lavoué,Vincent %A Trojani,Christophe %A Azar,Michel %A Deckert,Marcel %A Raynier,Jean-Luc %A Clowez,Gilles %A Boileau,Pascal %A Ruetsch-Chelli,Caroline %+ Institute for Sports and Reconstructive Bone and Joint Surgery, Groupe Kantys, 7 Avenue, Durante, Nice, 06000, France, 33 4 93 16 76 40, mikael.chelli@gmail.com %K artificial intelligence %K large language models %K ChatGPT %K Bard %K rotator cuff %K systematic reviews %K literature search %K hallucinated %K human conducted %D 2024 %7 22.5.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Large language models (LLMs) have raised both interest and concern in the academic community. They offer the potential for automating literature search and synthesis for systematic reviews but raise concerns regarding their reliability, as the tendency to generate unsupported (hallucinated) content persists. Objective: The aim of the study is to assess the performance of LLMs such as ChatGPT and Bard (subsequently rebranded Gemini) to produce references in the context of scientific writing. Methods: The performance of ChatGPT and Bard in replicating the results of human-conducted systematic reviews was assessed. Using systematic reviews pertaining to shoulder rotator cuff pathology, these LLMs were tested by providing the same inclusion criteria and comparing the results with original systematic review references, serving as gold standards. 
The study used 3 key performance metrics: recall, precision, and F1-score, alongside the hallucination rate. Papers were considered “hallucinated” if any 2 of the following were wrong: title, first author, or year of publication. Results: In total, 11 systematic reviews across 4 fields yielded 33 prompts to LLMs (3 LLMs×11 reviews), with 471 references analyzed. Precision rates for GPT-3.5, GPT-4, and Bard were 9.4% (13/139), 13.4% (16/119), and 0% (0/104), respectively (P<.001). Recall rates were 11.9% (13/109) for GPT-3.5 and 13.7% (15/109) for GPT-4, with Bard failing to retrieve any relevant papers (P<.001). Hallucination rates stood at 39.6% (55/139) for GPT-3.5, 28.6% (34/119) for GPT-4, and 91.4% (95/104) for Bard (P<.001). Further analysis of nonhallucinated papers retrieved by GPT models revealed significant differences in identifying various criteria, such as randomized studies, participant criteria, and intervention criteria. The study also noted the geographical and open-access biases in the papers retrieved by the LLMs. Conclusions: Given their current performance, it is not recommended for LLMs to be deployed as the primary or exclusive tool for conducting systematic reviews. Any references generated by such models warrant thorough validation by researchers. The high occurrence of hallucinations in LLMs highlights the necessity for refining their training and functionality before confidently using them for rigorous academic purposes. %M 38776130 %R 10.2196/53164 %U https://www.jmir.org/2024/1/e53164 %U https://doi.org/10.2196/53164 %U http://www.ncbi.nlm.nih.gov/pubmed/38776130 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53968 %T User Dynamics and Thematic Exploration in r/Depression During the COVID-19 Pandemic: Insights From Overlapping r/SuicideWatch Users %A Zhu,Jianfeng %A Jin,Ruoming %A Kenne,Deric R %A Phan,NhatHai %A Ku,Wei-Shinn %+ Department of Computer Science, Kent State University, 800 E. 
Summit St., Kent, OH, 44242, United States, 1 3306729980, jzhu10@kent.edu %K reddit %K natural language processing %K NLP %K suicidal ideation %K SI %K online communities %K depression symptoms %K COVID-19 pandemic %K bidirectional encoder representations from transformers %K BERT %K r/SuicideWatch %K r/Depression %D 2024 %7 20.5.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: In 2023, the United States experienced its highest recorded number of suicides, exceeding 50,000 deaths. In the realm of psychiatric disorders, major depressive disorder stands out as the most common issue, affecting 15% to 17% of the population and carrying a notable suicide risk of approximately 15%. However, not everyone with depression has suicidal thoughts. While “suicidal depression” is not a clinical diagnosis, it may be observed in daily life, emphasizing the need for awareness. Objective: This study aims to examine the dynamics, emotional tones, and topics discussed in posts within the r/Depression subreddit, with a specific focus on users who had also engaged in the r/SuicideWatch community. The objective was to use natural language processing techniques and models to better understand the complexities of depression among users with potential suicide ideation, with the goal of improving intervention and prevention strategies for suicide. Methods: Archived posts were extracted from the r/Depression and r/SuicideWatch Reddit communities in English spanning from 2019 to 2022, resulting in a final data set of over 150,000 posts contributed by approximately 25,000 unique overlapping users. A broad and comprehensive mix of methods was applied to these posts, including trend and survival analysis, to explore the dynamics of users in the 2 subreddits. The BERT family of models extracted features from data for sentiment and thematic analysis. Results: On August 16, 2020, the post count in r/SuicideWatch surpassed that of r/Depression. 
The transition from r/Depression to r/SuicideWatch in 2020 was the shortest, lasting only 26 days. Sadness emerged as the most prevalent emotion among overlapping users in the r/Depression community. In addition, physical activity changes, negative self-view, and suicidal thoughts were identified as the most common depression symptoms, all showing strong positive correlations with the emotion tone of disappointment. Furthermore, the topic “struggles with depression and motivation in school and work” (12%) emerged as the most discussed topic aside from suicidal thoughts, categorizing users based on their inclination toward suicide ideation. Conclusions: Our study underscores the effectiveness of using natural language processing techniques to explore language markers and patterns associated with mental health challenges in online communities like r/Depression and r/SuicideWatch. These insights offer novel perspectives distinct from previous research. In the future, there will be potential for further refinement and optimization of machine classifications using these techniques, which could lead to more effective intervention and prevention strategies. 
%M 38767953 %R 10.2196/53968 %U https://www.jmir.org/2024/1/e53968 %U https://doi.org/10.2196/53968 %U http://www.ncbi.nlm.nih.gov/pubmed/38767953 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e47805 %T Framework for Ranking Machine Learning Predictions of Limited, Multimodal, and Longitudinal Behavioral Passive Sensing Data: Combining User-Agnostic and Personalized Modeling %A Mullick,Tahsin %A Shaaban,Sam %A Radovic,Ana %A Doryab,Afsaneh %+ Department of Systems and Information Engineering, University of Virginia, Olsson Hall, 151 Engineer's Way, Charlottesville, VA, 22903, United States, 1 4349245393, tum7q@virginia.edu %K machine learning %K AI %K artificial intelligence %K passive sensing %K ranking framework %K small health data set %K ranking %K algorithm %K algorithms %K sensor %K multimodal %K predict %K prediction %K agnostic %K framework %K validation %K data set %D 2024 %7 20.5.2024 %9 Original Paper %J JMIR AI %G English %X Background: Passive mobile sensing provides opportunities for measuring and monitoring health status in the wild and outside of clinics. However, longitudinal, multimodal mobile sensor data can be small, noisy, and incomplete. This makes processing, modeling, and prediction of these data challenging. The small size of the data set restricts it from being modeled using complex deep learning networks. The current state of the art (SOTA) tackles small sensor data sets following a singular modeling paradigm based on traditional machine learning (ML) algorithms. These opt for either a user-agnostic modeling approach, which makes the model susceptible to a larger degree of noise, or a personalized approach, in which training on an individual's more limited data gives rise to overfitting; researchers must therefore seek a trade-off by choosing 1 of the 2 modeling approaches to reach predictions. 
Objective: The objective of this study was to filter, rank, and output the best predictions for small, multimodal, longitudinal sensor data using a framework that is designed to tackle data sets that are limited in size (particularly targeting health studies that use passive multimodal sensors) and that combines both user agnostic and personalized approaches, along with a combination of ranking strategies to filter predictions. Methods: In this paper, we introduced a novel ranking framework for longitudinal multimodal sensors (FLMS) to address challenges encountered in health studies involving passive multimodal sensors. Using the FLMS, we (1) built a tensor-based aggregation and ranking strategy for final interpretation, (2) processed various combinations of sensor fusions, and (3) balanced user-agnostic and personalized modeling approaches with appropriate cross-validation strategies. The performance of the FLMS was validated with the help of a real data set of adolescents diagnosed with major depressive disorder for the prediction of change in depression in the adolescent participants. Results: Predictions output by the proposed FLMS achieved a 7% increase in accuracy and a 13% increase in recall for the real data set. Experiments with existing SOTA ML algorithms showed an 11% increase in accuracy for the depression data set and how overfitting and sparsity were handled. Conclusions: The FLMS aims to fill the gap that currently exists when modeling passive sensor data with a small number of data points. It achieves this through leveraging both user-agnostic and personalized modeling techniques in tandem with an effective ranking strategy to filter predictions. 
%M 38875667 %R 10.2196/47805 %U https://ai.jmir.org/2024/1/e47805 %U https://doi.org/10.2196/47805 %U http://www.ncbi.nlm.nih.gov/pubmed/38875667 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e53761 %T Application of Machine Learning in Multimorbidity Research: Protocol for a Scoping Review %A Anthonimuthu,Danny Jeganathan %A Hejlesen,Ole %A Zwisler,Ann-Dorthe Olsen %A Udsen,Flemming Witt %+ Department of Health Science and Technology, Faculty of Medicine, Aalborg University, Selma Lagerløfs Vej 249, Gistrup, 9260, Denmark, 45 41627109, dant@hst.aau.dk %K multimorbidity %K multiple long-term conditions %K machine learning %K artificial intelligence %K scoping review %K protocol %K chronic conditions %K health care system %K health care %D 2024 %7 20.5.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Multimorbidity, defined as the coexistence of multiple chronic conditions, poses significant challenges to health care systems on a global scale. It is associated with increased mortality, reduced quality of life, and increased health care costs. The burden of multimorbidity is expected to worsen if no effective intervention is taken. Machine learning has the potential to assist in addressing these challenges since it offers advanced analysis and decision-making capabilities, such as disease prediction, treatment development, and clinical strategies. Objective: This paper represents the protocol of a scoping review that aims to identify and explore the current literature concerning the use of machine learning for patients with multimorbidity. More precisely, the objective is to recognize various machine learning models, the patient groups involved, features considered, types of input data, the maturity of the machine learning algorithms, and the outcomes from these machine learning models. 
Methods: The scoping review will be based on the guidelines of the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews). Five databases (PubMed, Embase, IEEE, Web of Science, and Scopus) are chosen to conduct a literature search. Two reviewers will independently screen the titles, abstracts, and full texts of identified studies based on predefined eligibility criteria. Covidence (Veritas Health Innovation Ltd) will be used as a tool for managing and screening papers. Only studies that examine more than 1 chronic disease or individuals with a single chronic condition at risk of developing another will be included in the scoping review. Data from the included studies will be collected using Microsoft Excel (Microsoft Corp). The focus of the data extraction will be on bibliographical information, objectives, study populations, types of input data, types of algorithm, performance, maturity of the algorithms, and outcome. Results: The screening process will be presented in a PRISMA-ScR flow diagram. The findings of the scoping review will be conveyed through a narrative synthesis. Additionally, data extracted from the studies will be presented in more comprehensive formats, such as charts or tables. The results will be presented in a forthcoming scoping review, which will be published in a peer-reviewed journal. Conclusions: To our knowledge, this may be the first scoping review to investigate the use of machine learning in multimorbidity research. The goal of the scoping review is to summarize the field of literature on machine learning in patients with multiple chronic conditions, highlight different approaches, and potentially discover research gaps. The results will offer insights for future research within this field, contributing to developments that can enhance patient outcomes. 
International Registered Report Identifier (IRRID): PRR1-10.2196/53761 %M 38767948 %R 10.2196/53761 %U https://www.researchprotocols.org/2024/1/e53761 %U https://doi.org/10.2196/53761 %U http://www.ncbi.nlm.nih.gov/pubmed/38767948 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e53985 %T Longitudinal Changes in Diagnostic Accuracy of a Differential Diagnosis List Developed by an AI-Based Symptom Checker: Retrospective Observational Study %A Harada,Yukinori %A Sakamoto,Tetsu %A Sugimoto,Shu %A Shimizu,Taro %+ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Shimotsuga, 321-0293, Japan, 81 282 86 1111, yharada@dokkyomed.ac.jp %K atypical presentations %K diagnostic accuracy %K diagnosis %K diagnostics %K symptom checker %K uncommon diseases %K symptom checkers %K uncommon %K rare %K artificial intelligence %D 2024 %7 17.5.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Artificial intelligence (AI) symptom checker models should be trained using real-world patient data to improve their diagnostic accuracy. Given that AI-based symptom checkers are currently used in clinical practice, their performance should improve over time. However, longitudinal evaluations of the diagnostic accuracy of these symptom checkers are limited. Objective: This study aimed to assess the longitudinal changes in the accuracy of differential diagnosis lists created by an AI-based symptom checker used in the real world. Methods: This was a single-center, retrospective, observational study. Patients who visited an outpatient clinic without an appointment between May 1, 2019, and April 30, 2022, and who were admitted to a community hospital in Japan within 30 days of their index visit were considered eligible. We only included patients who underwent an AI-based symptom checkup at the index visit, and the diagnosis was finally confirmed during follow-up. 
Final diagnoses were categorized as common or uncommon, and all cases were categorized as typical or atypical. The primary outcome measure was the accuracy of the differential diagnosis list created by the AI-based symptom checker, defined as the final diagnosis in a list of 10 differential diagnoses created by the symptom checker. To assess the change in the symptom checker’s diagnostic accuracy over 3 years, we used a chi-square test to compare the primary outcome over 3 periods: from May 1, 2019, to April 30, 2020 (first year); from May 1, 2020, to April 30, 2021 (second year); and from May 1, 2021, to April 30, 2022 (third year). Results: A total of 381 patients were included. Common diseases comprised 257 (67.5%) cases, and typical presentations were observed in 298 (78.2%) cases. Overall, the differential diagnosis list created by the AI-based symptom checker was accurate in 172 (45.1%) cases, which did not differ across the 3 years (first year: 97/219, 44.3%; second year: 32/72, 44.4%; and third year: 43/90, 47.7%; P=.85). The accuracy of the differential diagnosis list created by the symptom checker was low in those with uncommon diseases (30/124, 24.2%) and atypical presentations (12/83, 14.5%). In the multivariate logistic regression model, common disease (P<.001; odds ratio 4.13, 95% CI 2.50-6.98) and typical presentation (P<.001; odds ratio 6.92, 95% CI 3.62-14.2) were significantly associated with the accuracy of the differential diagnosis list created by the symptom checker. Conclusions: A 3-year longitudinal survey of the diagnostic accuracy of differential diagnosis lists developed by an AI-based symptom checker, which has been implemented in real-world clinical practice settings, showed no improvement over time. Uncommon diseases and atypical presentations were independently associated with a lower diagnostic accuracy. In the future, symptom checkers should be trained to recognize uncommon conditions. 
%M 38758588 %R 10.2196/53985 %U https://formative.jmir.org/2024/1/e53985 %U https://doi.org/10.2196/53985 %U http://www.ncbi.nlm.nih.gov/pubmed/38758588 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55913 %T Machine Learning–Based Prediction of Suicidal Thinking in Adolescents by Derivation and Validation in 3 Independent Worldwide Cohorts: Algorithm Development and Validation Study %A Kim,Hyejun %A Son,Yejun %A Lee,Hojae %A Kang,Jiseung %A Hammoodi,Ahmed %A Choi,Yujin %A Kim,Hyeon Jin %A Lee,Hayeon %A Fond,Guillaume %A Boyer,Laurent %A Kwon,Rosie %A Woo,Selin %A Yon,Dong Keon %+ Center for Digital Health, Medical Science Research Institute, Kyung Hee University College of Medicine, 23 Kyungheedae–ro, Dongdaemun–gu, Seoul, 02447, Republic of Korea, 82 2 6935 2476, yonkkang@gmail.com %K adolescent %K machine learning %K Shapley additive explanations %K SHAP value %K suicidal thinking %K XGBoost %K mental health %K predictive model %K risk behavior %D 2024 %7 17.5.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Suicide is the second-leading cause of death among adolescents and is associated with clusters of suicides. Despite numerous studies on this preventable cause of death, the focus has primarily been on single nations and traditional statistical methods. Objective: This study aims to develop a predictive model for adolescent suicidal thinking using multinational data sets and machine learning (ML). Methods: We used data from the Korea Youth Risk Behavior Web-based Survey with 566,875 adolescents aged between 13 and 18 years and conducted external validation using the Youth Risk Behavior Survey with 103,874 adolescents and Norway’s University National General Survey with 19,574 adolescents. Several tree-based ML models were developed, and feature importance and Shapley additive explanations values were analyzed to identify risk factors for adolescent suicidal thinking. 
Results: When trained on the Korea Youth Risk Behavior Web-based Survey data from South Korea with a 95% CI, the XGBoost model reported an area under the receiver operating characteristic (AUROC) curve of 90.06% (95% CI 89.97-90.16), displaying superior performance compared to other models. For external validation using the Youth Risk Behavior Survey data from the United States and the University National General Survey from Norway, the XGBoost model achieved AUROCs of 83.09% and 81.27%, respectively. Across all data sets, XGBoost consistently outperformed the other models with the highest AUROC score, and was selected as the optimal model. In terms of predictors of suicidal thinking, feelings of sadness and despair were the most influential, accounting for 57.4% of the impact, followed by stress status at 19.8%. This was followed by age (5.7%), household income (4%), academic achievement (3.4%), sex (2.1%), and others, which contributed less than 2% each. Conclusions: This study used ML by integrating diverse data sets from 3 countries to address adolescent suicide. The findings highlight the important role of emotional health indicators in predicting suicidal thinking among adolescents. Specifically, sadness and despair were identified as the most significant predictors, followed by stressful conditions and age. These findings emphasize the critical need for early diagnosis and prevention of mental health issues during adolescence. 
%M 38758578 %R 10.2196/55913 %U https://www.jmir.org/2024/1/e55913 %U https://doi.org/10.2196/55913 %U http://www.ncbi.nlm.nih.gov/pubmed/38758578 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54758 %T Utility of Large Language Models for Health Care Professionals and Patients in Navigating Hematopoietic Stem Cell Transplantation: Comparison of the Performance of ChatGPT-3.5, ChatGPT-4, and Bard %A Xue,Elisabetta %A Bracken-Clarke,Dara %A Iannantuono,Giovanni Maria %A Choo-Wosoba,Hyoyoung %A Gulley,James L %A Floudas,Charalampos S %+ Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, 9000 Rockville Pike, Building 10, B2L312, Bethesda, MD, 20892, United States, 1 2403518904, elisabetta.xue@nih.gov %K hematopoietic stem cell transplant %K large language models %K chatbot %K chatbots %K stem cell %K large language model %K artificial intelligence %K AI %K medical information %K hematopoietic %K HSCT %K ChatGPT %D 2024 %7 17.5.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence is increasingly being applied to many workflows. Large language models (LLMs) are publicly accessible platforms trained to understand, interact with, and produce human-readable text; their ability to deliver relevant and reliable information is also of particular interest for the health care providers and the patients. Hematopoietic stem cell transplantation (HSCT) is a complex medical field requiring extensive knowledge, background, and training to practice successfully and can be challenging for the nonspecialist audience to comprehend. Objective: We aimed to test the applicability of 3 prominent LLMs, namely ChatGPT-3.5 (OpenAI), ChatGPT-4 (OpenAI), and Bard (Google AI), in guiding nonspecialist health care professionals and advising patients seeking information regarding HSCT. 
Methods: We submitted 72 open-ended HSCT–related questions of variable difficulty to the LLMs and rated their responses based on consistency—defined as replicability of the response—response veracity, language comprehensibility, specificity to the topic, and the presence of hallucinations. We then rechallenged the 2 best performing chatbots by resubmitting the most difficult questions and prompting to respond as if communicating with either a health care professional or a patient and to provide verifiable sources of information. Responses were then rerated with the additional criterion of language appropriateness, defined as language adaptation for the intended audience. Results: ChatGPT-4 outperformed both ChatGPT-3.5 and Bard in terms of response consistency (66/72, 92%; 54/72, 75%; and 63/69, 91%, respectively; P=.007), response veracity (58/66, 88%; 40/54, 74%; and 16/63, 25%, respectively; P<.001), and specificity to the topic (60/66, 91%; 43/54, 80%; and 27/63, 43%, respectively; P<.001). Both ChatGPT-4 and ChatGPT-3.5 outperformed Bard in terms of language comprehensibility (64/66, 97%; 53/54, 98%; and 52/63, 83%, respectively; P=.002). All displayed episodes of hallucinations. ChatGPT-3.5 and ChatGPT-4 were then rechallenged with a prompt to adapt their language to the audience and to provide source of information, and responses were rated. ChatGPT-3.5 showed better ability to adapt its language to nonmedical audience than ChatGPT-4 (17/21, 81% and 10/22, 46%, respectively; P=.03); however, both failed to consistently provide correct and up-to-date information resources, reporting either out-of-date materials, incorrect URLs, or unfocused references, making their output not verifiable by the reader. 
Conclusions: In conclusion, despite LLMs’ potential capability in confronting challenging medical topics such as HSCT, the presence of mistakes and lack of clear references make them not yet appropriate for routine, unsupervised clinical use, or patient counseling. Implementation of LLMs’ ability to access and to reference current and updated websites and research papers, as well as development of LLMs trained in specialized domain knowledge data sets, may offer potential solutions for their future clinical application. %M 38758582 %R 10.2196/54758 %U https://www.jmir.org/2024/1/e54758 %U https://doi.org/10.2196/54758 %U http://www.ncbi.nlm.nih.gov/pubmed/38758582 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e52095 %T Sample Size Considerations for Fine-Tuning Large Language Models for Named Entity Recognition Tasks: Methodological Study %A Majdik,Zoltan P %A Graham,S Scott %A Shiva Edward,Jade C %A Rodriguez,Sabrina N %A Karnes,Martha S %A Jensen,Jared T %A Barbour,Joshua B %A Rousseau,Justin F %+ Department of Rhetoric & Writing, The University of Texas at Austin, Parlin Hall 29, Mail Code: B5500, Austin, TX, 78712, United States, 1 512 475 9507, ssg@utexas.edu %K named-entity recognition %K large language models %K fine-tuning %K transfer learning %K expert annotation %K annotation %K sample size %K sample %K language model %K machine learning %K natural language processing %K disclosure %K disclosures %K statement %K statements %K conflict of interest %D 2024 %7 16.5.2024 %9 Original Paper %J JMIR AI %G English %X Background: Large language models (LLMs) have the potential to support promising new applications in health informatics. However, practical data on sample size considerations for fine-tuning LLMs to perform specific tasks in biomedical and health policy contexts are lacking. 
Objective: This study aims to evaluate sample size and sample selection techniques for fine-tuning LLMs to support improved named entity recognition (NER) for a custom data set of conflicts of interest disclosure statements. Methods: A random sample of 200 disclosure statements was prepared for annotation. All “PERSON” and “ORG” entities were identified by each of the 2 raters, and once appropriate agreement was established, the annotators independently annotated an additional 290 disclosure statements. From the 490 annotated documents, 2500 stratified random samples in different size ranges were drawn. The 2500 training set subsamples were used to fine-tune a selection of language models across 2 model architectures (Bidirectional Encoder Representations from Transformers [BERT] and Generative Pre-trained Transformer [GPT]) for improved NER, and multiple regression was used to assess the relationship between sample size (sentences), entity density (entities per sentence [EPS]), and trained model performance (F1-score). Additionally, single-predictor threshold regression models were used to evaluate the possibility of diminishing marginal returns from increased sample size or entity density. Results: Fine-tuned models ranged in topline NER performance from F1-score=0.79 to F1-score=0.96 across architectures. Two-predictor multiple linear regression models were statistically significant with multiple R2 ranging from 0.6057 to 0.7896 (all P<.001). EPS and the number of sentences were significant predictors of F1-scores in all cases (P<.001), except for the GPT-2_large model, where EPS was not a significant predictor (P=.184). Model thresholds indicate points of diminishing marginal return from increased training data set sample size measured by the number of sentences, with point estimates ranging from 439 sentences for RoBERTa_large to 527 sentences for GPT-2_large. 
Likewise, the threshold regression models indicate a diminishing marginal return for EPS with point estimates between 1.36 and 1.38. Conclusions: Relatively modest sample sizes can be used to fine-tune LLMs for NER tasks applied to biomedical text, and training data entity density should representatively approximate entity density in production data. Training data quality and a model architecture’s intended use (text generation vs text processing or classification) may be as important as, or more important than, training data volume and model parameter size. %M 38875593 %R 10.2196/52095 %U https://ai.jmir.org/2024/1/e52095 %U https://doi.org/10.2196/52095 %U http://www.ncbi.nlm.nih.gov/pubmed/38875593 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51514 %T Suitability of the Current Health Technology Assessment of Innovative Artificial Intelligence-Based Medical Devices: Scoping Literature Review %A Farah,Line %A Borget,Isabelle %A Martelli,Nicolas %A Vallee,Alexandre %+ Innovation Center for Medical Devices Department, Foch Hospital, 40 rue Worth, Suresnes, 92150, France, 33 952329655, line.farah1@gmail.com %K artificial intelligence %K machine learning %K health technology assessment %K medical devices %K evaluation %D 2024 %7 13.5.2024 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI)–based medical devices have garnered attention due to their ability to revolutionize medicine. Their health technology assessment framework is lacking. Objective: This study aims to analyze the suitability of each health technology assessment (HTA) domain for the assessment of AI-based medical devices. Methods: We conducted a scoping literature review following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology. We searched databases (PubMed, Embase, and Cochrane Library), gray literature, and HTA agency websites. Results: A total of 10.1% (78/775) of the references were included. 
Data quality and integration are vital aspects to consider when describing and assessing the technical characteristics of AI-based medical devices during an HTA process. When it comes to implementing specialized HTA for AI-based medical devices, several practical challenges and potential barriers could be highlighted and should be taken into account (AI technological evolution timeline, data requirements, complexity and transparency, clinical validation and safety requirements, regulatory and ethical considerations, and economic evaluation). Conclusions: The adaptation of the HTA process through a methodological framework for AI-based medical devices enhances the comparability of results across different evaluations and jurisdictions. By defining the necessary expertise, the framework supports the development of a skilled workforce capable of conducting robust and reliable HTAs of AI-based medical devices. A comprehensive adapted HTA framework for AI-based medical devices can provide valuable insights into the effectiveness, cost-effectiveness, and societal impact of AI-based medical devices, guiding their responsible implementation and maximizing their benefits for patients and health care systems. 
%M 38739911 %R 10.2196/51514 %U https://www.jmir.org/2024/1/e51514 %U https://doi.org/10.2196/51514 %U http://www.ncbi.nlm.nih.gov/pubmed/38739911 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50204 %T Envisioning the Future of Personalized Medicine: Role and Realities of Digital Twins %A Vallée,Alexandre %+ Department of Epidemiology and Public Health, Foch Hospital, 40 rue Worth, Suresnes, 92150, France, 33 0146257352, al.vallee@hopital-foch.com %K digital health %K digital twin %K personalized medicine %K prevention %K prediction %K health care system %D 2024 %7 13.5.2024 %9 Viewpoint %J J Med Internet Res %G English %X Digital twins have emerged as a groundbreaking concept in personalized medicine, offering immense potential to transform health care delivery and improve patient outcomes. It is important to highlight the impact of digital twins on personalized medicine across the understanding of patient health, risk assessment, clinical trials and drug development, and patient monitoring. By mirroring individual health profiles, digital twins offer unparalleled insights into patient-specific conditions, enabling more accurate risk assessments and tailored interventions. However, their application extends beyond clinical benefits, prompting significant ethical debates over data privacy, consent, and potential biases in health care. The rapid evolution of this technology necessitates a careful balancing act between innovation and ethical responsibility. As the field of personalized medicine continues to evolve, digital twins hold tremendous promise in transforming health care delivery and revolutionizing patient care. While challenges exist, the continued development and integration of digital twins hold the potential to revolutionize personalized medicine, ushering in an era of tailored treatments and improved patient well-being. 
Digital twins can assist in recognizing trends and indicators that might signal the presence of diseases or forecast the likelihood of developing specific medical conditions, along with the progression of such diseases. Nevertheless, the use of human digital twins gives rise to ethical dilemmas related to informed consent, data ownership, and the potential for discrimination based on health profiles. There is a critical need for robust guidelines and regulations to navigate these challenges, ensuring that the pursuit of advanced health care solutions does not compromise patient rights and well-being. This viewpoint aims to ignite a comprehensive dialogue on the responsible integration of digital twins in medicine, advocating for a future where technology serves as a cornerstone for personalized, ethical, and effective patient care. %M 38739913 %R 10.2196/50204 %U https://www.jmir.org/2024/1/e50204 %U https://doi.org/10.2196/50204 %U http://www.ncbi.nlm.nih.gov/pubmed/38739913 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53724 %T Evaluating the Diagnostic Performance of Large Language Models on Complex Multimodal Medical Cases %A Chiu,Wan Hang Keith %A Ko,Wei Sum Koel %A Cho,William Chi Shing %A Hui,Sin Yu Joanne %A Chan,Wing Chi Lawrence %A Kuo,Michael D %+ Ensemble Group, 10541 E Firewheel Drive, Scottsdale, AZ, 85259, United States, 1 4084512341, mikedkuo@gmail.com %K large language model %K hospital %K health center %K Massachusetts %K statistical analysis %K chi-square %K ANOVA %K clinician %K physician %K performance %K proficiency %K disease etiology %D 2024 %7 13.5.2024 %9 Research Letter %J J Med Internet Res %G English %X Large language models showed interpretative reasoning in solving diagnostically challenging medical cases. 
%M 38739441 %R 10.2196/53724 %U https://www.jmir.org/2024/1/e53724 %U https://doi.org/10.2196/53724 %U http://www.ncbi.nlm.nih.gov/pubmed/38739441 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52399 %T Potential of Large Language Models in Health Care: Delphi Study %A Denecke,Kerstin %A May,Richard %A , %A Rivera Romero,Octavio %+ Bern University of Applied Sciences, Quallgasse 21, Biel, 2502, Switzerland, 41 323216794, kerstin.denecke@bfh.ch %K large language models %K LLMs %K health care %K Delphi study %K natural language processing %K NLP %K artificial intelligence %K language model %K Delphi %K future %K innovation %K interview %K interviews %K informatics %K experience %K experiences %K attitude %K attitudes %K opinion %K perception %K perceptions %K perspective %K perspectives %K implementation %D 2024 %7 13.5.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: A large language model (LLM) is a machine learning model inferred from text data that captures subtle patterns of language use in context. Modern LLMs are based on neural network architectures that incorporate transformer methods. They allow the model to relate words together through attention to multiple words in a text sequence. LLMs have been shown to be highly effective for a range of tasks in natural language processing (NLP), including classification and information extraction tasks and generative applications. Objective: The aim of this adapted Delphi study was to collect researchers’ opinions on how LLMs might influence health care and on the strengths, weaknesses, opportunities, and threats of LLM use in health care. Methods: We invited researchers in the fields of health informatics, nursing informatics, and medical NLP to share their opinions on LLM use in health care. We started the first round with open questions based on our strengths, weaknesses, opportunities, and threats framework. 
In the second and third rounds, the participants scored these items. Results: The first, second, and third rounds had 28, 23, and 21 participants, respectively. Almost all participants (26/28, 93% in round 1 and 20/21, 95% in round 3) were affiliated with academic institutions. Agreement was reached on 103 items related to use cases, benefits, risks, reliability, adoption aspects, and the future of LLMs in health care. Participants offered several use cases, including supporting clinical tasks, documentation tasks, and medical research and education, and agreed that LLM-based systems will act as health assistants for patient education. The agreed-upon benefits included increased efficiency in data handling and extraction, improved automation of processes, improved quality of health care services and overall health outcomes, provision of personalized care, accelerated diagnosis and treatment processes, and improved interaction between patients and health care professionals. In total, 5 risks to health care in general were identified: cybersecurity breaches, the potential for patient misinformation, ethical concerns, the likelihood of biased decision-making, and the risk associated with inaccurate communication. Overconfidence in LLM-based systems was recognized as a risk to the medical profession. The 6 agreed-upon privacy risks included the use of unregulated cloud services that compromise data security, exposure of sensitive patient data, breaches of confidentiality, fraudulent use of information, vulnerabilities in data storage and communication, and inappropriate access or use of patient data. Conclusions: Future research related to LLMs should not only focus on testing their possibilities for NLP-related tasks but also consider the workflows the models could contribute to and the requirements regarding quality, integration, and regulations needed for successful implementation in practice.
%M 38739445 %R 10.2196/52399 %U https://www.jmir.org/2024/1/e52399 %U https://doi.org/10.2196/52399 %U http://www.ncbi.nlm.nih.gov/pubmed/38739445 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e53787 %T The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review %A Preiksaitis,Carl %A Ashenburg,Nicholas %A Bunney,Gabrielle %A Chu,Andrew %A Kabeer,Rana %A Riley,Fran %A Ribeira,Ryan %A Rose,Christian %+ Department of Emergency Medicine, Stanford University School of Medicine, 900 Welch Road, Suite 350, Palo Alto, CA, 94304, United States, 1 650 723 6576, cpreiksaitis@stanford.edu %K large language model %K LLM %K emergency medicine %K clinical decision support %K workflow efficiency %K medical education %K artificial intelligence %K AI %K natural language processing %K NLP %K AI literacy %K ChatGPT %K Bard %K Pathways Language Model %K Med-PaLM %K Bidirectional Encoder Representations from Transformers %K BERT %K generative pretrained transformer %K GPT %K United States %K US %K China %K scoping review %K Preferred Reporting Items for Systematic Reviews and Meta-Analyses %K PRISMA %K decision support %K workflow efficiency %K risk %K ethics %K education %K communication %K medical training %K physician %K health literacy %K emergency care %D 2024 %7 10.5.2024 %9 Review %J JMIR Med Inform %G English %X Background: Artificial intelligence (AI), more specifically large language models (LLMs), holds significant potential in revolutionizing emergency care delivery by optimizing clinical workflows and enhancing the quality of decision-making. Although enthusiasm for integrating LLMs into emergency medicine (EM) is growing, the existing literature is characterized by a disparate collection of individual studies, conceptual analyses, and preliminary implementations. Given these complexities and gaps in understanding, a cohesive framework is needed to comprehend the existing body of knowledge on the application of LLMs in EM. 
Objective: Given the absence of a comprehensive framework for exploring the roles of LLMs in EM, this scoping review aims to systematically map the existing literature on LLMs’ potential applications within EM and identify directions for future research. Addressing this gap will allow for informed advancements in the field. Methods: Using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) criteria, we searched Ovid MEDLINE, Embase, Web of Science, and Google Scholar for papers published between January 2018 and August 2023 that discussed LLMs’ use in EM. We excluded other forms of AI. A total of 1994 unique titles and abstracts were screened, and each full-text paper was independently reviewed by 2 authors. Data were abstracted independently, and 5 authors performed a collaborative quantitative and qualitative synthesis of the data. Results: A total of 43 papers were included. Studies were predominantly from 2022 to 2023 and conducted in the United States and China. 
We uncovered four major themes: (1) clinical decision-making and support was highlighted as a pivotal area, with LLMs playing a substantial role in enhancing patient care, notably through their application in real-time triage, allowing early recognition of patient urgency; (2) efficiency, workflow, and information management demonstrated the capacity of LLMs to significantly boost operational efficiency, particularly through the automation of patient record synthesis, which could reduce administrative burden and enhance patient-centric care; (3) risks, ethics, and transparency were identified as areas of concern, especially regarding the reliability of LLMs’ outputs, and specific studies highlighted the challenges of ensuring unbiased decision-making amidst potentially flawed training data sets, stressing the importance of thorough validation and ethical oversight; and (4) education and communication possibilities included LLMs’ capacity to enrich medical training, such as through using simulated patient interactions that enhance communication skills. Conclusions: LLMs have the potential to fundamentally transform EM, enhancing clinical decision-making, optimizing workflows, and improving patient outcomes. This review sets the stage for future advancements by identifying key research areas: prospective validation of LLM applications, establishing standards for responsible use, understanding provider and patient perceptions, and improving physicians’ AI literacy. Effective integration of LLMs into EM will require collaborative efforts and thorough evaluation to ensure these technologies can be safely and effectively applied. 
%M 38728687 %R 10.2196/53787 %U https://medinform.jmir.org/2024/1/e53787 %U https://doi.org/10.2196/53787 %U http://www.ncbi.nlm.nih.gov/pubmed/38728687 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e49848 %T Development and Validation of an Explainable Deep Learning Model to Predict In-Hospital Mortality for Patients With Acute Myocardial Infarction: Algorithm Development and Validation Study %A Xie,Puguang %A Wang,Hao %A Xiao,Jun %A Xu,Fan %A Liu,Jingyang %A Chen,Zihang %A Zhao,Weijie %A Hou,Siyu %A Wu,Dongdong %A Ma,Yu %A Xiao,Jingjing %+ Bio-Med Informatics Research Centre & Clinical Research Centre, Xinqiao Hospital, Army Medical University, No. 183 Xinqiao Street, Shapingba District, Chongqing, 400037, China, 86 18502299862, shine636363@sina.com %K acute myocardial infarction %K mortality %K deep learning %K explainable model %K prediction %D 2024 %7 10.5.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Acute myocardial infarction (AMI) is one of the most severe cardiovascular diseases and is associated with a high risk of in-hospital mortality. However, the current deep learning models for in-hospital mortality prediction lack interpretability. Objective: This study aims to establish an explainable deep learning model to provide individualized in-hospital mortality prediction and risk factor assessment for patients with AMI. Methods: In this retrospective multicenter study, we used data for consecutive patients hospitalized with AMI from the Chongqing University Central Hospital between July 2016 and December 2022 and the Electronic Intensive Care Unit Collaborative Research Database. These patients were randomly divided into training (7668/10,955, 70%) and internal test (3287/10,955, 30%) data sets. In addition, data of patients with AMI from the Medical Information Mart for Intensive Care database were used for external validation. 
Deep learning models were used to predict in-hospital mortality in patients with AMI, and they were compared with linear and tree-based models. The Shapley Additive Explanations method was used to explain the model with the highest area under the receiver operating characteristic curve in both the internal test and external validation data sets to quantify and visualize the features that drive predictions. Results: A total of 10,955 patients with AMI who were admitted to Chongqing University Central Hospital or included in the Electronic Intensive Care Unit Collaborative Research Database were randomly divided into a training data set of 7668 (70%) patients and an internal test data set of 3287 (30%) patients. A total of 9355 patients from the Medical Information Mart for Intensive Care database were included for independent external validation. In-hospital mortality occurred in 8.74% (670/7668), 8.73% (287/3287), and 9.12% (853/9355) of the patients in the training, internal test, and external validation cohorts, respectively. The Self-Attention and Intersample Attention Transformer model performed best in both the internal test data set and the external validation data set among the 9 prediction models, with the highest area under the receiver operating characteristic curve of 0.86 (95% CI 0.84-0.88) and 0.85 (95% CI 0.84-0.87), respectively. Older age, high heart rate, and low body temperature were the 3 most important predictors of increased mortality, according to the explanations of the Self-Attention and Intersample Attention Transformer model. Conclusions: The explainable deep learning model that we developed could provide estimates of mortality and visual contribution of the features to the prediction for a patient with AMI. The explanations suggested that older age, unstable vital signs, and metabolic disorders may increase the risk of mortality in patients with AMI. 
%M 38728685 %R 10.2196/49848 %U https://www.jmir.org/2024/1/e49848 %U https://doi.org/10.2196/49848 %U http://www.ncbi.nlm.nih.gov/pubmed/38728685 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e44805 %T Decision Support for Managing Common Musculoskeletal Pain Disorders: Development of a Case-Based Reasoning Application %A Granviken,Fredrik %A Vasseljen,Ottar %A Bach,Kerstin %A Jaiswal,Amar %A Meisingset,Ingebrigt %+ Department of Public Health and Nursing, Norwegian University of Science and Technology, Postboks 8905, Trondheim, 7491, Norway, 47 93059497, fredrik.granviken@ntnu.no %K case-based reasoning %K musculoskeletal pain %K physiotherapy %K decision support %K primary care %K artificial intelligence %D 2024 %7 10.5.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Common interventions for musculoskeletal pain disorders either lack evidence to support their use or have small to modest or short-term effects. Given the heterogeneity of patients with musculoskeletal pain disorders, treatment guidelines and systematic reviews have limited transferability to clinical practice. A problem-solving method in artificial intelligence, case-based reasoning (CBR), where new problems are solved based on experiences from past similar problems, might offer guidance in such situations. Objective: This study aims to use CBR to build a decision support system for patients with musculoskeletal pain disorders seeking physiotherapy care. This study describes the development of the CBR system SupportPrim PT and demonstrates its ability to identify similar patients. Methods: Data from physiotherapy patients in primary care in Norway were collected to build a case base for SupportPrim PT. We used the local-global principle in CBR to identify similar patients. The global similarity measures are attributes used to identify similar patients and consisted of prognostic attributes. 
They were weighted in terms of prognostic importance and choice of treatment, where the weighting represents the relevance of the different attributes. For the local similarity measures, the degree of similarity within each attribute was based on minimal clinically important differences and expert knowledge. The SupportPrim PT’s ability to identify similar patients was assessed by comparing the similarity scores of all patients in the case base with the scores on an established screening tool (the short form Örebro Musculoskeletal Pain Screening Questionnaire [ÖMSPQ]) and an outcome measure (the Musculoskeletal Health Questionnaire [MSK-HQ]) used in musculoskeletal pain. We also assessed the same in a more extensive case base. Results: The original case base contained 105 patients with musculoskeletal pain (mean age 46, SD 15 years; 77/105, 73.3% women). The SupportPrim PT consisted of 29 weighted attributes with local similarities. When comparing the similarity scores for all patients in the case base, one at a time, with the ÖMSPQ and MSK-HQ, the most similar patients had a mean absolute difference from the query patient of 9.3 (95% CI 8.0-10.6) points on the ÖMSPQ and a mean absolute difference of 5.6 (95% CI 4.6-6.6) points on the MSK-HQ. For both ÖMSPQ and MSK-HQ, the absolute score difference increased as the rank of most similar patients decreased. Patients retrieved from a more extensive case base (N=486) had a higher mean similarity score and were slightly more similar to the query patients in ÖMSPQ and MSK-HQ compared with the original smaller case base. Conclusions: This study describes the development of a CBR system, SupportPrim PT, for musculoskeletal pain in primary care. The SupportPrim PT identified similar patients according to an established screening tool and an outcome measure for patients with musculoskeletal pain. 
%M 38728686 %R 10.2196/44805 %U https://formative.jmir.org/2024/1/e44805 %U https://doi.org/10.2196/44805 %U http://www.ncbi.nlm.nih.gov/pubmed/38728686 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e52171 %T A Comparison of Personalized and Generalized Approaches to Emotion Recognition Using Consumer Wearable Devices: Machine Learning Study %A Li,Joe %A Washington,Peter %+ Information and Computer Sciences, University of Hawai`i at Mānoa, 1680 East-West Road, Room 312, Honolulu, HI, 96822, United States, 1 000000000, pyw@hawaii.edu %K affect detection %K affective computing %K deep learning %K digital health %K emotion recognition %K machine learning %K mental health %K personalization %K stress detection %K wearable technology %D 2024 %7 10.5.2024 %9 Original Paper %J JMIR AI %G English %X Background: There are a wide range of potential adverse health effects, ranging from headaches to cardiovascular disease, associated with long-term negative emotions and chronic stress. Because many indicators of stress are imperceptible to observers, the early detection of stress remains a pressing medical need, as it can enable early intervention. Physiological signals offer a noninvasive method for monitoring affective states and are recorded by a growing number of commercially available wearables. Objective: We aim to study the differences between personalized and generalized machine learning models for 3-class emotion classification (neutral, stress, and amusement) using wearable biosignal data. Methods: We developed a neural network for the 3-class emotion classification problem using data from the Wearable Stress and Affect Detection (WESAD) data set, a multimodal data set with physiological signals from 15 participants. We compared the results between a participant-exclusive generalized, a participant-inclusive generalized, and a personalized deep learning model. 
Results: For the 3-class classification problem, our personalized model achieved an average accuracy of 95.06% and an F1-score of 91.71%; our participant-inclusive generalized model achieved an average accuracy of 66.95% and an F1-score of 42.50%; and our participant-exclusive generalized model achieved an average accuracy of 67.65% and an F1-score of 43.05%. Conclusions: Our results emphasize the need for increased research in personalized emotion recognition models given that they outperform generalized models in certain contexts. We also demonstrate that personalized machine learning models for emotion classification are viable and can achieve high performance. %M 38875573 %R 10.2196/52171 %U https://ai.jmir.org/2024/1/e52171 %U https://doi.org/10.2196/52171 %U http://www.ncbi.nlm.nih.gov/pubmed/38875573 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e51346 %T ChatGPT as a Tool for Medical Education and Clinical Decision-Making on the Wards: Case Study %A Skryd,Anthony %A Lawrence,Katharine %+ Department of Medicine, NYU Langone Health, 550 1st Avenue, New York City, NY, 10016, United States, 1 646 929 7800, anthony.skryd@nyulangone.org %K ChatGPT %K medical education %K large language models %K LLMs %K clinical decision-making %D 2024 %7 8.5.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Large language models (LLMs) are computational artificial intelligence systems with advanced natural language processing capabilities that have recently been popularized among health care students and educators due to their ability to provide real-time access to a vast amount of medical knowledge. The adoption of LLM technology into medical education and training has varied, and little empirical evidence exists to support its use in clinical teaching environments. Objective: The aim of the study is to identify and qualitatively evaluate potential use cases and limitations of LLM technology for real-time ward-based educational contexts. 
Methods: A brief, single-site exploratory evaluation of the publicly available ChatGPT-3.5 (OpenAI) was conducted by implementing the tool into the daily attending rounds of a general internal medicine inpatient service at a large urban academic medical center. ChatGPT was integrated into rounds via both structured and organic use, using the web-based “chatbot” style interface to interact with the LLM through conversational free-text and discrete queries. A qualitative approach using phenomenological inquiry was used to identify key insights related to the use of ChatGPT through analysis of ChatGPT conversation logs and associated shorthand notes from the clinical sessions. Results: Identified use cases for ChatGPT integration included addressing medical knowledge gaps through discrete medical knowledge inquiries, building differential diagnoses and engaging dual-process thinking, challenging medical axioms, using cognitive aids to support acute care decision-making, and improving complex care management by facilitating conversations with subspecialties. Potential additional uses included engaging in difficult conversations with patients, exploring ethical challenges and general medical ethics teaching, personal continuing medical education resources, developing ward-based teaching tools, supporting and automating clinical documentation, and supporting productivity and task management. LLM biases, misinformation, ethics, and health equity were identified as areas of concern and potential limitations to clinical and training use. A code of conduct on ethical and appropriate use was also developed to guide team usage on the wards. Conclusions: Overall, ChatGPT offers a novel tool to enhance ward-based learning through rapid information querying, second-order content exploration, and engaged team discussion regarding generated responses. 
More research is needed to fully understand contexts for educational use, particularly regarding the risks and limitations of the tool in clinical settings and its impacts on trainee development. %M 38717811 %R 10.2196/51346 %U https://formative.jmir.org/2024/1/e51346 %U https://doi.org/10.2196/51346 %U http://www.ncbi.nlm.nih.gov/pubmed/38717811 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 5 %N %P e52700 %T ChatGPT and Medicine: Together We Embrace the AI Renaissance %A Hacking,Sean %+ NYU Langone, Tisch Hospital, 560 First Avenue, Suite TH 461, New York, NY, 10016, United States, 1 6466836133, hackingsean1@gmail.com %K ChatGPT %K generative AI %K NLP %K medicine %K bioinformatics %K AI democratization %K AI renaissance %K artificial intelligence %K natural language processing %D 2024 %7 7.5.2024 %9 Editorial %J JMIR Bioinform Biotech %G English %X The generative artificial intelligence (AI) model ChatGPT holds transformative prospects in medicine. The development of such models has signaled the beginning of a new era where complex biological data can be made more accessible and interpretable. ChatGPT is a natural language processing tool that can process, interpret, and summarize vast data sets. It can serve as a digital assistant for physicians and researchers, aiding in integrating medical imaging data with other multiomics data and facilitating the understanding of complex biological systems. The physician’s and AI’s viewpoints emphasize the value of such AI models in medicine, providing tangible examples of how this could enhance patient care. The editorial also discusses the rise of generative AI, highlighting its substantial impact in democratizing AI applications for modern medicine. While AI may not supersede health care professionals, practitioners incorporating AI into their practices could potentially have a competitive edge. 
%M 38935938 %R 10.2196/52700 %U https://bioinform.jmir.org/2024/1/e52700 %U https://doi.org/10.2196/52700 %U http://www.ncbi.nlm.nih.gov/pubmed/38935938 %0 Journal Article %@ 2561-7605 %I %V 7 %N %P e53019 %T Assessing the Quality of ChatGPT Responses to Dementia Caregivers’ Questions: Qualitative Analysis %A Aguirre,Alyssa %A Hilsabeck,Robin %A Smith,Tawny %A Xie,Bo %A He,Daqing %A Wang,Zhendong %A Zou,Ning %K Alzheimer’s disease %K information technology %K social media %K neurology %K dementia %K Alzheimer disease %K caregiver %K ChatGPT %D 2024 %7 6.5.2024 %9 %J JMIR Aging %G English %X Background: Artificial intelligence (AI) such as ChatGPT by OpenAI holds great promise to improve the quality of life of patients with dementia and their caregivers by providing high-quality responses to their questions about typical dementia behaviors. So far, however, evidence on the quality of such ChatGPT responses is limited. A few recent publications have investigated the quality of ChatGPT responses in other health conditions. Our study is the first to assess ChatGPT using real-world questions asked by dementia caregivers themselves. Objectives: This pilot study examines the potential of ChatGPT-3.5 to provide high-quality information that may enhance dementia care and patient-caregiver education. Methods: Our interprofessional team used a formal rating scale (scoring range: 0-5; the higher the score, the better the quality) to evaluate ChatGPT responses to real-world questions posed by dementia caregivers. We selected 60 posts by dementia caregivers from Reddit, a popular social media platform. These posts were verified by 3 interdisciplinary dementia clinicians as representing dementia caregivers’ desire for information in the areas of memory loss and confusion, aggression, and driving. 
Word count for posts in the memory loss and confusion category ranged from 71 to 531 (mean 218; median 188), aggression posts ranged from 58 to 602 words (mean 254; median 200), and driving posts ranged from 93 to 550 words (mean 272; median 276). Results: ChatGPT’s response quality scores ranged from 3 to 5. Of the 60 responses, 26 (43%) received 5 points, 21 (35%) received 4 points, and 13 (22%) received 3 points, suggesting high quality. ChatGPT obtained consistently high scores in synthesizing information to provide follow-up recommendations (n=58, 96%), with the lowest scores in the area of comprehensiveness (n=38, 63%). Conclusions: ChatGPT provided high-quality responses to complex questions posted by dementia caregivers, but it did have limitations. ChatGPT was unable to anticipate future problems that a human professional might recognize and address in a clinical encounter. At other times, ChatGPT recommended a strategy that the caregiver had already explicitly tried. This pilot study indicates the potential of AI to provide high-quality information to enhance dementia care and patient-caregiver education in tandem with information provided by licensed health care professionals. Evaluating the quality of responses is necessary to ensure that caregivers can make informed decisions. ChatGPT has the potential to transform health care practice by shaping how caregivers receive health information. 
%R 10.2196/53019 %U https://aging.jmir.org/2024/1/e53019 %U https://doi.org/10.2196/53019 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 12 %N %P e57978 %T The Evaluation of Generative AI Should Include Repetition to Assess Stability %A Zhu,Lingxuan %A Mou,Weiming %A Hong,Chenglin %A Yang,Tao %A Lai,Yancheng %A Qi,Chang %A Lin,Anqi %A Zhang,Jian %A Luo,Peng %+ Department of Oncology, Zhujiang Hospital, Southern Medical University, 253 Industrial Avenue, Guangzhou, China, 86 020 61643888, luopeng@smu.edu.cn %K large language model %K generative AI %K ChatGPT %K artificial intelligence %K health care %D 2024 %7 6.5.2024 %9 Commentary %J JMIR Mhealth Uhealth %G English %X The increasing interest in the potential applications of generative artificial intelligence (AI) models like ChatGPT in health care has prompted numerous studies to explore its performance in various medical contexts. However, evaluating ChatGPT poses unique challenges due to the inherent randomness in its responses. Unlike traditional AI models, ChatGPT generates different responses for the same input, making it imperative to assess its stability through repetition. This commentary highlights the importance of including repetition in the evaluation of ChatGPT to ensure the reliability of conclusions drawn from its performance. Similar to biological experiments, which often require multiple repetitions for validity, we argue that assessing generative AI models like ChatGPT demands a similar approach. Failure to acknowledge the impact of repetition can lead to biased conclusions and undermine the credibility of research findings. We urge researchers to incorporate appropriate repetition in their studies from the outset and transparently report their methods to enhance the robustness and reproducibility of findings in this rapidly evolving field. 
%M 38688841 %R 10.2196/57978 %U https://mhealth.jmir.org/2024/1/e57978 %U https://doi.org/10.2196/57978 %U http://www.ncbi.nlm.nih.gov/pubmed/38688841 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e52691 %T Exploring the Association Between Structural Racism and Mental Health: Geospatial and Machine Learning Analysis %A Mohebbi,Fahimeh %A Forati,Amir Masoud %A Torres,Lucas %A deRoon-Cassini,Terri A %A Harris,Jennifer %A Tomas,Carissa W %A Mantsch,John R %A Ghose,Rina %+ Department of Pharmacology & Toxicology, Medical College of Wisconsin, 8701 Watertown Plank Rd, Milwaukee, WI, 53226, United States, 1 4149558861, jomantsch@mcw.edu %K machine learning %K geospatial %K racial disparities %K social determinant of health %K structural racism %K mental health %K health disparities %K deep learning %D 2024 %7 3.5.2024 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Structural racism produces mental health disparities. While studies have examined the impact of individual factors such as poverty and education, the collective contribution of these elements, as manifestations of structural racism, has been less explored. Milwaukee County, Wisconsin, with its racial and socioeconomic diversity, provides a unique context for this multifactorial investigation. Objective: This research aimed to delineate the association between structural racism and mental health disparities in Milwaukee County, using a combination of geospatial and deep learning techniques. We used secondary data sets where all data were aggregated and anonymized before being released by federal agencies. Methods: We compiled 217 georeferenced explanatory variables across domains, initially deliberately excluding race-based factors to focus on nonracial determinants. This approach was designed to reveal the underlying patterns of risk factors contributing to poor mental health, subsequently reintegrating race to assess the effects of racism quantitatively. 
The variable selection combined tree-based methods (random forest) and conventional techniques, supported by variance inflation factor and Pearson correlation analysis for multicollinearity mitigation. The geographically weighted random forest model was used to investigate spatial heterogeneity and dependence. Self-organizing maps, combined with K-means clustering, were used to analyze data from Milwaukee communities, focusing on quantifying the impact of structural racism on the prevalence of poor mental health. Results: While 12 influential factors collectively accounted for 95.11% of the variability in mental health across communities, the top 6 factors (smoking, poverty, insufficient sleep, lack of health insurance, employment, and age) were particularly impactful. Predominantly African American neighborhoods were disproportionately affected, being 2.23 times more likely to encounter high-risk clusters for poor mental health. Conclusions: The findings demonstrate that structural racism shapes mental health disparities, with Black community members disproportionately impacted. The multifaceted methodological approach underscores the value of integrating geospatial analysis and deep learning to understand complex social determinants of mental health. These insights highlight the need for targeted interventions addressing both individual and systemic factors to mitigate mental health disparities rooted in structural racism. 
%M 38701436 %R 10.2196/52691 %U https://publichealth.jmir.org/2024/1/e52691 %U https://doi.org/10.2196/52691 %U http://www.ncbi.nlm.nih.gov/pubmed/38701436 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e48572 %T Mining Real-World Big Data to Characterize Adverse Drug Reaction Quantitatively: Mixed Methods Study %A Yue,Qi-Xuan %A Ding,Ruo-Fan %A Chen,Wei-Hao %A Wu,Lv-Ying %A Liu,Ke %A Ji,Zhi-Liang %+ State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Number 4221 Xiang'an South Road, Xiang'an District, Xiamen, 361102, China, 86 0592 2182897, appo@xmu.edu.cn %K clinical drug toxicity %K adverse drug reaction %K ADR severity %K ADR frequency %K mathematical model %D 2024 %7 3.5.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Adverse drug reactions (ADRs), which are the phenotypic manifestations of clinical drug toxicity in humans, are a major concern in precision clinical medicine. A comprehensive evaluation of ADRs is helpful for unbiased supervision of marketed drugs and for discovering new drugs with high success rates. Objective: In current practice, drug safety evaluation is often oversimplified to the occurrence or nonoccurrence of ADRs. Given the limitations of current qualitative methods, there is an urgent need for a quantitative evaluation model to improve pharmacovigilance and the accurate assessment of drug safety. Methods: In this study, we developed a mathematical model, namely the Adverse Drug Reaction Classification System (ADReCS) severity-grading model, for the quantitative characterization of ADR severity, a crucial feature for evaluating the impact of ADRs on human health. The model was constructed by mining millions of real-world historical adverse drug event reports. A new parameter called Severity_score was introduced to measure the severity of ADRs, and upper and lower score boundaries were determined for 5 severity grades. 
Results: The ADReCS severity-grading model exhibited excellent consistency (99.22%) with the expert-grading system, the Common Terminology Criteria for Adverse Events. Hence, we graded the severity of 6277 standard ADRs for 129,407 drug-ADR pairs. Moreover, we calculated the occurrence rates of 6272 distinct ADRs for 127,763 drug-ADR pairs in large patient populations by mining real-world medication prescriptions. With the quantitative features, we demonstrated example applications in systematically elucidating ADR mechanisms and thereby discovered a list of drugs with improper dosages. Conclusions: In summary, this study represents the first comprehensive determination of both ADR severity grades and ADR frequencies. This endeavor establishes a strong foundation for future artificial intelligence applications in discovering new drugs with high efficacy and low toxicity. It also heralds a paradigm shift in clinical toxicity research, moving from qualitative description to quantitative evaluation. 
%M 38700923 %R 10.2196/48572 %U https://www.jmir.org/2024/1/e48572 %U https://doi.org/10.2196/48572 %U http://www.ncbi.nlm.nih.gov/pubmed/38700923 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52508 %T Consolidated Reporting Guidelines for Prognostic and Diagnostic Machine Learning Models (CREMLS) %A El Emam,Khaled %A Leung,Tiffany I %A Malin,Bradley %A Klement,William %A Eysenbach,Gunther %+ School of Epidemiology and Public Health, University of Ottawa, 401 Smyth Road, Ottawa, ON, K1H 8L1, Canada, 1 6137377600, kelemam@ehealthinformation.ca %K reporting guidelines %K machine learning %K predictive models %K diagnostic models %K prognostic models %K artificial intelligence %K editorial policy %D 2024 %7 2.5.2024 %9 Editorial %J J Med Internet Res %G English %X The number of papers presenting machine learning (ML) models that are being submitted to and published in the Journal of Medical Internet Research and other JMIR Publications journals has steadily increased. Editors and peer reviewers involved in the review process for such manuscripts often go through multiple review cycles to enhance the quality and completeness of reporting. The use of reporting guidelines or checklists can help ensure consistency in the quality of submitted (and published) scientific manuscripts and, for example, avoid instances of missing information. In this Editorial, the editors of JMIR Publications journals discuss the general JMIR Publications policy regarding authors’ application of reporting guidelines and specifically focus on the reporting of ML studies in JMIR Publications journals, using the Consolidated Reporting of Machine Learning Studies (CREMLS) guidelines, with an example of how authors and other journals could use the CREMLS checklist to ensure transparency and rigor in reporting. 
%M 38696776 %R 10.2196/52508 %U https://www.jmir.org/2024/1/e52508 %U https://doi.org/10.2196/52508 %U http://www.ncbi.nlm.nih.gov/pubmed/38696776 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54363 %T Improving the Prognostic Evaluation Precision of Hospital Outcomes for Heart Failure Using Admission Notes and Clinical Tabular Data: Multimodal Deep Learning Model %A Gao,Zhenyue %A Liu,Xiaoli %A Kang,Yu %A Hu,Pan %A Zhang,Xiu %A Yan,Wei %A Yan,Muyang %A Yu,Pengming %A Zhang,Qing %A Xiao,Wendong %A Zhang,Zhengbo %+ Center for Artificial Intelligence in Medicine, The General Hospital of People's Liberation Army, 28 Fuxing Road, Beijing, 100853, China, 86 010 68295454, zhangzhengbo@301hospital.com.cn %K heart failure %K multimodal deep learning %K mortality prediction %K admission notes %K clinical tabular data %K tabular %K notes %K deep learning %K machine learning %K cardiology %K heart %K cardiac %K documentation %K prognostic %K prognosis %K prognoses %K predict %K prediction %K predictions %K predictive %D 2024 %7 2.5.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Clinical notes contain contextualized information beyond structured data related to patients’ past and current health status. Objective: This study aimed to design a multimodal deep learning approach to improve the evaluation precision of hospital outcomes for heart failure (HF) using admission clinical notes and easily collected tabular data. Methods: Data for the development and validation of the multimodal model were retrospectively derived from 3 open-access US databases, including the Medical Information Mart for Intensive Care III v1.4 (MIMIC-III) and MIMIC-IV v1.0, collected from a teaching hospital from 2001 to 2019, and the eICU Collaborative Research Database v1.2, collected from 208 hospitals from 2014 to 2015. The study cohorts consisted of all patients with critical HF. 
The clinical notes, including chief complaint, history of present illness, physical examination, medical history, and admission medication, as well as clinical variables recorded in electronic health records, were analyzed. We developed a deep learning mortality prediction model for in-hospital patients, which underwent complete internal, prospective, and external evaluation. The Integrated Gradients and SHapley Additive exPlanations (SHAP) methods were used to analyze the importance of risk factors. Results: The study included 9989 (16.4%) patients in the development set, 2497 (14.1%) patients in the internal validation set, 1896 (18.3%) patients in the prospective validation set, and 7432 (15%) patients in the external validation set. The area under the receiver operating characteristic curve of the model was 0.838 (95% CI 0.827-0.851), 0.849 (95% CI 0.841-0.856), and 0.767 (95% CI 0.762-0.772) for the internal, prospective, and external validation sets, respectively. The area under the receiver operating characteristic curve of the multimodal model outperformed that of the unimodal models in all test sets, and tabular data contributed to higher discrimination. The medical history and physical examination were more useful than other factors in early assessments. Conclusions: The multimodal deep learning model combining admission notes and clinical tabular data showed promising efficacy as a potentially novel method for evaluating the risk of mortality in patients with HF, providing more accurate and timely decision support. 
%M 38696251 %R 10.2196/54363 %U https://www.jmir.org/2024/1/e54363 %U https://doi.org/10.2196/54363 %U http://www.ncbi.nlm.nih.gov/pubmed/38696251 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52499 %T Using Large Language Models to Support Content Analysis: A Case Study of ChatGPT for Adverse Event Detection %A Leas,Eric C %A Ayers,John W %A Desai,Nimit %A Dredze,Mark %A Hogarth,Michael %A Smith,Davey M %+ Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, 9500 Gilman Drive, Mail Code: 0725, La Jolla, CA, 92093, United States, 1 951 346 9131, ecleas@ucsd.edu %K adverse events %K artificial intelligence %K AI %K text analysis %K annotation %K ChatGPT %K LLM %K large language model %K cannabis %K delta-8-THC %K delta-8-tetrahydrocannabinol %D 2024 %7 2.5.2024 %9 Research Letter %J J Med Internet Res %G English %X This study explores the potential of using large language models to assist content analysis by conducting a case study to identify adverse events (AEs) in social media posts. The case study compares ChatGPT’s performance with that of human annotators in detecting AEs associated with delta-8-tetrahydrocannabinol, a cannabis-derived product. Using the identical instructions given to human annotators, ChatGPT closely approximated human results, with a high degree of agreement: 94.4% (9436/10,000) for any AE detection (Fleiss κ=0.95) and 99.3% (9931/10,000) for serious AEs (κ=0.96). These findings suggest that ChatGPT has the potential to replicate human annotation accurately and efficiently. The study recognizes possible limitations, including concerns about generalizability due to ChatGPT’s training data, and prompts further research with different models, data sources, and content analysis tasks. The study highlights the promise of large language models for enhancing the efficiency of biomedical research. 
%M 38696245 %R 10.2196/52499 %U https://www.jmir.org/2024/1/e52499 %U https://doi.org/10.2196/52499 %U http://www.ncbi.nlm.nih.gov/pubmed/38696245 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51354 %T Machine Learning for Predicting Risk and Prognosis of Acute Kidney Disease in Critically Ill Elderly Patients During Hospitalization: Internet-Based and Interpretable Model Study %A Li,Mingxia %A Han,Shuzhe %A Liang,Fang %A Hu,Chenghuan %A Zhang,Buyao %A Hou,Qinlan %A Zhao,Shuangping %+ Department of Critical Care Medicine, Xiangya Hospital of Central South University, No 87, Xiangya Road, Kaifu District, Changsha, 410008, China, 86 1397495302, zhshping@csu.edu.cn %K acute kidney disease %K AKD %K machine learning %K critically ill patients %K elderly patients %K Shapley additive explanation %K SHAP %D 2024 %7 1.5.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Acute kidney disease (AKD) affects more than half of critically ill elderly patients with acute kidney injury (AKI), which leads to worse short-term outcomes. Objective: We aimed to establish 2 machine learning models to predict the risk and prognosis of AKD in the elderly and to deploy the models as online apps. Methods: Data on elderly patients with AKI (n=3542) and AKD (n=2661) from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database were used to develop 2 models for predicting the AKD risk and in-hospital mortality, respectively. Data collected from Xiangya Hospital of Central South University were for external validation. A bootstrap method was used for internal validation to obtain relatively stable results. We extracted the indicators within 24 hours of the first diagnosis of AKI and the fluctuation range of some indicators, namely delta (day 3 after AKI minus day 1), as features. 
Six machine learning algorithms were used for modeling; the area under the receiver operating characteristic curve (AUROC), decision curve analysis, and calibration curve for evaluating; Shapley additive explanation (SHAP) analysis for visually interpreting; and the Heroku platform for deploying the best-performing models as web-based apps. Results: For the model of predicting the risk of AKD in elderly patients with AKI during hospitalization, the Light Gradient Boosting Machine (LightGBM) showed the best overall performance in the training (AUROC=0.844, 95% CI 0.831-0.857), internal validation (AUROC=0.853, 95% CI 0.841-0.865), and external (AUROC=0.755, 95% CI 0.699–0.811) cohorts. In addition, LightGBM performed well for the AKD prognostic prediction in the training (AUROC=0.861, 95% CI 0.843-0.878), internal validation (AUROC=0.868, 95% CI 0.851-0.885), and external (AUROC=0.746, 95% CI 0.673-0.820) cohorts. The models deployed as online prediction apps allowed users to predict and provide feedback to submit new data for model iteration. In the importance ranking and correlation visualization of the model’s top 10 influencing factors conducted based on the SHAP value, partial dependence plots revealed the optimal cutoff of some interventionable indicators. The top 5 factors predicting the risk of AKD were creatinine on day 3, sepsis, delta blood urea nitrogen (BUN), diastolic blood pressure (DBP), and heart rate, while the top 5 factors determining in-hospital mortality were age, BUN on day 1, vasopressor use, BUN on day 3, and partial pressure of carbon dioxide (PaCO2). Conclusions: We developed and validated 2 online apps for predicting the risk of AKD and its prognostic mortality in elderly patients, respectively. The top 10 factors that influenced the AKD risk and mortality during hospitalization were identified and explained visually, which might provide useful applications for intelligent management and suggestions for future prospective research. 
%M 38691403 %R 10.2196/51354 %U https://www.jmir.org/2024/1/e51354 %U https://doi.org/10.2196/51354 %U http://www.ncbi.nlm.nih.gov/pubmed/38691403 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54948 %T Integrating Text and Image Analysis: Exploring GPT-4V’s Capabilities in Advanced Radiological Applications Across Subspecialties %A Busch,Felix %A Han,Tianyu %A Makowski,Marcus R %A Truhn,Daniel %A Bressem,Keno K %A Adams,Lisa %+ Department of Neuroradiology, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Charitépl. 1, Berlin, 10117, Germany, 49 3045050, felix.busch@charite.de %K GPT-4 %K ChatGPT %K Generative Pre-Trained Transformer %K multimodal large language models %K artificial intelligence %K AI applications in medicine %K diagnostic radiology %K clinical decision support systems %K generative AI %K medical image analysis %D 2024 %7 1.5.2024 %9 Research Letter %J J Med Internet Res %G English %X This study demonstrates that GPT-4V outperforms GPT-4 across radiology subspecialties in analyzing 207 cases with 1312 images from the Radiological Society of North America Case Collection. 
%M 38691404 %R 10.2196/54948 %U https://www.jmir.org/2024/1/e54948 %U https://doi.org/10.2196/54948 %U http://www.ncbi.nlm.nih.gov/pubmed/38691404 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54934 %T The Applications of Artificial Intelligence for Assessing Fall Risk: Systematic Review %A González-Castro,Ana %A Leirós-Rodríguez,Raquel %A Prada-García,Camino %A Benítez-Andrades,José Alberto %+ Nursing and Physical Therapy Department, Universidad de León, Astorga Ave, Ponferrada, 24401, Spain, 34 987442000, agonzc28@estudiantes.unileon.es %K machine learning %K accidental falls %K public health %K patient care %K artificial intelligence %K AI %K fall risk %D 2024 %7 29.4.2024 %9 Review %J J Med Internet Res %G English %X Background: Falls and their consequences are a serious public health problem worldwide. Each year, 37.3 million falls requiring medical attention occur. Therefore, the analysis of fall risk is of great importance for prevention. Artificial intelligence (AI) represents an innovative tool for creating predictive statistical models of fall risk through data analysis. Objective: The aim of this review was to analyze the available evidence on the applications of AI in the analysis of data related to postural control and fall risk. Methods: A literature search was conducted in 6 databases with the following inclusion criteria: the articles had to be published within the last 5 years (from 2018 to 2024), they had to apply some method of AI, AI analyses had to be applied to data from samples consisting of humans, and the analyzed sample had to consist of individuals with independent walking with or without the assistance of external orthopedic devices. Results: We obtained a total of 3858 articles, of which 22 were finally selected. 
Data extraction for subsequent analysis varied across the studies: 82% (18/22) of them extracted data through tests or functional assessments, and the remaining 18% (4/22) extracted data from existing medical records. Different AI techniques were used throughout the articles. All the studies included in the review achieved accuracy values of >70% in their AI-based predictive models. Conclusions: AI proves to be a valuable tool for creating predictive models of fall risk. The use of this tool could have a significant socioeconomic impact as it enables the development of low-cost predictive models with a high level of accuracy. Trial Registration: PROSPERO CRD42023443277; https://tinyurl.com/4sb72ssv %M 38684088 %R 10.2196/54934 %U https://www.jmir.org/2024/1/e54934 %U https://doi.org/10.2196/54934 %U http://www.ncbi.nlm.nih.gov/pubmed/38684088 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 7 %N %P e50537 %T Automatic Spontaneous Speech Analysis for the Detection of Cognitive Functional Decline in Older Adults: Multilanguage Cross-Sectional Study %A Ambrosini,Emilia %A Giangregorio,Chiara %A Lomurno,Eugenio %A Moccia,Sara %A Milis,Marios %A Loizou,Christos %A Azzolino,Domenico %A Cesari,Matteo %A Cid Gala,Manuel %A Galán de Isla,Carmen %A Gomez-Raja,Jonathan %A Borghese,Nunzio Alberto %A Matteucci,Matteo %A Ferrante,Simona %+ Department of Electronics, Information and Bioengineering, Politecnico di Milano, Piazza Leonardo da Vinci 32, Milano, 20133, Italy, 39 0223999509, emilia.ambrosini@polimi.it %K cognitive decline %K speech processing %K machine learning %K multilanguage %K Mini-Mental Status Examination %D 2024 %7 29.4.2024 %9 Original Paper %J JMIR Aging %G English %X Background: The rise in life expectancy is associated with an increase in long-term and gradual cognitive decline. Treatment effectiveness is enhanced at the early stage of the disease. 
Therefore, there is a need to find low-cost and ecological solutions for mass screening of community-dwelling older adults. Objective: This work aims to exploit automatic analysis of free speech to identify signs of cognitive function decline. Methods: A sample of 266 participants older than 65 years were recruited in Italy and Spain and were divided into 3 groups according to their Mini-Mental Status Examination (MMSE) scores. People were asked to tell a story and describe a picture, and voice recordings were used to extract high-level features on different time scales automatically. Based on these features, machine learning algorithms were trained to solve binary and multiclass classification problems by using both mono- and cross-lingual approaches. The algorithms were enriched using Shapley Additive Explanations for model explainability. Results: In the Italian data set, healthy participants (MMSE score≥27) were automatically discriminated from participants with mildly impaired cognitive function (20≤MMSE score≤26) and from those with moderate to severe impairment of cognitive function (11≤MMSE score≤19) with accuracy of 80% and 86%, respectively. Slightly lower performance was achieved in the Spanish and multilanguage data sets. Conclusions: This work proposes a transparent and unobtrusive assessment method, which might be included in a mobile app for large-scale monitoring of cognitive functionality in older adults. Voice is confirmed to be an important biomarker of cognitive decline due to its noninvasive and easily accessible nature. 
%M 38386279 %R 10.2196/50537 %U https://aging.jmir.org/2024/1/e50537 %U https://doi.org/10.2196/50537 %U http://www.ncbi.nlm.nih.gov/pubmed/38386279 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55847 %T Leveraging Large Language Models for Improved Patient Access and Self-Management: Assessor-Blinded Comparison Between Expert- and AI-Generated Content %A Lv,Xiaolei %A Zhang,Xiaomeng %A Li,Yuan %A Ding,Xinxin %A Lai,Hongchang %A Shi,Junyu %+ Department of Oral and Maxillofacial Implantology, Shanghai PerioImplant Innovation Center, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Quxi Road No 500, Shanghai, 200011, China, 86 21 23271699 ext 5298, sakyamuni_jin@163.com %K large language model %K artificial intelligence %K public oral health %K health care access %K patient education %D 2024 %7 25.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: While large language models (LLMs) such as ChatGPT and Google Bard have shown significant promise in various fields, their broader impact on enhancing patient health care access and quality, particularly in specialized domains such as oral health, requires comprehensive evaluation. Objective: This study aims to assess the effectiveness of Google Bard, ChatGPT-3.5, and ChatGPT-4 in offering recommendations for common oral health issues, benchmarked against responses from human dental experts. Methods: This comparative analysis used 40 questions derived from patient surveys on prevalent oral diseases, which were executed in a simulated clinical environment. Responses, obtained from both human experts and LLMs, were subject to a blinded evaluation process by experienced dentists and lay users, focusing on readability, appropriateness, harmlessness, comprehensiveness, intent capture, and helpfulness. Additionally, the stability of artificial intelligence responses was also assessed by submitting each question 3 times under consistent conditions. 
Results: Google Bard excelled in readability but lagged in appropriateness when compared to human experts (mean 8.51, SD 0.37 vs mean 9.60, SD 0.33; P=.03). ChatGPT-3.5 and ChatGPT-4, however, performed comparably with human experts in terms of appropriateness (mean 8.96, SD 0.35 and mean 9.34, SD 0.47, respectively), with ChatGPT-4 demonstrating the highest stability and reliability. Furthermore, all 3 LLMs received superior harmlessness scores comparable to human experts, with lay users finding minimal differences in helpfulness and intent capture between the artificial intelligence models and human responses. Conclusions: LLMs, particularly ChatGPT-4, show potential in oral health care, providing patient-centric information for enhancing patient education and clinical care. The observed performance variations underscore the need for ongoing refinement and ethical considerations in health care settings. Future research focuses on developing strategies for the safe integration of LLMs in health care settings. 
%M 38663010 %R 10.2196/55847 %U https://www.jmir.org/2024/1/e55847 %U https://doi.org/10.2196/55847 %U http://www.ncbi.nlm.nih.gov/pubmed/38663010 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56764 %T Large Language Models and User Trust: Consequence of Self-Referential Learning Loop and the Deskilling of Health Care Professionals %A Choudhury,Avishek %A Chaudhry,Zaira %+ Industrial and Management Systems Engineering, West Virginia University, 321 Engineering Sciences Bdlg, 1306 Evansdale Drive, Morgantown, WV, 26506, United States, 1 5156080777, avishek.choudhury@mail.wvu.edu %K trust %K ChatGPT %K human factors %K healthcare %K LLMs %K large language models %K LLM user trust %K AI accountability %K artificial intelligence %K AI technology %K technologies %K effectiveness %K policy %K medical student %K medical students %K risk factor %K quality of care %K healthcare professional %K healthcare professionals %K human element %D 2024 %7 25.4.2024 %9 Viewpoint %J J Med Internet Res %G English %X As the health care industry increasingly embraces large language models (LLMs), understanding the consequence of this integration becomes crucial for maximizing benefits while mitigating potential pitfalls. This paper explores the evolving relationship among clinician trust in LLMs, the transition of data sources from predominantly human-generated to artificial intelligence (AI)–generated content, and the subsequent impact on the performance of LLMs and clinician competence. One of the primary concerns identified in this paper is the LLMs’ self-referential learning loops, where AI-generated content feeds into the learning algorithms, threatening the diversity of the data pool, potentially entrenching biases, and reducing the efficacy of LLMs. 
While theoretical at this stage, this feedback loop poses a significant challenge as the integration of LLMs in health care deepens, emphasizing the need for proactive dialogue and strategic measures to ensure the safe and effective use of LLM technology. Another key takeaway from our investigation is the role of user expertise and the necessity for a discerning approach to trusting and validating LLM outputs. The paper highlights how expert users, particularly clinicians, can leverage LLMs to enhance productivity by off-loading routine tasks while maintaining a critical oversight to identify and correct potential inaccuracies in AI-generated content. This balance of trust and skepticism is vital for ensuring that LLMs augment rather than undermine the quality of patient care. We also discuss the risks associated with the deskilling of health care professionals. Frequent reliance on LLMs for critical tasks could result in a decline in health care providers’ diagnostic and thinking skills, particularly affecting the training and development of future professionals. The legal and ethical considerations surrounding the deployment of LLMs in health care are also examined. We discuss the medicolegal challenges, including liability in cases of erroneous diagnoses or treatment advice generated by LLMs. The paper references recent legislative efforts, such as The Algorithmic Accountability Act of 2023, as crucial steps toward establishing a framework for the ethical and responsible use of AI-based technologies in health care. In conclusion, this paper advocates for a strategic approach to integrating LLMs into health care. By emphasizing the importance of maintaining clinician expertise, fostering critical engagement with LLM outputs, and navigating the legal and ethical landscape, we can ensure that LLMs serve as valuable tools in enhancing patient care and supporting health care professionals. 
This approach addresses the immediate challenges posed by integrating LLMs and sets a foundation for their maintainable and responsible use in the future. %M 38662419 %R 10.2196/56764 %U https://www.jmir.org/2024/1/e56764 %U https://doi.org/10.2196/56764 %U http://www.ncbi.nlm.nih.gov/pubmed/38662419 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e54388 %T Leveraging AI and Machine Learning to Develop and Evaluate a Contextualized User-Friendly Cough Audio Classifier for Detecting Respiratory Diseases: Protocol for a Diagnostic Study in Rural Tanzania %A Isangula,Kahabi Ganka %A Haule,Rogers John %+ School of Nursing and Midwifery, Aga Khan University, Salama House, 344 Urambo St, PO Box 125, Dar Es Salaam, 255, United Republic of Tanzania, 255 754030726, kahabi.isangula@aku.edu %K artificial intelligence %K machine learning %K respiratory diseases %K cough classifiers %K Tanzania %K Africa %K mobile phone %K user-friendly %K cough %K detecting respiratory disease %K diagnostic study %K tuberculosis %K asthma %K chronic obstructive pulmonary disease %K treatment %K management %K noninvasive %K rural %K cross-sectional research %K analysis %K cough sound %D 2024 %7 23.4.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Respiratory diseases, including active tuberculosis (TB), asthma, and chronic obstructive pulmonary disease (COPD), constitute substantial global health challenges, necessitating timely and accurate diagnosis for effective treatment and management. Objective: This research seeks to develop and evaluate a noninvasive user-friendly artificial intelligence (AI)–powered cough audio classifier for detecting these respiratory conditions in rural Tanzania. Methods: This is a nonexperimental cross-sectional research with the primary objective of collection and analysis of cough sounds from patients with active TB, asthma, and COPD in outpatient clinics to generate and evaluate a noninvasive cough audio classifier. 
Specialized cough sound recording devices, designed to be nonintrusive and user-friendly, will facilitate the collection of diverse cough sound samples from patients attending outpatient clinics in 20 health care facilities in the Shinyanga region. The collected cough sound data will undergo rigorous analysis, using advanced AI signal processing and machine learning techniques. By comparing acoustic features and patterns associated with TB, asthma, and COPD, a robust algorithm capable of automated disease discrimination will be generated facilitating the development of a smartphone-based cough sound classifier. The classifier will be evaluated against the calculated reference standards including clinical assessments, sputum smear, GeneXpert, chest x-ray, culture and sensitivity, spirometry and peak expiratory flow, and sensitivity and predictive values. Results: This research represents a vital step toward enhancing the diagnostic capabilities available in outpatient clinics, with the potential to revolutionize the field of respiratory disease diagnosis. Findings from the 4 phases of the study will be presented as descriptions supported by relevant images, tables, and figures. The anticipated outcome of this research is the creation of a reliable, noninvasive diagnostic cough classifier that empowers health care professionals and patients themselves to identify and differentiate these respiratory diseases based on cough sound patterns. Conclusions: Cough sound classifiers use advanced technology for early detection and management of respiratory conditions, offering a less invasive and more efficient alternative to traditional diagnostics. This technology promises to ease public health burdens, improve patient outcomes, and enhance health care access in under-resourced areas, potentially transforming respiratory disease management globally. 
International Registered Report Identifier (IRRID): PRR1-10.2196/54388 %M 38652526 %R 10.2196/54388 %U https://www.researchprotocols.org/2024/1/e54388 %U https://doi.org/10.2196/54388 %U http://www.ncbi.nlm.nih.gov/pubmed/38652526 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 8 %N %P e53091 %T Use of Machine Learning for Early Detection of Maternal Cardiovascular Conditions: Retrospective Study Using Electronic Health Record Data %A Shara,Nawar %A Mirabal-Beltran,Roxanne %A Talmadge,Bethany %A Falah,Noor %A Ahmad,Maryam %A Dempers,Ramon %A Crovatt,Samantha %A Eisenberg,Steven %A Anderson,Kelley %+ School of Nursing, Georgetown University, 3700 Reservoir Road, NW, Washington, DC, 20057, United States, 1 2026873496, rm1910@georgetown.edu %K machine learning %K preeclampsia %K cardiovascular %K maternal %K obstetrics %K health disparities %K woman %K women %K pregnancy %K pregnant %K cardiovascular %K cardiovascular condition %K retrospective study %K electronic health record %K EHR %K technology %K decision-making %K health disparity %K virtual server %K thromboembolism %K kidney failure %K HOPE-CAT %D 2024 %7 22.4.2024 %9 Original Paper %J JMIR Cardio %G English %X Background: Cardiovascular conditions (eg, cardiac and coronary conditions, hypertensive disorders of pregnancy, and cardiomyopathies) were the leading cause of maternal mortality between 2017 and 2019. The United States has the highest maternal mortality rate of any high-income nation, disproportionately impacting those who identify as non-Hispanic Black or Hispanic. Novel clinical approaches to the detection and diagnosis of cardiovascular conditions are therefore imperative. Emerging research is demonstrating that machine learning (ML) is a promising tool for detecting patients at increased risk for hypertensive disorders during pregnancy. 
However, additional studies are required to determine how integrating ML and big data, such as electronic health records (EHRs), can improve the identification of obstetric patients at higher risk of cardiovascular conditions. Objective: This study aimed to evaluate the capability and timing of a proprietary ML algorithm, Healthy Outcomes for all Pregnancy Experiences-Cardiovascular-Risk Assessment Technology (HOPE-CAT), to detect maternal-related cardiovascular conditions and outcomes. Methods: Retrospective data from the EHRs of a large health care system were investigated by HOPE-CAT in a virtual server environment. Deidentification of EHR data and standardization enabled HOPE-CAT to analyze data without pre-existing biases. The ML algorithm assessed risk factors selected by clinical experts in cardio-obstetrics, and the algorithm was iteratively trained using relevant literature and current standards of risk identification. After refinement of the algorithm’s learned risk factors, risk profiles were generated for every patient including a designation of standard versus high risk. The profiles were individually paired with clinical outcomes pertaining to cardiovascular pregnancy conditions and complications, wherein a delta was calculated between the date of the risk profile and the actual diagnosis or intervention in the EHR. Results: In total, 604 pregnancies resulting in birth had records or diagnoses that could be compared against the risk profile; the majority of patients identified as Black (n=482, 79.8%) and aged between 21 and 34 years (n=509, 84.4%). Preeclampsia (n=547, 90.6%) was the most common condition, followed by thromboembolism (n=16, 2.7%) and acute kidney disease or failure (n=13, 2.2%). The average delta was 56.8 (SD 69.7) days between the identification of risk factors by HOPE-CAT and the first date of diagnosis or intervention of a related condition reported in the EHR. 
HOPE-CAT showed the strongest performance in early risk detection of myocardial infarction at a delta of 65.7 (SD 81.4) days. Conclusions: This study provides additional evidence to support ML in obstetrical patients to enhance the early detection of cardiovascular conditions during pregnancy. ML can synthesize multiday patient presentations to enhance provider decision-making and potentially reduce maternal health disparities. %M 38648629 %R 10.2196/53091 %U https://cardio.jmir.org/2024/1/e53091 %U https://doi.org/10.2196/53091 %U http://www.ncbi.nlm.nih.gov/pubmed/38648629 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54419 %T Using ChatGPT-4 to Create Structured Medical Notes From Audio Recordings of Physician-Patient Encounters: Comparative Study %A Kernberg,Annessa %A Gold,Jeffrey A %A Mohan,Vishnu %+ Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Sciences University, 3181 SW Sam Jackson Park Road, Portland, OR, 97239, United States, 1 5034944469, mohanV@ohsu.edu %K generative AI %K generative artificial intelligence %K ChatGPT %K simulation %K large language model %K clinical documentation %K quality %K accuracy %K reproducibility %K publicly available %K medical note %K medical notes %K generation %K medical documentation %K documentation %K documentations %K AI %K artificial intelligence %K transcript %K transcripts %K ChatGPT-4 %D 2024 %7 22.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Medical documentation plays a crucial role in clinical practice, facilitating accurate patient management and communication among health care professionals. However, inaccuracies in medical notes can lead to miscommunication and diagnostic errors. Additionally, the demands of documentation contribute to physician burnout. 
Although intermediaries like medical scribes and speech recognition software have been used to ease this burden, they have limitations in terms of accuracy and addressing provider-specific metrics. The integration of ambient artificial intelligence (AI)–powered solutions offers a promising way to improve documentation while fitting seamlessly into existing workflows. Objective: This study aims to assess the accuracy and quality of Subjective, Objective, Assessment, and Plan (SOAP) notes generated by ChatGPT-4, an AI model, using established transcripts of History and Physical Examination as the gold standard. We seek to identify potential errors and evaluate the model’s performance across different categories. Methods: We conducted simulated patient-provider encounters representing various ambulatory specialties and transcribed the audio files. Key reportable elements were identified, and ChatGPT-4 was used to generate SOAP notes based on these transcripts. Three versions of each note were created and compared to the gold standard via chart review; errors generated from the comparison were categorized as omissions, incorrect information, or additions. We compared the accuracy of data elements across versions, transcript length, and data categories. Additionally, we assessed note quality using the Physician Documentation Quality Instrument (PDQI) scoring system. Results: Although ChatGPT-4 consistently generated SOAP-style notes, there were, on average, 23.6 errors per clinical case, with errors of omission (86%) being the most common, followed by addition errors (10.5%) and inclusion of incorrect facts (3.2%). There was significant variance between replicates of the same case, with only 52.9% of data elements reported correctly across all 3 replicates. The accuracy of data elements varied across cases, with the highest accuracy observed in the “Objective” section. 
Consequently, the measure of note quality, assessed by PDQI, demonstrated intra- and intercase variance. Finally, the accuracy of ChatGPT-4 was inversely correlated to both the transcript length (P=.05) and the number of scorable data elements (P=.05). Conclusions: Our study reveals substantial variability in errors, accuracy, and note quality generated by ChatGPT-4. Errors were not limited to specific sections, and the inconsistency in error types across replicates complicated predictability. Transcript length and data complexity were inversely correlated with note accuracy, raising concerns about the model’s effectiveness in handling complex medical cases. The quality and reliability of clinical notes produced by ChatGPT-4 do not meet the standards required for clinical use. Although AI holds promise in health care, caution should be exercised before widespread adoption. Further research is needed to address accuracy, variability, and potential errors. ChatGPT-4, while valuable in various applications, should not be considered a safe alternative to human-generated clinical documentation at this time. 
%M 38648636 %R 10.2196/54419 %U https://www.jmir.org/2024/1/e54419 %U https://doi.org/10.2196/54419 %U http://www.ncbi.nlm.nih.gov/pubmed/38648636 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55037 %T ChatGPT’s Performance in Cardiac Arrest and Bradycardia Simulations Using the American Heart Association's Advanced Cardiovascular Life Support Guidelines: Exploratory Study %A Pham,Cecilia %A Govender,Romi %A Tehami,Salik %A Chavez,Summer %A Adepoju,Omolola E %A Liaw,Winston %+ Tilman J Fertitta Family College of Medicine, University of Houston, 5055 Medical Circle, Houston, TX, 77204, United States, 1 713 743 7047, cmpham4@uh.edu %K ChatGPT %K artificial intelligence %K AI %K large language model %K LLM %K cardiac arrest %K bradycardia %K simulation %K advanced cardiovascular life support %K ACLS %K bradycardia simulations %K America %K American %K heart association %K cardiac %K life support %K exploratory study %K heart %K heart attack %K clinical decision support %K diagnostics %K algorithms %D 2024 %7 22.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: ChatGPT is the most advanced large language model to date, with prior iterations having passed medical licensing examinations, providing clinical decision support, and improved diagnostics. Although limited, past studies of ChatGPT’s performance found that artificial intelligence could pass the American Heart Association’s advanced cardiovascular life support (ACLS) examinations with modifications. ChatGPT’s accuracy has not been studied in more complex clinical scenarios. As heart disease and cardiac arrest remain leading causes of morbidity and mortality in the United States, finding technologies that help increase adherence to ACLS algorithms, which improves survival outcomes, is critical. Objective: This study aims to examine the accuracy of ChatGPT in following ACLS guidelines for bradycardia and cardiac arrest. 
Methods: We evaluated the accuracy of ChatGPT’s responses to 2 simulations based on the 2020 American Heart Association ACLS guidelines with 3 primary outcomes of interest: the mean individual step accuracy, the accuracy score per simulation attempt, and the accuracy score for each algorithm. For each simulation step, ChatGPT was scored for correctness (1 point) or incorrectness (0 points). Each simulation was conducted 20 times. Results: ChatGPT’s median accuracy for each step was 85% (IQR 40%-100%) for cardiac arrest and 30% (IQR 13%-81%) for bradycardia. ChatGPT’s median accuracy over 20 simulation attempts for cardiac arrest was 69% (IQR 67%-74%) and for bradycardia was 42% (IQR 33%-50%). We found that ChatGPT’s outputs varied despite consistent input, the same actions were persistently missed, repetitive overemphasis hindered guidance, and erroneous medication information was presented. Conclusions: This study highlights the need for consistent and reliable guidance to prevent potential medical errors and optimize the application of ChatGPT to enhance its reliability and effectiveness in clinical practice. 
%M 38648098 %R 10.2196/55037 %U https://www.jmir.org/2024/1/e55037 %U https://doi.org/10.2196/55037 %U http://www.ncbi.nlm.nih.gov/pubmed/38648098 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55388 %T Evaluation of Prompts to Simplify Cardiovascular Disease Information Generated Using a Large Language Model: Cross-Sectional Study %A Mishra,Vishala %A Sarraju,Ashish %A Kalwani,Neil M %A Dexter,Joseph P %+ Data Science Initiative, Harvard University, Science and Engineering Complex 1.312-10, 150 Western Avenue, Allston, MA, 02134, United States, 1 8023381330, jdexter@fas.harvard.edu %K artificial intelligence %K ChatGPT %K GPT %K digital health %K large language model %K NLP %K language model %K language models %K prompt engineering %K health communication %K generative %K health literacy %K natural language processing %K patient-physician communication %K health communication %K prevention %K cardiology %K cardiovascular %K heart %K education %K educational %K human-in-the-loop %K machine learning %D 2024 %7 22.4.2024 %9 Research Letter %J J Med Internet Res %G English %X In this cross-sectional study, we evaluated the completeness, readability, and syntactic complexity of cardiovascular disease prevention information produced by GPT-4 in response to 4 kinds of prompts. 
%M 38648104 %R 10.2196/55388 %U https://www.jmir.org/2024/1/e55388 %U https://doi.org/10.2196/55388 %U http://www.ncbi.nlm.nih.gov/pubmed/38648104 %0 Journal Article %@ 1947-2579 %I JMIR Publications %V 16 %N %P e50201 %T Applying Machine Learning Techniques to Implementation Science %A Huguet,Nathalie %A Chen,Jinying %A Parikh,Ravi B %A Marino,Miguel %A Flocke,Susan A %A Likumahuwa-Ackman,Sonja %A Bekelman,Justin %A DeVoe,Jennifer E %+ Department of Family Medicine, Oregon Health & Science University, 3181 SW Sam Jackson Park Road, Portland, OR, 97239, United States, 1 503 494 4404, huguetn@ohsu.edu %K implementation science %K machine learning %K implementation strategies %K techniques %K implementation %K prediction %K adaptation %K acceptance %K challenges %K scientist %D 2024 %7 22.4.2024 %9 Viewpoint %J Online J Public Health Inform %G English %X Machine learning (ML) approaches could expand the usefulness and application of implementation science methods in clinical medicine and public health settings. The aim of this viewpoint is to introduce a roadmap for applying ML techniques to address implementation science questions, such as predicting what will work best, for whom, under what circumstances, and with what predicted level of support, and what and when adaptation or deimplementation are needed. We describe how ML approaches could be used and discuss challenges that implementation scientists and methodologists will need to consider when using ML throughout the stages of implementation. 
%M 38648094 %R 10.2196/50201 %U https://ojphi.jmir.org/2024/1/e50201 %U https://doi.org/10.2196/50201 %U http://www.ncbi.nlm.nih.gov/pubmed/38648094 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e52344 %T Machine Learning–Based Prediction of Changes in the Clinical Condition of Patients With Complex Chronic Diseases: 2-Phase Pilot Prospective Single-Center Observational Study %A Alvarez-Romero,Celia %A Polo-Molina,Alejandro %A Sánchez-Úbeda,Eugenio Francisco %A Jimenez-De-Juan,Carlos %A Cuadri-Benitez,Maria Pastora %A Rivas-Gonzalez,Jose Antonio %A Portela,Jose %A Palacios,Rafael %A Rodriguez-Morcillo,Carlos %A Muñoz,Antonio %A Parra-Calderon,Carlos Luis %A Nieto-Martin,Maria Dolores %A Ollero-Baturone,Manuel %A Hernández-Quiles,Carlos %+ Internal Medicine Department, Virgen del Rocio University Hospital, Av Manuel Siurot s/n, Sevilla, 41013, Spain, 34 697950012, quiles_es@yahoo.es %K patients with complex chronic diseases %K functional impairment %K Barthel Index %K artificial intelligence %K machine learning %K prediction model %K pilot study %K chronic patients %K chronic %K development study %K prognostic %K diagnostic %K therapeutic %K wearable %K wearables %K wearable activity tracker %K mobility device %K device %K physical activity %K caregiver %D 2024 %7 19.4.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Functional impairment is one of the most decisive prognostic factors in patients with complex chronic diseases. A more significant functional impairment indicates that the disease is progressing, which requires implementing diagnostic and therapeutic actions that stop the exacerbation of the disease. Objective: This study aimed to predict alterations in the clinical condition of patients with complex chronic diseases by predicting the Barthel Index (BI), to assess their clinical and functional status using an artificial intelligence model and data collected through an internet of things mobility device. 
Methods: A 2-phase pilot prospective single-center observational study was designed. During both phases, patients were recruited, and a wearable activity tracker was allocated to gather physical activity data. Patients were categorized into class A (BI≤20; total dependence), class B (20<BI≤60; severe dependence), and class C (BI>60; moderate or mild dependence, or independent). Data preprocessing and machine learning techniques were used to analyze mobility data. A decision tree was used to achieve a robust and interpretable model. To assess the quality of the predictions, several metrics including the mean absolute error, median absolute error, and root mean squared error were considered. Statistical analysis was performed using SPSS and Python for the machine learning modeling. Results: Overall, 90 patients with complex chronic diseases were included: 50 during phase 1 (class A: n=10; class B: n=20; and class C: n=20) and 40 during phase 2 (class B: n=20 and class C: n=20). Most patients (n=85, 94%) had a caregiver. The mean value of the BI was 58.31 (SD 24.5). Concerning mobility aids, 60% (n=52) of patients required no aids, whereas the others required walkers (n=18, 20%), wheelchairs (n=15, 17%), canes (n=4, 7%), and crutches (n=1, 1%). Regarding clinical complexity, 85% (n=76) met the patient with polypathology criteria with a mean of 2.7 (SD 1.25) categories, 69% (n=61) met the frailty criteria, and 21% (n=19) met the patients with complex chronic diseases criteria. The most characteristic symptoms were dyspnea (n=73, 82%), chronic pain (n=63, 70%), asthenia (n=62, 68%), and anxiety (n=41, 46%). Polypharmacy was present in 87% (n=78) of patients. The most important variables for predicting the BI were identified as the maximum step count during evening and morning periods and the absence of a mobility device. The model exhibited consistency in the median prediction error with a median absolute error close to 5 in the training, validation, and production-like test sets. 
The model accuracy for identifying the BI class was 91%, 88%, and 90% in the training, validation, and test sets, respectively. Conclusions: Using commercially available mobility recording devices makes it possible to identify different mobility patterns and relate them to functional capacity in patients with polypathology according to the BI without using clinical parameters. %M 38640473 %R 10.2196/52344 %U https://formative.jmir.org/2024/1/e52344 %U https://doi.org/10.2196/52344 %U http://www.ncbi.nlm.nih.gov/pubmed/38640473 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e51858 %T AI-Led Mental Health Support (Wysa) for Health Care Workers During COVID-19: Service Evaluation %A Chang,Christel Lynne %A Sinha,Chaitali %A Roy,Madhavi %A Wong,John Chee Meng %+ Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Level 9, NUHS Tower Block, 1E Kent Ridge Road, Singapore, 119228, Singapore, 65 6772 3481, pcmwcmj@nus.edu.sg %K AI %K app %K application %K artificial intelligence %K COVID-19 %K digital %K health care workers %K mental health %K pandemic %K Wysa %D 2024 %7 19.4.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The impact that the COVID-19 pandemic has had on health care workers’ mental health, in particular, cannot be ignored. Not only did the pandemic exacerbate mental health challenges through elevated stress, anxiety, risk of infection, and social isolation, but regulations to minimize infection additionally hindered the conduct of traditional in-person mental health care. Objective: This study explores the feasibility of using Wysa, an artificial intelligence–led mental health app, among health care workers. Methods: A national tertiary health care cluster in Singapore piloted the use of Wysa among its own health care workers to support the management of their mental well-being during the pandemic (July 2020-June 2022). 
The adoption of this digital mental health intervention circumvented the limitations of in-person contact and enabled large-scale access to evidence-based care. Rates and patterns of user engagement were evaluated. Results: Overall, the opportunity to use Wysa was well-received. Out of the 527 staff who were onboarded in the app, 80.1% (422/527) completed a minimum of 2 sessions. On average, users completed 10.9 sessions over 3.80 weeks. The interventions most used were for sleep and anxiety, with a strong repeat-use rate. In this sample, 46.2% (73/158) of health care workers reported symptoms of anxiety (Generalized Anxiety Disorder Assessment-7 [GAD-7]), and 15.2% (24/158) were likely to have symptoms of depression (Patient Health Questionnaire-2 [PHQ-2]). Conclusions: Based on the present findings, Wysa appears to strongly engage those with none to moderate symptoms of anxiety. This evaluation demonstrates the viability of implementing Wysa as a standard practice among this sample of health care workers, which may support the use of similar digital interventions across other communities. 
%M 38640476 %R 10.2196/51858 %U https://formative.jmir.org/2024/1/e51858 %U https://doi.org/10.2196/51858 %U http://www.ncbi.nlm.nih.gov/pubmed/38640476 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e46777 %T The Alzheimer’s Knowledge Base: A Knowledge Graph for Alzheimer Disease Research %A Romano,Joseph D %A Truong,Van %A Kumar,Rachit %A Venkatesan,Mythreye %A Graham,Britney E %A Hao,Yun %A Matsumoto,Nick %A Li,Xi %A Wang,Zhiping %A Ritchie,Marylyn D %A Shen,Li %A Moore,Jason H %+ Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 403 Blockley Hall, 423 Guardian Drive, Philadelphia, PA, 19104, United States, 1 2155735571, joseph.romano@pennmedicine.upenn.edu %K Alzheimer disease %K knowledge graph %K knowledge base %K artificial intelligence %K drug repurposing %K drug discovery %K open source %K Alzheimer %K etiology %K heterogeneous graph %K therapeutic targets %K machine learning %K therapeutic discovery %D 2024 %7 18.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: As global populations age and become susceptible to neurodegenerative illnesses, new therapies for Alzheimer disease (AD) are urgently needed. Existing data resources for drug discovery and repurposing fail to capture relationships central to the disease’s etiology and response to drugs. Objective: We designed the Alzheimer’s Knowledge Base (AlzKB) to alleviate this need by providing a comprehensive knowledge representation of AD etiology and candidate therapeutics. Methods: We designed the AlzKB as a large, heterogeneous graph knowledge base assembled using 22 diverse external data sources describing biological and pharmaceutical entities at different levels of organization (eg, chemicals, genes, anatomy, and diseases). AlzKB uses a Web Ontology Language 2 ontology to enforce semantic consistency and allow for ontological inference. 
We provide a public version of AlzKB and allow users to run and modify local versions of the knowledge base. Results: AlzKB is freely available on the web and currently contains 118,902 entities with 1,309,527 relationships between those entities. To demonstrate its value, we used graph data science and machine learning to (1) propose new therapeutic targets based on similarities of AD to Parkinson disease and (2) repurpose existing drugs that may treat AD. For each use case, AlzKB recovers known therapeutic associations while proposing biologically plausible new ones. Conclusions: AlzKB is a new, publicly available knowledge resource that enables researchers to discover complex translational associations for AD drug discovery. Through 2 use cases, we show that it is a valuable tool for proposing novel therapeutic hypotheses based on public biomedical knowledge. %M 38635981 %R 10.2196/46777 %U https://www.jmir.org/2024/1/e46777 %U https://doi.org/10.2196/46777 %U http://www.ncbi.nlm.nih.gov/pubmed/38635981 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e47194 %T Identifying Links Between Productivity and Biobehavioral Rhythms Modeled From Multimodal Sensor Streams: Exploratory Quantitative Study %A Yan,Runze %A Liu,Xinwen %A Dutcher,Janine M %A Tumminia,Michael J %A Villalba,Daniella %A Cohen,Sheldon %A Creswell,John D %A Creswell,Kasey %A Mankoff,Jennifer %A Dey,Anind K %A Doryab,Afsaneh %+ University of Virginia, 351 McCormick Road, Charlottesville, VA, 22904, United States, 1 4342435823, ad4ks@virginia.edu %K biobehavioral rhythms %K productivity %K computational modeling %K mobile sensing %K mobile phone %D 2024 %7 18.4.2024 %9 Original Paper %J JMIR AI %G English %X Background: Biobehavioral rhythms are biological, behavioral, and psychosocial processes with repeating cycles. Abnormal rhythms have been linked to various health issues, such as sleep disorders, obesity, and depression. 
Objective: This study aims to identify links between productivity and biobehavioral rhythms modeled from passively collected mobile data streams. Methods: In this study, we used a multimodal mobile sensing data set consisting of data collected from smartphones and Fitbits worn by 188 college students over a continuous period of 16 weeks. The participants reported their self-evaluated daily productivity score (ranging from 0 to 4) during weeks 1, 6, and 15. To analyze the data, we modeled cyclic human behavior patterns based on multimodal mobile sensing data gathered during weeks 1, 6, 15, and the adjacent weeks. Our methodology resulted in the creation of a rhythm model for each sensor feature. Additionally, we developed a correlation-based approach to identify connections between rhythm stability and high or low productivity levels. Results: Differences exist in the biobehavioral rhythms of high- and low-productivity students, with those demonstrating greater rhythm stability also exhibiting higher productivity levels. Notably, a negative correlation (C=–0.16) was observed between productivity and the SE of the phase for the 24-hour period during week 1, with a higher SE indicative of lower rhythm stability. Conclusions: Modeling biobehavioral rhythms has the potential to quantify and forecast productivity. The findings have implications for building novel cyber-human systems that align with human beings’ biobehavioral rhythms to improve health, well-being, and work performance. 
%M 38875675 %R 10.2196/47194 %U https://ai.jmir.org/2024/1/e47194 %U https://doi.org/10.2196/47194 %U http://www.ncbi.nlm.nih.gov/pubmed/38875675 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54538 %T Integrating Biomarkers From Virtual Reality and Magnetic Resonance Imaging for the Early Detection of Mild Cognitive Impairment Using a Multimodal Learning Approach: Validation Study %A Park,Bogyeom %A Kim,Yuwon %A Park,Jinseok %A Choi,Hojin %A Kim,Seong-Eun %A Ryu,Hokyoung %A Seo,Kyoungwon %+ Department of Applied Artificial Intelligence, Seoul National University of Science and Technology, Sangsang Hall, 4th Fl, Gongneung-ro, Gongneung-dong, Nowon-gu, Seoul, 01811, Republic of Korea, 82 010 5668 8660, kwseo@seoultech.ac.kr %K magnetic resonance imaging %K MRI %K virtual reality %K VR %K early detection %K mild cognitive impairment %K multimodal learning %K hand movement %K eye movement %D 2024 %7 17.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Early detection of mild cognitive impairment (MCI), a transitional stage between normal aging and Alzheimer disease, is crucial for preventing the progression of dementia. Virtual reality (VR) biomarkers have proven to be effective in capturing behaviors associated with subtle deficits in instrumental activities of daily living, such as challenges in using a food-ordering kiosk, for early detection of MCI. On the other hand, magnetic resonance imaging (MRI) biomarkers have demonstrated their efficacy in quantifying observable structural brain changes that can aid in early MCI detection. Nevertheless, the relationship between VR-derived and MRI biomarkers remains an open question. In this context, we explored the integration of VR-derived and MRI biomarkers to enhance early MCI detection through a multimodal learning approach. 
Objective: We aimed to evaluate and compare the efficacy of VR-derived and MRI biomarkers in the classification of MCI while also examining the strengths and weaknesses of each approach. Furthermore, we focused on improving early MCI detection by leveraging multimodal learning to integrate VR-derived and MRI biomarkers. Methods: The study encompassed a total of 54 participants, comprising 22 (41%) healthy controls and 32 (59%) patients with MCI. Participants completed a virtual kiosk test to collect 4 VR-derived biomarkers (hand movement speed, scanpath length, time to completion, and the number of errors), and T1-weighted MRI scans were performed to collect 22 MRI biomarkers from both hemispheres. Analyses of covariance were used to compare these biomarkers between healthy controls and patients with MCI, with age considered as a covariate. Subsequently, the biomarkers that exhibited significant differences between the 2 groups were used to train and validate a multimodal learning model aimed at early screening for patients with MCI among healthy controls. Results: The support vector machine (SVM) using only VR-derived biomarkers achieved a sensitivity of 87.5% and specificity of 90%, whereas the MRI biomarkers showed a sensitivity of 90.9% and specificity of 71.4%. Moreover, a correlation analysis revealed a significant association between MRI-observed brain atrophy and impaired performance in instrumental activities of daily living in the VR environment. Notably, the integration of both VR-derived and MRI biomarkers into a multimodal SVM model yielded superior results compared to unimodal SVM models, achieving higher accuracy (94.4%), sensitivity (100%), specificity (90.9%), precision (87.5%), and F1-score (93.3%). Conclusions: The results indicate that VR-derived biomarkers, characterized by their high specificity, can be valuable as a robust, early screening tool for MCI in a broader older adult population. 
On the other hand, MRI biomarkers, known for their high sensitivity, excel at confirming the presence of MCI. Moreover, the multimodal learning approach introduced in our study provides valuable insights into the improvement of early MCI detection by integrating a diverse set of biomarkers. %M 38631021 %R 10.2196/54538 %U https://www.jmir.org/2024/1/e54538 %U https://doi.org/10.2196/54538 %U http://www.ncbi.nlm.nih.gov/pubmed/38631021 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e48330 %T Comparing Open-Access Database and Traditional Intensive Care Studies Using Machine Learning: Bibliometric Analysis Study %A Ke,Yuhe %A Yang,Rui %A Liu,Nan %+ Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, 8 College Road, Singapore, 169857, Singapore, 65 66016503, liu.nan@duke-nus.edu.sg %K BERTopic %K critical care %K eICU %K machine learning %K MIMIC %K Medical Information Mart for Intensive Care %K natural language processing %D 2024 %7 17.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Intensive care research has predominantly relied on conventional methods like randomized controlled trials. However, the increasing popularity of open-access, free databases in the past decade has opened new avenues for research, offering fresh insights. Leveraging machine learning (ML) techniques enables the analysis of trends in a vast number of studies. Objective: This study aims to conduct a comprehensive bibliometric analysis using ML to compare trends and research topics in traditional intensive care unit (ICU) studies and those done with open-access databases (OADs). Methods: We used ML for the analysis of publications in the Web of Science database in this study. Articles were categorized into “OAD” and “traditional intensive care” (TIC) studies. 
OAD studies comprised those using the Medical Information Mart for Intensive Care (MIMIC), eICU Collaborative Research Database (eICU-CRD), Amsterdam University Medical Centers Database (AmsterdamUMCdb), High Time Resolution ICU Dataset (HiRID), and Pediatric Intensive Care database. TIC studies included all other intensive care studies. Uniform manifold approximation and projection was used to visualize the corpus distribution. The BERTopic technique was used to generate 30 topic-unique identification numbers and to categorize topics into 22 topic families. Results: A total of 227,893 records were extracted. After exclusions, 145,426 articles were identified as TIC and 1301 articles as OAD studies. TIC studies experienced exponential growth over the last 2 decades, culminating in a peak of 16,378 articles in 2021, while OAD studies demonstrated a consistent upsurge since 2018. Sepsis, ventilation-related research, and pediatric intensive care were the most frequently discussed topics. TIC studies exhibited broader coverage than OAD studies, suggesting a more extensive research scope. Conclusions: This study analyzed ICU research, providing valuable insights from a large number of publications. OAD studies complement TIC studies, focusing on predictive modeling, while TIC studies capture essential qualitative information. Integrating both approaches in a complementary manner is the future direction for ICU research. Additionally, natural language processing techniques offer a transformative alternative for literature review and bibliometric analysis. 
%M 38630522 %R 10.2196/48330 %U https://www.jmir.org/2024/1/e48330 %U https://doi.org/10.2196/48330 %U http://www.ncbi.nlm.nih.gov/pubmed/38630522 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56655 %T Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study %A He,Zhe %A Bhasuran,Balu %A Jin,Qiao %A Tian,Shubo %A Hanna,Karim %A Shavor,Cindy %A Arguello,Lisbeth Garcia %A Murray,Patrick %A Lu,Zhiyong %+ School of Information, Florida State University, 142 Collegiate Loop, Tallahassee, FL, 32306, United States, 1 8506445775, zhe@fsu.edu %K large language models %K generative artificial intelligence %K generative AI %K ChatGPT %K laboratory test results %K patient education %K natural language processing %D 2024 %7 17.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Although patients have easy access to their electronic health records and laboratory test result data through patient portals, laboratory test results are often confusing and hard to understand. Many patients turn to web-based forums or question-and-answer (Q&A) sites to seek advice from their peers. The quality of answers from social Q&A sites on health-related questions varies significantly, and not all responses are accurate or reliable. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to have their questions answered. Objective: We aimed to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to laboratory test–related questions asked by patients and identify potential issues that can be mitigated using augmentation approaches. Methods: We collected laboratory test result–related Q&A data from Yahoo! Answers and selected 53 Q&A pairs for this study. 
Using the LangChain framework and ChatGPT web portal, we generated responses to the 53 questions from 5 LLMs: GPT-4, GPT-3.5, LLaMA 2, MedAlpaca, and ORCA_mini. We assessed the similarity of their answers using standard Q&A similarity-based evaluation metrics, including Recall-Oriented Understudy for Gisting Evaluation, Bilingual Evaluation Understudy, Metric for Evaluation of Translation With Explicit Ordering, and Bidirectional Encoder Representations from Transformers Score. We used an LLM-based evaluator to judge whether a target model had higher quality in terms of relevance, correctness, helpfulness, and safety than the baseline model. We performed a manual evaluation with medical experts for all the responses to 7 selected questions on the same 4 aspects. Results: Regarding the similarity of the responses from the 4 LLMs, with the GPT-4 output used as the reference answer, the responses from GPT-3.5 were the most similar, followed by those from LLaMA 2, ORCA_mini, and MedAlpaca. Human answers from Yahoo data were scored the lowest and, thus, as the least similar to GPT-4–generated answers. The results of the win rate and medical expert evaluation both showed that GPT-4’s responses achieved better scores than all the other LLM responses and human responses on all 4 aspects (relevance, correctness, helpfulness, and safety). LLM responses occasionally also suffered from lack of interpretation in one’s medical context, incorrect statements, and lack of references. Conclusions: By evaluating LLMs in generating responses to patients’ laboratory test result–related questions, we found that, compared to the other 4 LLMs and human answers from a Q&A website, GPT-4’s responses were more accurate, helpful, relevant, and safer. There were cases in which GPT-4 responses were inaccurate and not individualized. 
We identified a number of ways to improve the quality of LLM responses, including prompt engineering, prompt augmentation, retrieval-augmented generation, and response evaluation. %M 38630520 %R 10.2196/56655 %U https://www.jmir.org/2024/1/e56655 %U https://doi.org/10.2196/56655 %U http://www.ncbi.nlm.nih.gov/pubmed/38630520 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e56572 %T A Roadmap for Using Causal Inference and Machine Learning to Personalize Asthma Medication Selection %A Nkoy,Flory L %A Stone,Bryan L %A Zhang,Yue %A Luo,Gang %+ Department of Biomedical Informatics and Medical Education, University of Washington, 850 Republican Street, Building C, Box 358047, Seattle, WA, 98195, United States, 1 2062214596, gangluo@cs.wisc.edu %K asthma %K causal inference %K forecasting %K machine learning %K decision support %K drug %K drugs %K pharmacy %K pharmacies %K pharmacology %K pharmacotherapy %K pharmaceutic %K pharmaceutics %K pharmaceuticals %K pharmaceutical %K medication %K medications %K medication selection %K respiratory %K pulmonary %K forecast %K ICS %K inhaled corticosteroid %K inhaler %K inhaled %K corticosteroid %K corticosteroids %K artificial intelligence %K personalized %K customized %D 2024 %7 17.4.2024 %9 Viewpoint %J JMIR Med Inform %G English %X Inhaled corticosteroid (ICS) is a mainstay treatment for controlling asthma and preventing exacerbations in patients with persistent asthma. Many types of ICS drugs are used, either alone or in combination with other controller medications. Despite the widespread use of ICSs, asthma control remains suboptimal in many people with asthma. Suboptimal control leads to recurrent exacerbations, causes frequent ER visits and inpatient stays, and is due to multiple factors. One such factor is the inappropriate ICS choice for the patient. While many interventions targeting other factors exist, less attention is given to inappropriate ICS choice. 
Asthma is a heterogeneous disease with variable underlying inflammations and biomarkers. Up to 50% of people with asthma exhibit some degree of resistance or insensitivity to certain ICSs due to genetic variations in ICS metabolizing enzymes, leading to variable responses to ICSs. Yet, ICS choice, especially in the primary care setting, is often not tailored to the patient’s characteristics. Instead, ICS choice is largely by trial and error and often dictated by insurance reimbursement, organizational prescribing policies, or cost, leading to a one-size-fits-all approach with many patients not achieving optimal control. There is a pressing need for a decision support tool that can predict an effective ICS at the point of care and guide providers to select the ICS that will most likely and quickly ease patient symptoms and improve asthma control. To date, no such tool exists. Predicting which patient will respond well to which ICS is the first step toward developing such a tool. However, no study has predicted ICS response, forming a gap. While the biologic heterogeneity of asthma is vast, few, if any, biomarkers and genotypes can be used to systematically profile all patients with asthma and predict ICS response. As endotyping or genotyping all patients is infeasible, readily available electronic health record data collected during clinical care offer a low-cost, reliable, and more holistic way to profile all patients. In this paper, we point out the need for developing a decision support tool to guide ICS selection and the gap in fulfilling the need. Then we outline an approach to close this gap via creating a machine learning model and applying causal inference to predict a patient’s ICS response in the next year based on the patient’s characteristics. The model uses electronic health record data to characterize all patients and extract patterns that could mirror endotype or genotype. 
This paper supplies a roadmap for future research, with the eventual goal of shifting asthma care from one-size-fits-all to personalized care, improving outcomes, and saving health care resources. %M 38630536 %R 10.2196/56572 %U https://medinform.jmir.org/2024/1/e56572 %U https://doi.org/10.2196/56572 %U http://www.ncbi.nlm.nih.gov/pubmed/38630536 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e57778 %T Authors’ Reply: “Evaluating GPT-4’s Cognitive Functions Through the Bloom Taxonomy: Insights and Clarifications” %A Herrmann-Werner,Anne %A Festl-Wietek,Teresa %A Holderried,Friederike %A Herschbach,Lea %A Griewatz,Jan %A Masters,Ken %A Zipfel,Stephan %A Mahling,Moritz %+ Tübingen Institute for Medical Education, Faculty of Medicine, University of Tübingen, Elfriede-Aulhorn-Strasse 10, Tübingen, 72076, Germany, 49 7071 29 73715, teresa.festl-wietek@med.uni-tuebingen.de %K answer %K artificial intelligence %K assessment %K Bloom’s taxonomy %K ChatGPT %K classification %K error %K exam %K examination %K generative %K GPT-4 %K Generative Pre-trained Transformer 4 %K language model %K learning outcome %K LLM %K MCQ %K medical education %K medical exam %K multiple-choice question %K natural language processing %K NLP %K psychosomatic %K question %K response %K taxonomy %D 2024 %7 16.4.2024 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 38625723 %R 10.2196/57778 %U https://www.jmir.org/2024/1/e57778 %U https://doi.org/10.2196/57778 %U http://www.ncbi.nlm.nih.gov/pubmed/38625723 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56997 %T Evaluating GPT-4’s Cognitive Functions Through the Bloom Taxonomy: Insights and Clarifications %A Huang,Kuan-Ju %+ Department of Obstetrics and Gynecology, National Taiwan University Hospital Yunlin Branch, No 579, Sec 2, Yunlin Rd, Douliu City, Yunlin County, 640, Taiwan, 886 55323911 ext 563413, restroomer@icloud.com %K artificial intelligence %K ChatGPT %K Bloom taxonomy %K AI %K cognition %D 
2024 %7 16.4.2024 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 38625725 %R 10.2196/56997 %U https://www.jmir.org/2024/1/e56997 %U https://doi.org/10.2196/56997 %U http://www.ncbi.nlm.nih.gov/pubmed/38625725 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55794 %T Adverse Event Signal Detection Using Patients’ Concerns in Pharmaceutical Care Records: Evaluation of Deep Learning Models %A Nishioka,Satoshi %A Watabe,Satoshi %A Yanagisawa,Yuki %A Sayama,Kyoko %A Kizaki,Hayato %A Imai,Shungo %A Someya,Mitsuhiro %A Taniguchi,Ryoo %A Yada,Shuntaro %A Aramaki,Eiji %A Hori,Satoko %+ Division of Drug Informatics, Keio University Faculty of Pharmacy, 1-5-30 Shibakoen, Minato-ku, Tokyo, 105-8512, Japan, 81 3 5400 2650, satokoh@keio.jp %K cancer %K anticancer drug %K adverse event %K side effect %K patient-reported outcome %K patients’ voice %K patient-oriented %K patient narrative %K natural language processing %K deep learning %K pharmaceutical care record %K SOAP %D 2024 %7 16.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Early detection of adverse events and their management are crucial to improving anticancer treatment outcomes, and listening to patients’ subjective opinions (patients’ voices) can make a major contribution to improving safety management. Recent progress in deep learning technologies has enabled various new approaches for the evaluation of safety-related events based on patient-generated text data, but few studies have focused on the improvement of real-time safety monitoring for individual patients. In addition, no study has yet been performed to validate deep learning models for screening patients’ narratives for clinically important adverse event signals that require medical intervention. 
In our previous work, novel deep learning models have been developed to detect adverse event signals for hand-foot syndrome or adverse events limiting patients’ daily lives from the authored narratives of patients with cancer, aiming ultimately to use them as safety monitoring support tools for individual patients. Objective: This study was designed to evaluate whether our deep learning models can screen clinically important adverse event signals that require intervention by health care professionals. The applicability of our deep learning models to data on patients’ concerns at pharmacies was also assessed. Methods: Pharmaceutical care records at community pharmacies were used for the evaluation of our deep learning models. The records followed the SOAP format, consisting of subjective (S), objective (O), assessment (A), and plan (P) columns. Because of the unique combination of patients’ concerns in the S column and the professional records of the pharmacists, these were considered suitable data for the present purpose. Our deep learning models were applied to the S records of patients with cancer, and the extracted adverse event signals were assessed in relation to medical actions and prescribed drugs. Results: From 30,784 S records of 2479 patients with at least 1 prescription of anticancer drugs, our deep learning models extracted true adverse event signals with more than 80% accuracy for both hand-foot syndrome (n=152, 91%) and adverse events limiting patients’ daily lives (n=157, 80.1%). The deep learning models were also able to screen adverse event signals that require medical intervention by health care providers. The extracted adverse event signals could reflect the side effects of anticancer drugs used by the patients based on analysis of prescribed anticancer drugs. 
“Pain or numbness” (n=57, 36.3%), “fever” (n=46, 29.3%), and “nausea” (n=40, 25.5%) were common symptoms out of the true adverse event signals identified by the model for adverse events limiting patients’ daily lives. Conclusions: Our deep learning models were able to screen clinically important adverse event signals that require intervention for symptoms. It was also confirmed that these deep learning models could be applied to patients’ subjective information recorded in pharmaceutical care records accumulated during pharmacists’ daily work. %M 38625718 %R 10.2196/55794 %U https://www.jmir.org/2024/1/e55794 %U https://doi.org/10.2196/55794 %U http://www.ncbi.nlm.nih.gov/pubmed/38625718 %0 Journal Article %@ 1947-2579 %I JMIR Publications %V 16 %N %P e50771 %T Machine Learning for Prediction of Tuberculosis Detection: Case Study of Trained African Giant Pouched Rats %A Jonathan,Joan %A Barakabitze,Alcardo Alex %A Fast,Cynthia D %A Cox,Christophe %+ Department of Informatics and Information Technology, Sokoine University of Agriculture, PO Box 3038, Morogoro, United Republic of Tanzania, 255 763 630 054, joanjonathan@sua.ac.tz %K machine learning %K African giant pouched rat %K diagnosis %K tuberculosis %K health care %D 2024 %7 16.4.2024 %9 Original Paper %J Online J Public Health Inform %G English %X Background: Technological advancement has led to the growth and rapid increase of tuberculosis (TB) medical data generated from different health care areas, including diagnosis. Prioritizing better adoption and acceptance of innovative diagnostic technology to reduce the spread of TB significantly benefits developing countries. Trained TB-detection rats are used in Tanzania and Ethiopia for operational research to complement other TB diagnostic tools. This technology has increased new TB case detection owing to its speed, cost-effectiveness, and sensitivity. 
Objective: During the TB detection process, rats produce vast amounts of data, providing an opportunity to identify interesting patterns that influence TB detection performance. This study aimed to develop models that predict if the rat will hit (indicate the presence of TB within) the sample or not using machine learning (ML) techniques. The goal was to improve the diagnostic accuracy and performance of TB detection involving rats. Methods: APOPO (Anti-Persoonsmijnen Ontmijnende Product Ontwikkeling) Center in Morogoro provided data for this study from 2012 to 2019, and 366,441 observations were used to build predictive models using ML techniques, including decision tree, random forest, naïve Bayes, support vector machine, and k-nearest neighbor, by incorporating a variety of variables, such as the diagnostic results from partner health clinics using methods endorsed by the World Health Organization (WHO). Results: The support vector machine technique yielded the highest accuracy of 83.39% for prediction compared to other ML techniques used. Furthermore, this study found that the inclusion of variables related to whether the sample contained TB or not increased the performance accuracy of the predictive model. Conclusions: The inclusion of variables related to the diagnostic results of TB samples may improve the detection performance of the trained rats. The study results may be of importance to TB-detection rat trainers and TB decision-makers as the results may prompt them to take action to maintain the usefulness of the technology and increase the TB detection performance of trained rats. 
%M 38625737 %R 10.2196/50771 %U https://ojphi.jmir.org/2024/1/e50771 %U https://doi.org/10.2196/50771 %U http://www.ncbi.nlm.nih.gov/pubmed/38625737 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e55762 %T Assessing the Accuracy of Generative Conversational Artificial Intelligence in Debunking Sleep Health Myths: Mixed Methods Comparative Study With Expert Analysis %A Bragazzi,Nicola Luigi %A Garbarino,Sergio %+ Human Nutrition Unit, Department of Food and Drugs, University of Parma, Via Volturno 39, Parma, 43125, Italy, 39 0521 903121, nicolaluigi.bragazzi@unipr.it %K sleep %K sleep health %K sleep-related disbeliefs %K generative conversational artificial intelligence %K chatbot %K ChatGPT %K misinformation %K artificial intelligence %K comparative study %K expert analysis %K adequate sleep %K well-being %K sleep trackers %K sleep health education %K sleep-related %K chronic disease %K healthcare cost %K sleep timing %K sleep duration %K presleep behaviors %K sleep experts %K healthy behavior %K public health %K conversational agents %D 2024 %7 16.4.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Adequate sleep is essential for maintaining individual and public health, positively affecting cognition and well-being, and reducing chronic disease risks. It plays a significant role in driving the economy, public safety, and managing health care costs. Digital tools, including websites, sleep trackers, and apps, are key in promoting sleep health education. Conversational artificial intelligence (AI) such as ChatGPT (OpenAI, Microsoft Corp) offers accessible, personalized advice on sleep health but raises concerns about potential misinformation. This underscores the importance of ensuring that AI-driven sleep health information is accurate, given its significant impact on individual and public health, and the spread of sleep-related myths. Objective: This study aims to examine ChatGPT’s capability to debunk sleep-related disbeliefs. 
Methods: A mixed methods design was leveraged. ChatGPT categorized 20 sleep-related myths identified by 10 sleep experts and rated them in terms of falseness and public health significance, on a 5-point Likert scale. Sensitivity, positive predictive value, and interrater agreement were also calculated. A qualitative comparative analysis was also conducted. Results: ChatGPT labeled a significant portion (n=17, 85%) of the statements as “false” (n=9, 45%) or “generally false” (n=8, 40%), with varying accuracy across different domains. For instance, it correctly identified most myths about “sleep timing,” “sleep duration,” and “behaviors during sleep,” while it had varying degrees of success with other categories such as “pre-sleep behaviors” and “brain function and sleep.” ChatGPT’s assessment of the degree of falseness and public health significance, on the 5-point Likert scale, revealed an average score of 3.45 (SD 0.87) and 3.15 (SD 0.99), respectively, indicating a good level of accuracy in identifying the falseness of statements and a good understanding of their impact on public health. The AI-based tool showed a sensitivity of 85% and a positive predictive value of 100%. Overall, this indicates that when ChatGPT labels a statement as false, it is highly reliable, but it may miss identifying some false statements. When comparing with expert ratings, high intraclass correlation coefficients (ICCs) between ChatGPT’s appraisals and expert opinions could be found, suggesting that the AI’s ratings were generally aligned with expert views on falseness (ICC=.83, P<.001) and public health significance (ICC=.79, P=.001) of sleep-related myths. Qualitatively, both ChatGPT and sleep experts refuted sleep-related misconceptions. However, ChatGPT adopted a more accessible style and provided a more generalized view, focusing on broad concepts, while experts sometimes used technical jargon, providing evidence-based explanations. 
Conclusions: ChatGPT-4 can accurately address sleep-related queries and debunk sleep-related myths, with a performance comparable to that of sleep experts. Although, given its limitations, the AI cannot completely replace expert opinions, especially in nuanced and complex fields such as sleep health, it can be a valuable complement in the dissemination of updated information and the promotion of healthy behaviors. %M 38501898 %R 10.2196/55762 %U https://formative.jmir.org/2024/1/e55762 %U https://doi.org/10.2196/55762 %U http://www.ncbi.nlm.nih.gov/pubmed/38501898 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e50475 %T Integrating Explainable Machine Learning in Clinical Decision Support Systems: Study Involving a Modified Design Thinking Approach %A Shulha,Michael %A Hovdebo,Jordan %A D’Souza,Vinita %A Thibault,Francis %A Harmouche,Rola %+ Lady Davis Institute for Medical Research, Jewish General Hospital, Centre intégré universitaire de santé et de services sociaux (CIUSSS) du Centre-Ouest-de-l'Île-de-Montréal, Pavilion B-274, 3755 Chem. de la Côte-Sainte-Catherine, Montreal, QC, H3T 1E2, Canada, 1 514 340 8222, michael.shulha.ccomtl@ssss.gouv.qc.ca %K explainable machine learning %K XML %K design thinking approach %K NASSS framework %K clinical decision support %K clinician engagement %K clinician-facing interface %K clinician trust in machine learning %K COVID-19 %K chest x-ray %K severity prediction %D 2024 %7 16.4.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Though there has been considerable effort to implement machine learning (ML) methods for health care, clinical implementation has lagged. Incorporating explainable machine learning (XML) methods through the development of a decision support tool using a design thinking approach is expected to lead to greater uptake of such tools. 
Objective: This work aimed to explore how constant engagement of clinician end users can address the lack of adoption of ML tools in clinical contexts due to their lack of transparency and address challenges related to presenting explainability in a decision support interface. Methods: We used a design thinking approach augmented with additional theoretical frameworks to provide more robust approaches to different phases of design. In particular, in the problem definition phase, we incorporated the nonadoption, abandonment, scale-up, spread, and sustainability of technology in health care (NASSS) framework to assess these aspects in a health care network. This process helped focus on the development of a prognostic tool that predicted the likelihood of admission to an intensive care ward based on disease severity in chest x-ray images. In the ideate, prototype, and test phases, we incorporated a metric framework to assess physician trust in artificial intelligence (AI) tools. This allowed us to compare physicians’ assessments of the domain representation, action ability, and consistency of the tool. Results: Physicians found the design of the prototype elegant, and domain appropriate representation of data was displayed in the tool. They appreciated the simplified explainability overlay, which only displayed the most predictive patches that cumulatively explained 90% of the final admission risk score. Finally, in terms of consistency, physicians unanimously appreciated the capacity to compare multiple x-ray images in the same view. They also appreciated the ability to toggle the explainability overlay so that both options made it easier for them to assess how consistently the tool was identifying elements of the x-ray image they felt would contribute to overall disease severity. Conclusions: The adopted approach is situated in an evolving space concerned with incorporating XML or AI technologies into health care software. 
We addressed the alignment of AI as it relates to clinician trust, describing an approach to wireframing and prototyping, which incorporates the use of a theoretical framework for trust in the design process itself. Moreover, we proposed that alignment of AI is dependent upon integration of end users throughout the larger design process. Our work shows the importance and value of engaging end users prior to tool development. We believe that the described approach is a unique and valuable contribution that outlines a direction for ML experts, user experience designers, and clinician end users on how to collaborate in the creation of trustworthy and usable XML-based clinical decision support tools. %M 38625728 %R 10.2196/50475 %U https://formative.jmir.org/2024/1/e50475 %U https://doi.org/10.2196/50475 %U http://www.ncbi.nlm.nih.gov/pubmed/38625728 %0 Journal Article %@ 2369-3762 %I %V 10 %N %P e57696 %T A Student’s Viewpoint on ChatGPT Use and Automation Bias in Medical Education %A Dsouza,Jeanne Maria %K AI %K artificial intelligence %K ChatGPT %K medical education %D 2024 %7 15.4.2024 %9 %J JMIR Med Educ %G English %X %R 10.2196/57696 %U https://mededu.jmir.org/2024/1/e57696 %U https://doi.org/10.2196/57696 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 13 %N %P e54490 %T Application of AI in Sepsis: Citation Network Analysis and Evidence Synthesis %A Wu,MeiJung %A Islam,Md Mohaimenul %A Poly,Tahmina Nasrin %A Lin,Ming-Chin %+ Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, 250 Wuxing St, Xinyi District, Taipei, 110, Taiwan, 886 66202589, arbiter@tmu.edu.tw %K AI %K artificial intelligence %K bibliometric analysis %K bibliometric %K citation %K deep learning %K machine learning %K network analysis %K publication %K sepsis %K trend %K visualization %K VOSviewer %K Web of Science %K WoS %D 2024 %7 15.4.2024 %9 Original %J Interact J Med Res %G English %X Background: Artificial intelligence 
(AI) has garnered considerable attention in the context of sepsis research, particularly in personalized diagnosis and treatment. Conducting a bibliometric analysis of existing publications can offer a broad overview of the field and identify current research trends and future research directions. Objective: The objective of this study is to leverage bibliometric data to provide a comprehensive overview of the application of AI in sepsis. Methods: We conducted a search in the Web of Science Core Collection database to identify relevant articles published in English until August 31, 2023. A predefined search strategy was used, evaluating titles, abstracts, and full texts as needed. We used the Bibliometrix and VOSviewer tools to visualize networks showcasing the co-occurrence of authors, research institutions, countries, citations, and keywords. Results: A total of 259 relevant articles published between 2014 and 2023 (until August) were identified. Over the past decade, the annual publication count has consistently risen. Leading journals in this domain include Critical Care Medicine (17/259, 6.6%), Frontiers in Medicine (17/259, 6.6%), and Scientific Reports (11/259, 4.2%). The United States (103/259, 39.8%), China (83/259, 32%), United Kingdom (14/259, 5.4%), and Taiwan (12/259, 4.6%) emerged as the most prolific countries in terms of publications. Notable institutions in this field include the University of California System, Emory University, and Harvard University. The key researchers working in this area include Ritankar Das, Chris Barton, and Rishikesan Kamaleswaran. Although the initial period witnessed a relatively low number of articles focused on AI applications for sepsis, there has been a significant surge in research within this area in recent years (2014-2023). 
Conclusions: This comprehensive analysis provides valuable insights into AI-related research conducted in the field of sepsis, aiding health care policy makers and researchers in understanding the potential of AI and formulating effective research plans. Such analysis serves as a valuable resource for determining the advantages, sustainability, scope, and potential impact of AI models in sepsis. %M 38621231 %R 10.2196/54490 %U https://www.i-jmr.org/2024/1/e54490 %U https://doi.org/10.2196/54490 %U http://www.ncbi.nlm.nih.gov/pubmed/38621231 %0 Journal Article %@ 2561-3278 %I JMIR Publications %V 9 %N %P e56246 %T Impact of Audio Data Compression on Feature Extraction for Vocal Biomarker Detection: Validation Study %A Oreskovic,Jessica %A Kaufman,Jaycee %A Fossat,Yan %+ Klick Labs, 175 Bloor St E #300, 3rd floor, Toronto, ON, M4W3R8, Canada, 1 6472068717, yfossat@klick.com %K vocal biomarker %K biomarker %K biomarkers %K sound %K sounds %K audio %K compression %K voice %K acoustic %K acoustics %K audio compression %K feature extraction %K Python %K speech %K detect %K detection %K algorithm %K algorithms %D 2024 %7 15.4.2024 %9 Original Paper %J JMIR Biomed Eng %G English %X Background: Vocal biomarkers, derived from acoustic analysis of vocal characteristics, offer noninvasive avenues for medical screening, diagnostics, and monitoring. Previous research demonstrated the feasibility of predicting type 2 diabetes mellitus through acoustic analysis of smartphone-recorded speech. Building upon this work, this study explores the impact of audio data compression on acoustic vocal biomarker development, which is critical for broader applicability in health care. Objective: The objective of this research is to analyze how common audio compression algorithms (MP3, M4A, and WMA) applied by 3 different conversion tools at 2 bitrates affect features crucial for vocal biomarker detection. 
Methods: The impact of audio data compression on acoustic vocal biomarker development was investigated using uncompressed voice samples converted into MP3, M4A, and WMA formats at 2 bitrates (320 and 128 kbps) with MediaHuman (MH) Audio Converter, WonderShare (WS) UniConverter, and Fast Forward Moving Picture Experts Group (FFmpeg). The data set comprised recordings from 505 participants, totaling 17,298 audio files, collected using a smartphone. Participants recorded a fixed English sentence up to 6 times daily for up to 14 days. Feature extraction, including pitch, jitter, intensity, and Mel-frequency cepstral coefficients (MFCCs), was conducted using Python and Parselmouth. The Wilcoxon signed rank test and the Bonferroni correction for multiple comparisons were used for statistical analysis. Results: In this study, 36,970 audio files were initially recorded from 505 participants, with 17,298 recordings meeting the fixed sentence criteria after screening. Differences between the audio conversion software, MH, WS, and FFmpeg, were notable, impacting compression outcomes such as constant or variable bitrates. Analysis encompassed diverse data compression formats and a wide array of voice features and MFCCs. Wilcoxon signed rank tests yielded P values, with those below the Bonferroni-corrected significance level indicating significant alterations due to compression. The results indicated feature-specific impacts of compression across formats and bitrates. MH-converted files exhibited greater resilience compared to WS-converted files. Bitrate also influenced feature stability, with 38 cases affected uniquely by a single bitrate. Notably, voice features showed greater stability than MFCCs across conversion methods. Conclusions: Compression effects were found to be feature specific, with MH and FFmpeg showing greater resilience. Some features were consistently affected, emphasizing the importance of understanding feature resilience for diagnostic applications. 
Considering the implementation of vocal biomarkers in health care, finding features that remain consistent through compression for data storage or transmission purposes is valuable. This study focused on specific features and formats; future research could broaden the scope to include diverse features, real-time compression algorithms, and various recording methods. This study enhances our understanding of audio compression’s influence on voice features and MFCCs, providing insights for developing applications across fields. The research underscores the significance of feature stability in working with compressed audio data, laying a foundation for informed voice data use in evolving technological landscapes. %M 38875677 %R 10.2196/56246 %U https://biomedeng.jmir.org/2024/1/e56246 %U https://doi.org/10.2196/56246 %U http://www.ncbi.nlm.nih.gov/pubmed/38875677 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e52412 %T Novel Approach for Detecting Respiratory Syncytial Virus in Pediatric Patients Using Machine Learning Models Based on Patient-Reported Symptoms: Model Development and Validation Study %A Kawamoto,Shota %A Morikawa,Yoshihiko %A Yahagi,Naohisa %+ Graduate School of Media and Governance, Keio University, 5322 Endo, Fujisawa, 252-0882, Japan, 81 466 49 3404, yahagin@sfc.keio.ac.jp %K respiratory syncytial virus %K machine learning %K self-reported information %K clinical decision support system %K decision support %K decision-making %K artificial intelligence %K model development %K evaluation study %K detection %K respiratory %K respiratory virus %K virus %K machine learning model %K pediatric %K Japan %K detection model %D 2024 %7 12.4.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Respiratory syncytial virus (RSV) affects children, causing serious infections, particularly in high-risk groups.
Given the seasonality of RSV and the importance of rapid isolation of infected individuals, there is an urgent need for more efficient diagnostic methods to expedite this process. Objective: This study aimed to investigate the performance of a machine learning model that leverages the temporal diversity of symptom onset for detecting RSV infections and elucidate its discriminatory ability. Methods: The study was conducted in pediatric and emergency outpatient settings in Japan. We developed a detection model that remotely confirms RSV infection based on patient-reported symptom information obtained using a structured electronic template incorporating the differential points of skilled pediatricians. An extreme gradient boosting–based machine learning model was developed using the data of 4174 patients aged ≤24 months who underwent RSV rapid antigen testing. These patients visited either the pediatric or emergency department of Yokohama City Municipal Hospital between January 1, 2009, and December 31, 2015. The primary outcome was the diagnostic accuracy of the machine learning model for RSV infection, as determined by rapid antigen testing, measured using the area under the receiver operating characteristic curve. The clinical efficacy was evaluated by calculating the discriminative performance based on the number of days elapsed since the onset of the first symptom and exclusion rates based on thresholds of reasonable sensitivity and specificity. Results: Our model demonstrated an area under the receiver operating characteristic curve of 0.811 (95% CI 0.784-0.833) with good calibration and 0.746 (95% CI 0.694-0.794) for patients within 3 days of onset. It accurately captured the temporal evolution of symptoms; based on adjusted thresholds equivalent to those of a rapid antigen test, our model predicted that 6.9% (95% CI 5.4%-8.5%) of patients in the entire cohort would be positive and 68.7% (95% CI 65.4%-71.9%) would be negative. 
Our model could eliminate the need for additional testing in approximately three-quarters of all patients. Conclusions: Our model may facilitate the immediate detection of RSV infection in outpatient settings and, potentially, in home environments. This approach could streamline the diagnostic process, reduce discomfort caused by invasive tests in children, and allow rapid implementation of appropriate treatments and isolation at home. The findings underscore the potential of machine learning in augmenting clinical decision-making in the early detection of RSV infection. %M 38608268 %R 10.2196/52412 %U https://formative.jmir.org/2024/1/e52412 %U https://doi.org/10.2196/52412 %U http://www.ncbi.nlm.nih.gov/pubmed/38608268 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51250 %T Application of AI in Multilevel Pain Assessment Using Facial Images: Systematic Review and Meta-Analysis %A Huo,Jian %A Yu,Yan %A Lin,Wei %A Hu,Anmin %A Wu,Chaoran %+ Department of Anesthesia, Shenzhen People's Hospital, The First Affiliated Hospital of Southern University of Science and Technology, Shenzhen Key Medical Discipline, No 1017, Dongmen North Road, Shenzhen, 518020, China, 86 18100282848, wu.chaoran@szhospital.com %K computer vision %K facial image %K monitoring %K multilevel pain assessment %K pain %K postoperative %K status %D 2024 %7 12.4.2024 %9 Review %J J Med Internet Res %G English %X Background: The continuous monitoring and recording of patients’ pain status is a major problem in current research on postoperative pain management. In the large number of original or review articles focusing on different approaches for pain assessment, many researchers have investigated how computer vision (CV) can help by capturing facial expressions. However, there is a lack of proper comparison of results between studies to identify current research gaps. 
Objective: The purpose of this systematic review and meta-analysis was to investigate the diagnostic performance of artificial intelligence models for multilevel pain assessment from facial images. Methods: The PubMed, Embase, IEEE, Web of Science, and Cochrane Library databases were searched for related publications before September 30, 2023. Studies that used facial images alone to estimate multiple pain values were included in the systematic review. A study quality assessment was conducted using the Quality Assessment of Diagnostic Accuracy Studies, 2nd edition tool. The performance of these studies was assessed by metrics including sensitivity, specificity, log diagnostic odds ratio (LDOR), and area under the curve (AUC). The intermodal variability was assessed and presented by forest plots. Results: A total of 45 reports were included in the systematic review. The reported test accuracies ranged from 0.27-0.99, and the other metrics, including the mean standard error (MSE), mean absolute error (MAE), intraclass correlation coefficient (ICC), and Pearson correlation coefficient (PCC), ranged from 0.31-4.61, 0.24-2.8, 0.19-0.83, and 0.48-0.92, respectively. In total, 6 studies were included in the meta-analysis. Their combined sensitivity was 98% (95% CI 96%-99%), specificity was 98% (95% CI 97%-99%), LDOR was 7.99 (95% CI 6.73-9.31), and AUC was 0.99 (95% CI 0.99-1). The subgroup analysis showed that the diagnostic performance was acceptable, although imbalanced data were still emphasized as a major problem. All studies had at least one domain with a high risk of bias, and for 20% (9/45) of studies, there were no applicability concerns. Conclusions: This review summarizes recent evidence in automatic multilevel pain estimation from facial expressions and compared the test accuracy of results in a meta-analysis. Promising performance for pain estimation from facial images was established by current CV algorithms. 
Weaknesses in current studies were also identified, suggesting that larger databases and metrics evaluating multiclass classification performance could improve future studies. Trial Registration: PROSPERO CRD42023418181; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=418181 %M 38607660 %R 10.2196/51250 %U https://www.jmir.org/2024/1/e51250 %U https://doi.org/10.2196/51250 %U http://www.ncbi.nlm.nih.gov/pubmed/38607660 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e52612 %T Applications of Artificial Intelligence in Emergency Departments to Improve Wait Times: Protocol for an Integrative Living Review %A Ahmadzadeh,Bahareh %A Patey,Christopher %A Hurley,Oliver %A Knight,John %A Norman,Paul %A Farrell,Alison %A Czarnuch,Stephen %A Asghari,Shabnam %+ Centre for Rural Health Studies, Faculty of Medicine, Memorial University of Newfoundland, 300 Prince Philip Drive, St. John's, NL, A1B3V6, Canada, 1 709 777 2142, sasghari@mun.ca %K emergency department %K ED %K wait time %K artificial intelligence %K AI %K living systematic review %K LSR %D 2024 %7 12.4.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Long wait times in the emergency department (ED) are a major issue for health care systems all over the world. The application of artificial intelligence (AI) is a novel strategy to reduce ED wait times when compared to the interventions included in previous research endeavors. To date, comprehensive systematic reviews that include studies involving AI applications in the context of EDs have covered a wide range of AI implementation issues. However, the lack of an iterative update strategy limits the use of these reviews. Since the subject of AI development is cutting edge and is continuously changing, reviews in this area must be frequently updated to remain relevant. 
Objective: This study aims to provide a summary of the evidence that is currently available regarding how AI can affect ED wait times; discuss the applications of AI in improving wait times; and periodically assess the depth, breadth, and quality of the evidence supporting the application of AI in reducing ED wait times. Methods: We plan to conduct a living systematic review (LSR). Our strategy involves conducting continuous monitoring of evidence, with biannual search updates and annual review updates. Upon completing the initial round of the review, we will refine the search strategy and establish clear schedules for updating the LSR. An interpretive synthesis using Whittemore and Knafl’s framework will be performed to compile and summarize the findings. The review will be carried out using an integrated knowledge translation strategy, and knowledge users will be involved at all stages of the review to guarantee applicability, usability, and clarity of purpose. Results: The literature search was completed by September 22, 2023, and identified 17,569 articles. The title and abstract screening were completed by December 9, 2023. In total, 70 papers were eligible. The full-text screening is in progress. Conclusions: The review will summarize AI applications that improve ED wait time. The LSR enables researchers to maintain high methodological rigor while enhancing the timeliness, applicability, and value of the review. 
International Registered Report Identifier (IRRID): DERR1-10.2196/52612 %M 38607662 %R 10.2196/52612 %U https://www.researchprotocols.org/2024/1/e52612 %U https://doi.org/10.2196/52612 %U http://www.ncbi.nlm.nih.gov/pubmed/38607662 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51138 %T A Perspective on Crowdsourcing and Human-in-the-Loop Workflows in Precision Health %A Washington,Peter %+ Information and Computer Sciences, University of Hawaii at Manoa, 1680 East-West Road, Honolulu, HI, 96822, United States, pyw@hawaii.edu %K crowdsourcing %K digital medicine %K human-in-the-loop %K human in the loop %K human-AI collaboration %K machine learning %K precision health %K artificial intelligence %K AI %D 2024 %7 11.4.2024 %9 Viewpoint %J J Med Internet Res %G English %X Modern machine learning approaches have led to performant diagnostic models for a variety of health conditions. Several machine learning approaches, such as decision trees and deep neural networks, can, in principle, approximate any function. However, this power can be considered to be both a gift and a curse, as the propensity toward overfitting is magnified when the input data are heterogeneous and high dimensional and the output class is highly nonlinear. This issue can especially plague diagnostic systems that predict behavioral and psychiatric conditions that are diagnosed with subjective criteria. An emerging solution to this issue is crowdsourcing, where crowd workers are paid to annotate complex behavioral features in return for monetary compensation or a gamified experience. These labels can then be used to derive a diagnosis, either directly or by using the labels as inputs to a diagnostic machine learning model. This viewpoint describes existing work in this emerging field and discusses ongoing challenges and opportunities with crowd-powered diagnostic systems, a nascent field of study. 
With the correct considerations, the addition of crowdsourcing to human-in-the-loop machine learning workflows for the prediction of complex and nuanced health conditions can accelerate screening, diagnostics, and ultimately access to care. %M 38602750 %R 10.2196/51138 %U https://www.jmir.org/2024/1/e51138 %U https://doi.org/10.2196/51138 %U http://www.ncbi.nlm.nih.gov/pubmed/38602750 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e52483 %T Embracing ChatGPT for Medical Education: Exploring Its Impact on Doctors and Medical Students %A Wu,Yijun %A Zheng,Yue %A Feng,Baijie %A Yang,Yuqi %A Kang,Kai %A Zhao,Ailin %+ Department of Hematology, West China Hospital, Sichuan University, 37 Guoxue Street, Chengdu, China, 86 17888841669, irenez20@outlook.com %K artificial intelligence %K AI %K ChatGPT %K medical education %K doctors %K medical students %D 2024 %7 10.4.2024 %9 Viewpoint %J JMIR Med Educ %G English %X ChatGPT (OpenAI), a cutting-edge natural language processing model, holds immense promise for revolutionizing medical education. With its remarkable performance in language-related tasks, ChatGPT offers personalized and efficient learning experiences for medical students and doctors. Through training, it enhances clinical reasoning and decision-making skills, leading to improved case analysis and diagnosis. The model facilitates simulated dialogues, intelligent tutoring, and automated question-answering, enabling the practical application of medical knowledge. However, integrating ChatGPT into medical education raises ethical and legal concerns. Safeguarding patient data and adhering to data protection regulations are critical. Transparent communication with students, physicians, and patients is essential to ensure their understanding of the technology’s purpose and implications, as well as the potential risks and benefits. 
Maintaining a balance between personalized learning and face-to-face interactions is crucial to avoid hindering critical thinking and communication skills. Despite challenges, ChatGPT offers transformative opportunities. Integrating it with problem-based learning, team-based learning, and case-based learning methodologies can further enhance medical education. With proper regulation and supervision, ChatGPT can contribute to a well-rounded learning environment, nurturing skilled and knowledgeable medical professionals ready to tackle health care challenges. By emphasizing ethical considerations and human-centric approaches, ChatGPT’s potential can be fully harnessed in medical education, benefiting both students and patients alike. %M 38598263 %R 10.2196/52483 %U https://mededu.jmir.org/2024/1/e52483 %U https://doi.org/10.2196/52483 %U http://www.ncbi.nlm.nih.gov/pubmed/38598263 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e55988 %T Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz’s Theory of Basic Values %A Hadar-Shoval,Dorit %A Asraf,Kfir %A Mizrachi,Yonathan %A Haber,Yuval %A Elyoseph,Zohar %+ Department of Brain Sciences, Faculty of Medicine, Imperial College London, Fulham Palace Rd, London, W6 8RF, United Kingdom, 44 547836088, Zohar.j.a@gmail.com %K large language models %K LLMs %K large language model %K LLM %K machine learning %K ML %K natural language processing %K NLP %K deep learning %K ChatGPT %K Chat-GPT %K chatbot %K chatbots %K chat-bot %K chat-bots %K Claude %K values %K Bard %K artificial intelligence %K AI %K algorithm %K algorithms %K predictive model %K predictive models %K predictive analytics %K predictive system %K practical model %K practical models %K mental health %K mental illness %K mental illnesses %K mental disease %K mental diseases %K mental disorder %K mental disorders %K mobile health %K mHealth %K eHealth %K mood disorder %K 
mood disorders %D 2024 %7 9.4.2024 %9 Original Paper %J JMIR Ment Health %G English %X Background: Large language models (LLMs) hold potential for mental health applications. However, their opaque alignment processes may embed biases that shape problematic perspectives. Evaluating the values embedded within LLMs that guide their decision-making has ethical importance. Schwartz’s theory of basic values (STBV) provides a framework for quantifying cultural value orientations and has shown utility for examining values in mental health contexts, including cultural, diagnostic, and therapist-client dynamics. Objective: This study aimed to (1) evaluate whether the STBV can measure value-like constructs within leading LLMs and (2) determine whether LLMs exhibit distinct value-like patterns from humans and each other. Methods: In total, 4 LLMs (Bard, Claude 2, Generative Pretrained Transformer [GPT]-3.5, GPT-4) were anthropomorphized and instructed to complete the Portrait Values Questionnaire—Revised (PVQ-RR) to assess value-like constructs. Their responses over 10 trials were analyzed for reliability and validity. To benchmark the LLMs’ value profiles, their results were compared to published data from a diverse sample of 53,472 individuals across 49 nations who had completed the PVQ-RR. This allowed us to assess whether the LLMs diverged from established human value patterns across cultural groups. Value profiles were also compared between models via statistical tests. Results: The PVQ-RR showed good reliability and validity for quantifying value-like infrastructure within the LLMs. However, substantial divergence emerged between the LLMs’ value profiles and population data. The models lacked consensus and exhibited distinct motivational biases, reflecting opaque alignment processes. For example, all models prioritized universalism and self-direction, while de-emphasizing achievement, power, and security relative to humans.
Successful discriminant analysis differentiated the 4 LLMs’ distinct value profiles. Further examination found that the biased value profiles strongly predicted the LLMs’ responses when presented with mental health dilemmas that required choosing between opposing values. This provided further validation that the models embed distinct motivational value-like constructs that shape their decision-making. Conclusions: This study leveraged the STBV to map the motivational value-like infrastructure underpinning leading LLMs. Although the study demonstrated the STBV can effectively characterize value-like infrastructure within LLMs, substantial divergence from human values raises ethical concerns about aligning these models with mental health applications. The biases toward certain cultural value sets pose risks if integrated without proper safeguards. For example, prioritizing universalism could promote unconditional acceptance even when clinically unwise. Furthermore, the differences between the LLMs underscore the need to standardize alignment processes to capture true cultural diversity. Thus, any responsible integration of LLMs into mental health care must account for their embedded biases and motivation mismatches to ensure equitable delivery across diverse populations. Achieving this will require transparency and refinement of alignment techniques to instill comprehensive human values.
%M 38593424 %R 10.2196/55988 %U https://mental.jmir.org/2024/1/e55988 %U https://doi.org/10.2196/55988 %U http://www.ncbi.nlm.nih.gov/pubmed/38593424 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e53888 %T Exploring the Use and Implications of AI in Sexual and Reproductive Health and Rights: Protocol for a Scoping Review %A Tamrat,Tigest %A Zhao,Yu %A Schalet,Denise %A AlSalamah,Shada %A Pujari,Sameer %A Say,Lale %+ UNDP/UNFPA/UNICEF/WHO/World Bank Special Programme of Research, Development and Research Training in Human Reproduction, Department of Sexual and Reproductive Health and Research, World Health Organization, 20 Avenue Appia, Geneva, 1218, Switzerland, 41 22 791 4417, tamratt@who.int %K artificial intelligence %K AI %K sexual health %K reproductive health %K maternal health %K gender %K machine learning %K natural language processing %K review %K systematic documentation %K protocol %K scoping review %K electronic database %K technical consultation %K intervention %K methodology %K qualitative %K World Health Organization %K WHO %K decision-making %D 2024 %7 9.4.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Artificial intelligence (AI) has emerged as a transformative force across the health sector and has garnered significant attention within sexual and reproductive health and rights (SRHR) due to polarizing views on its opportunities to advance care and the heightened risks and implications it brings to people’s well-being and bodily autonomy. As the fields of AI and SRHR evolve, clarity is needed to bridge our understanding of how AI is being used within this historically politicized health area and raise visibility on the critical issues that can facilitate its responsible and meaningful use. Objective: This paper presents the protocol for a scoping review to synthesize empirical studies that focus on the intersection of AI and SRHR. 
The review aims to identify the characteristics of AI systems and tools applied within SRHR, regarding health domains, intended purpose, target users, AI data life cycle, and evidence on benefits and harms. Methods: The scoping review follows the standard methodology developed by Arksey and O’Malley. We will search the following electronic databases: MEDLINE (PubMed), Scopus, Web of Science, and CINAHL. Inclusion criteria comprise the use of AI systems and tools in sexual and reproductive health and clear methodology describing either quantitative or qualitative approaches, including program descriptions. Studies will be excluded if they focus entirely on digital interventions that do not explicitly use AI systems and tools, are about robotics or nonhuman subjects, or are commentaries. We will not exclude articles based on geographic location, language, or publication date. The study will present the uses of AI across sexual and reproductive health domains, the intended purpose of the AI system and tools, and maturity within the AI life cycle. Outcome measures will be reported on the effect, accuracy, acceptability, resource use, and feasibility of studies that have deployed and evaluated AI systems and tools. Ethical and legal considerations, as well as findings from qualitative studies, will be synthesized through a narrative thematic analysis. We will use the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) format for the publication of the findings. Results: The database searches resulted in 12,793 records when the searches were conducted in October 2023. Screening is underway, and the analysis is expected to be completed by July 2024. Conclusions: The findings will provide key insights on usage patterns and evidence on the use of AI in SRHR, as well as convey key ethical, safety, and legal considerations. 
The outcomes of this scoping review are contributing to a technical brief developed by the World Health Organization and will guide future research and practice in this highly charged area of work. Trial Registration: OSF Registries osf.io/ma4d9; https://osf.io/ma4d9 International Registered Report Identifier (IRRID): PRR1-10.2196/53888 %M 38593433 %R 10.2196/53888 %U https://www.researchprotocols.org/2024/1/e53888 %U https://doi.org/10.2196/53888 %U http://www.ncbi.nlm.nih.gov/pubmed/38593433 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e55627 %T Evaluating ChatGPT-4’s Diagnostic Accuracy: Impact of Visual Data Integration %A Hirosawa,Takanobu %A Harada,Yukinori %A Tokumasu,Kazuki %A Ito,Takahiro %A Suzuki,Tomoharu %A Shimizu,Taro %+ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Mibu-cho, Shimotsuga, 321-0293, Japan, 81 282 87 2498, hirosawa@dokkyomed.ac.jp %K artificial intelligence %K large language model %K LLM %K LLMs %K language model %K language models %K ChatGPT %K GPT %K ChatGPT-4V %K ChatGPT-4 Vision %K clinical decision support %K natural language processing %K decision support %K NLP %K diagnostic excellence %K diagnosis %K diagnoses %K diagnose %K diagnostic %K diagnostics %K image %K images %K imaging %D 2024 %7 9.4.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: In the evolving field of health care, multimodal generative artificial intelligence (AI) systems, such as ChatGPT-4 with vision (ChatGPT-4V), represent a significant advancement, as they integrate visual data with text data. This integration has the potential to revolutionize clinical diagnostics by offering more comprehensive analysis capabilities. However, the impact on diagnostic accuracy of using image data to augment ChatGPT-4 remains unclear. 
Objective: This study aims to assess the impact of adding image data on ChatGPT-4’s diagnostic accuracy and provide insights into how image data integration can enhance the accuracy of multimodal AI in medical diagnostics. Specifically, this study endeavored to compare the diagnostic accuracy between ChatGPT-4V, which processed both text and image data, and its counterpart, ChatGPT-4, which only uses text data. Methods: We identified a total of 557 case reports published in the American Journal of Case Reports from January 2022 to March 2023. After excluding cases that were nondiagnostic, pediatric, and lacking image data, we included 363 case descriptions with their final diagnoses and associated images. We compared the diagnostic accuracy of ChatGPT-4V and ChatGPT-4 without vision based on their ability to include the final diagnoses within differential diagnosis lists. Two independent physicians evaluated their accuracy, with a third resolving any discrepancies, ensuring a rigorous and objective analysis. Results: The integration of image data into ChatGPT-4V did not significantly enhance diagnostic accuracy, showing that final diagnoses were included in the top 10 differential diagnosis lists at a rate of 85.1% (n=309), comparable to the rate of 87.9% (n=319) for the text-only version (P=.33). Notably, ChatGPT-4V’s performance in correctly identifying the top diagnosis was inferior, at 44.4% (n=161), compared with 55.9% (n=203) for the text-only version (P=.002, χ2 test). Additionally, ChatGPT-4’s self-reports showed that image data accounted for 30% of the weight in developing the differential diagnosis lists in more than half of cases. Conclusions: Our findings reveal that currently, ChatGPT-4V predominantly relies on textual data, limiting its ability to fully use the diagnostic potential of visual information. This study underscores the need for further development of multimodal generative AI systems to effectively integrate and use clinical image data. 
Enhancing the diagnostic performance of such AI systems through improved multimodal data integration could significantly benefit patient care by providing more accurate and comprehensive diagnostic insights. Future research should focus on overcoming these limitations, paving the way for the practical application of advanced AI in medicine. %M 38592758 %R 10.2196/55627 %U https://medinform.jmir.org/2024/1/e55627 %U https://doi.org/10.2196/55627 %U http://www.ncbi.nlm.nih.gov/pubmed/38592758 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e55318 %T An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study %A Sivarajkumar,Sonish %A Kelley,Mark %A Samolyk-Mazzanti,Alyssa %A Visweswaran,Shyam %A Wang,Yanshan %+ Department of Health Information Management, University of Pittsburgh, 6026 Forbes Tower, Pittsburgh, PA, 15260, United States, 1 4123832712, yanshan.wang@pitt.edu %K large language model %K LLM %K LLMs %K natural language processing %K NLP %K in-context learning %K prompt engineering %K evaluation %K zero-shot %K few shot %K prompting %K GPT %K language model %K language %K models %K machine learning %K clinical data %K clinical information %K extraction %K BARD %K Gemini %K LLaMA-2 %K heuristic %K prompt %K prompts %K ensemble %D 2024 %7 8.4.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Large language models (LLMs) have shown remarkable capabilities in natural language processing (NLP), especially in domains where labeled data are scarce or expensive, such as the clinical domain. However, to unlock the clinical knowledge hidden in these LLMs, we need to design effective prompts that can guide them to perform specific clinical NLP tasks without any task-specific training data. 
This is known as in-context learning, which is an art and science that requires understanding the strengths and weaknesses of different LLMs and prompt engineering approaches. Objective: The objective of this study is to assess the effectiveness of various prompt engineering techniques, including 2 newly introduced types—heuristic and ensemble prompts, for zero-shot and few-shot clinical information extraction using pretrained language models. Methods: This comprehensive experimental study evaluated different prompt types (simple prefix, simple cloze, chain of thought, anticipatory, heuristic, and ensemble) across 5 clinical NLP tasks: clinical sense disambiguation, biomedical evidence extraction, coreference resolution, medication status extraction, and medication attribute extraction. The performance of these prompts was assessed using 3 state-of-the-art language models: GPT-3.5 (OpenAI), Gemini (Google), and LLaMA-2 (Meta). The study contrasted zero-shot with few-shot prompting and explored the effectiveness of ensemble approaches. Results: The study revealed that task-specific prompt tailoring is vital for the high performance of LLMs for zero-shot clinical NLP. In clinical sense disambiguation, GPT-3.5 achieved an accuracy of 0.96 with heuristic prompts and 0.94 in biomedical evidence extraction. Heuristic prompts, alongside chain of thought prompts, were highly effective across tasks. Few-shot prompting improved performance in complex scenarios, and ensemble approaches capitalized on multiple prompt strengths. GPT-3.5 consistently outperformed Gemini and LLaMA-2 across tasks and prompt types. Conclusions: This study provides a rigorous evaluation of prompt engineering methodologies and introduces innovative techniques for clinical information extraction, demonstrating the potential of in-context learning in the clinical domain. 
These findings offer clear guidelines for future prompt-based clinical NLP research, facilitating engagement by non-NLP experts in clinical NLP advancements. To the best of our knowledge, this is one of the first works on the empirical evaluation of different prompt engineering approaches for clinical NLP in this era of generative artificial intelligence, and we hope that it will inspire and inform future research in this area. %M 38587879 %R 10.2196/55318 %U https://medinform.jmir.org/2024/1/e55318 %U https://doi.org/10.2196/55318 %U http://www.ncbi.nlm.nih.gov/pubmed/38587879 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52935 %T Evaluation of Large Language Model Performance and Reliability for Citations and References in Scholarly Writing: Cross-Disciplinary Study %A Mugaanyi,Joseph %A Cai,Liuying %A Cheng,Sumei %A Lu,Caide %A Huang,Jing %+ Department of Hepato-Pancreato-Biliary Surgery, Ningbo Medical Center Lihuili Hospital, Health Science Center, Ningbo University, No 1111 Jiangnan Road, Ningbo, 315000, China, 86 13819803591, huangjingonline@163.com %K large language models %K accuracy %K academic writing %K AI %K cross-disciplinary evaluation %K scholarly writing %K ChatGPT %K GPT-3.5 %K writing tool %K scholarly %K academic discourse %K LLMs %K machine learning algorithms %K NLP %K natural language processing %K citations %K references %K natural science %K humanities %K chatbot %K artificial intelligence %D 2024 %7 5.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Large language models (LLMs) have gained prominence since the release of ChatGPT in late 2022. Objective: The aim of this study was to assess the accuracy of citations and references generated by ChatGPT (GPT-3.5) in two distinct academic domains: the natural sciences and humanities. 
Methods: Two researchers independently prompted ChatGPT to write an introduction section for a manuscript and include citations; they then evaluated the accuracy of the citations and Digital Object Identifiers (DOIs). Results were compared between the two disciplines. Results: Ten topics were included: 5 in the natural sciences and 5 in the humanities. A total of 102 citations were generated, with 55 in the natural sciences and 47 in the humanities. Among these, 40 citations (72.7%) in the natural sciences and 36 citations (76.6%) in the humanities were confirmed to exist (P=.42). There were significant disparities in DOI presence between the natural sciences (39/55, 70.9%) and the humanities (18/47, 38.3%), along with significant differences in accuracy between the two disciplines (18/55, 32.7% vs 4/47, 8.5%). DOI hallucination was more prevalent in the humanities (42/47, 89.4%). The Levenshtein distance was significantly higher in the humanities than in the natural sciences, reflecting the lower DOI accuracy. Conclusions: ChatGPT’s performance in generating citations and references varies across disciplines. Differences in DOI standards and disciplinary nuances contribute to performance variations. Researchers should consider the strengths and limitations of artificial intelligence writing tools with respect to citation accuracy. The use of domain-specific models may enhance accuracy. 
%M 38578685 %R 10.2196/52935 %U https://www.jmir.org/2024/1/e52935 %U https://doi.org/10.2196/52935 %U http://www.ncbi.nlm.nih.gov/pubmed/38578685 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53367 %T Moving Biosurveillance Beyond Coded Data Using AI for Symptom Detection From Physician Notes: Retrospective Cohort Study %A McMurry,Andrew J %A Zipursky,Amy R %A Geva,Alon %A Olson,Karen L %A Jones,James R %A Ignatov,Vladimir %A Miller,Timothy A %A Mandl,Kenneth D %+ Computational Health Informatics Program, Boston Children's Hospital, Landmark 5506 Mail Stop BCH3187, 401 Park Drive, Boston, MA, 02215, United States, 1 6173554145, kenneth_mandl@harvard.edu %K natural language processing %K COVID-19 %K artificial intelligence %K AI %K public health %K biosurveillance %K surveillance %K respiratory %K infectious %K pulmonary %K SARS-CoV-2 %K symptom %K symptoms %K detect %K detection %K pipeline %K pipelines %K clinical note %K clinical notes %K documentation %K emergency %K urgent %K pediatric %K pediatrics %K paediatric %K paediatrics %K child %K children %K youth %K adolescent %K adolescents %K teen %K teens %K teenager %K teenagers %K diagnose %K diagnosis %K diagnostic %K diagnostics %D 2024 %7 4.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Real-time surveillance of emerging infectious diseases necessitates a dynamically evolving, computable case definition, which frequently incorporates symptom-related criteria. For symptom detection, both population health monitoring platforms and research initiatives primarily depend on structured data extracted from electronic health records. Objective: This study sought to validate and test an artificial intelligence (AI)–based natural language processing (NLP) pipeline for detecting COVID-19 symptoms from physician notes in pediatric patients. We specifically study patients presenting to the emergency department (ED) who can be sentinel cases in an outbreak. 
Methods: Subjects in this retrospective cohort study are patients 21 years of age and younger who presented to a pediatric ED at a large academic children’s hospital between March 1, 2020, and May 31, 2022. The ED notes for all patients were processed with an NLP pipeline tuned to detect the mention of 11 COVID-19 symptoms based on Centers for Disease Control and Prevention (CDC) criteria. For a gold standard, 3 subject matter experts labeled 226 ED notes and had strong agreement (F1-score=0.986; positive predictive value [PPV]=0.972; and sensitivity=1.0). F1-score, PPV, and sensitivity were used to compare the performance of both NLP and the International Classification of Diseases, 10th Revision (ICD-10) coding to the gold standard chart review. As a formative use case, variations in symptom patterns were measured across SARS-CoV-2 variant eras. Results: There were 85,678 ED encounters during the study period, including 4% (n=3420) with patients with COVID-19. NLP was more accurate at identifying encounters with patients who had any of the COVID-19 symptoms (F1-score=0.796) than ICD-10 codes (F1-score=0.451). NLP accuracy was higher for positive symptoms (sensitivity=0.930) than ICD-10 (sensitivity=0.300). However, ICD-10 accuracy was higher for negative symptoms (specificity=0.994) than NLP (specificity=0.917). Congestion or runny nose showed the highest accuracy difference (NLP: F1-score=0.828 and ICD-10: F1-score=0.042). For encounters with patients with COVID-19, prevalence estimates of each NLP symptom differed across variant eras. Patients with COVID-19 were more likely to have each NLP symptom detected than patients without this disease. Effect sizes (odds ratios) varied across pandemic eras. Conclusions: This study establishes the value of AI-based NLP as a highly effective tool for real-time COVID-19 symptom detection in pediatric patients, outperforming traditional ICD-10 methods. 
It also reveals the evolving nature of symptom prevalence across different virus variants, underscoring the need for dynamic, technology-driven approaches in infectious disease surveillance. %M 38573752 %R 10.2196/53367 %U https://www.jmir.org/2024/1/e53367 %U https://doi.org/10.2196/53367 %U http://www.ncbi.nlm.nih.gov/pubmed/38573752 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e54787 %T Understanding Physician’s Perspectives on AI in Health Care: Protocol for a Sequential Multiple Assignment Randomized Vignette Study %A Kim,Jane Paik %A Yang,Hyun-Joon %A Kim,Bohye %A Ryan,Katie %A Roberts,Laura Weiss %+ Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, 1070 Arastradero Rd, Stanford, CA, 94304, United States, 1 650 736 8996, janepkim@stanford.edu %K AI-based clinical decision support %K decision-making %K hypothetical vignettes %K physician perspective %K web-based survey %K hypothesis-driven research %K ethics %K stakeholder attitudes %D 2024 %7 4.4.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: As the availability and performance of artificial intelligence (AI)–based clinical decision support (CDS) systems improve, physicians and other care providers poised to be on the front lines will be increasingly tasked with using these tools in patient care and incorporating their outputs into clinical decision-making processes. Vignette studies provide a means to explore emerging hypotheses regarding how context-specific factors, such as clinical risk, the amount of information provided about the AI, and the AI result, may impact physician acceptance and use of AI-based CDS tools. To best anticipate how such factors influence the decision-making of frontline physicians in clinical scenarios involving AI decision-support tools, hypothesis-driven research is needed that enables scenario testing before the implementation and deployment of these tools. 
Objective: This study’s objectives are to (1) design an original, web-based vignette-based survey that features hypothetical scenarios based on emerging or real-world applications of AI-based CDS systems that will vary systematically by features related to clinical risk, the amount of information provided about the AI, and the AI result; and (2) test and determine causal effects of specific factors on the judgments and perceptions salient to physicians’ clinical decision-making. Methods: US-based physicians with specialties in family or internal medicine will be recruited through email and mail (target n=420). Through a web-based survey, participants will be randomized to a 3-part “sequential multiple assignment randomization trial (SMART) vignette” detailing a hypothetical clinical scenario involving an AI decision support tool. The SMART vignette design is similar to the SMART design but adapted to a survey design. Each respondent will be randomly assigned to 1 of the possible vignette variations of the factors we are testing at each stage, which include the level of clinical risk, the amount of information provided about the AI, and the certainty of the AI output. Respondents will be given questions regarding their hypothetical decision-making in response to the hypothetical scenarios. Results: The study is currently in progress and data collection is anticipated to be completed in 2024. Conclusions: The web-based vignette study will provide information on how contextual factors such as clinical risk, the amount of information provided about an AI tool, and the AI result influence physicians’ reactions to hypothetical scenarios that are based on emerging applications of AI in frontline health care settings. Our newly proposed “SMART vignette” design offers several benefits not afforded by the extensively used traditional vignette design, due to the 2 aforementioned features. 
These advantages are (1) increased validity of analyses targeted at understanding the impact of a factor on the decision outcome, given previous outcomes and other contextual factors; and (2) balanced sample sizes across groups. This study will generate a better understanding of physician decision-making within this context. International Registered Report Identifier (IRRID): DERR1-10.2196/54787 %M 38573756 %R 10.2196/54787 %U https://www.researchprotocols.org/2024/1/e54787 %U https://doi.org/10.2196/54787 %U http://www.ncbi.nlm.nih.gov/pubmed/38573756 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e49643 %T Data-Driven Identification of Factors That Influence the Quality of Adverse Event Reports: 15-Year Interpretable Machine Learning and Time-Series Analyses of VigiBase and QUEST %A Choo,Sim Mei %A Sartori,Daniele %A Lee,Sing Chet %A Yang,Hsuan-Chia %A Syed-Abdul,Shabbir %+ Graduate Institute of Biomedical Informatics, Taipei Medical University, 301 Yuantong Rd, , Taipei, 235, Taiwan, 886 66202589 ext 10930, drshabbir@tmu.edu.tw %K pharmacovigilance %K medication safety %K big data analysis %K feature selection %K interpretable machine learning %D 2024 %7 3.4.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: The completeness of adverse event (AE) reports, crucial for assessing putative causal relationships, is measured using the vigiGrade completeness score in VigiBase, the World Health Organization global database of reported potential AEs. Malaysian reports have surpassed the global average score (approximately 0.44), achieving a 5-year average of 0.79 (SD 0.23) as of 2019 and approaching the benchmark for well-documented reports (0.80). However, the contributing factors to this relatively high report completeness score remain unexplored. Objective: This study aims to explore the main drivers influencing the completeness of Malaysian AE reports in VigiBase over a 15-year period using vigiGrade. 
A secondary objective was to understand the strategic measures taken by the Malaysian authorities leading to enhanced report completeness across different time frames. Methods: We analyzed 132,738 Malaysian reports (2005-2019) recorded in VigiBase up to February 2021, split into historical International Drug Information System (INTDIS; n=63,943, 48.17% in 2005-2016) and newer E2B (n=68,795, 51.83% in 2015-2019) format subsets. For machine learning analyses, we performed a 2-stage feature selection followed by a random forest classifier to identify the top features predicting well-documented reports. We subsequently applied tree Shapley additive explanations to examine the magnitude, prevalence, and direction of feature effects. In addition, we conducted time-series analyses to evaluate chronological trends and potential influences of key interventions on reporting quality. Results: Among the analyzed reports, 42.84% (56,877/132,738) were well documented, with an increase of 65.37% (53,929/82,497) since 2015. Over two-thirds (46,186/68,795, 67.14%) of the Malaysian E2B reports were well documented compared to INTDIS reports at 16.72% (10,691/63,943). For INTDIS reports, higher pharmacovigilance center staffing was the primary feature positively associated with being well documented. In recent E2B reports, the top positive features included reaction abated upon drug dechallenge, reaction onset or drug use duration of <1 week, dosing interval of <1 day, reports from public specialist hospitals, reports by pharmacists, and reaction duration between 1 and 6 days. In contrast, reports from product registration holders and other health care professionals and reactions involving product substitution issues negatively affected the quality of E2B reports. 
Multifaceted strategies and interventions comprising policy changes, continuity of education, and human resource development laid the groundwork for AE reporting in Malaysia, whereas advancements in technological infrastructure, pharmacovigilance databases, and reporting tools concurred with increases in both the quantity and quality of AE reports. Conclusions: Through interpretable machine learning and time-series analyses, this study identified key features that positively or negatively influence the completeness of Malaysian AE reports and unveiled how Malaysia has developed its pharmacovigilance capacity via multifaceted strategies and interventions. These findings will guide future work in enhancing pharmacovigilance and public health. %M 38568722 %R 10.2196/49643 %U https://medinform.jmir.org/2024/1/e49643 %U https://doi.org/10.2196/49643 %U http://www.ncbi.nlm.nih.gov/pubmed/38568722 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e52289 %T Mining Clinical Notes for Physical Rehabilitation Exercise Information: Natural Language Processing Algorithm Development and Validation Study %A Sivarajkumar,Sonish %A Gao,Fengyi %A Denny,Parker %A Aldhahwani,Bayan %A Visweswaran,Shyam %A Bove,Allyn %A Wang,Yanshan %+ Department of Health Information Management, University of Pittsburgh, 6026 Forbes Tower, Pittsburgh, PA, 15260, United States, 1 4123832712, yanshan.wang@pitt.edu %K natural language processing %K electronic health records %K rehabilitation %K physical exercise %K ChatGPT %K artificial intelligence %K stroke %K physical rehabilitation %K rehabilitation therapy %K exercise %K machine learning %D 2024 %7 3.4.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: The rehabilitation of a patient who had a stroke requires precise, personalized treatment plans. 
Natural language processing (NLP) offers the potential to extract valuable exercise information from clinical notes, aiding in the development of more effective rehabilitation strategies. Objective: This study aims to develop and evaluate a variety of NLP algorithms to extract and categorize physical rehabilitation exercise information from the clinical notes of patients who had a stroke treated at the University of Pittsburgh Medical Center. Methods: A cohort of 13,605 patients diagnosed with stroke was identified, and their clinical notes containing rehabilitation therapy notes were retrieved. A comprehensive clinical ontology was created to represent various aspects of physical rehabilitation exercises. State-of-the-art NLP algorithms were then developed and compared, including rule-based, machine learning–based algorithms (support vector machine, logistic regression, gradient boosting, and AdaBoost) and large language model (LLM)–based algorithms (ChatGPT [OpenAI]). The study focused on key performance metrics, particularly F1-scores, to evaluate algorithm effectiveness. Results: The analysis was conducted on a data set comprising 23,724 notes with detailed demographic and clinical characteristics. The rule-based NLP algorithm demonstrated superior performance in most areas, particularly in detecting the “Right Side” location with an F1-score of 0.975, outperforming gradient boosting by 0.063. Gradient boosting excelled in “Lower Extremity” location detection (F1-score: 0.978), surpassing rule-based NLP by 0.023. It also showed notable performance in the “Passive Range of Motion” detection with an F1-score of 0.970, a 0.032 improvement over rule-based NLP. The rule-based algorithm efficiently handled “Duration,” “Sets,” and “Reps” with F1-scores up to 0.65. LLM-based NLP, particularly ChatGPT with few-shot prompts, achieved high recall but generally lower precision and F1-scores. 
However, it notably excelled in “Backward Plane” motion detection, achieving an F1-score of 0.846, surpassing the rule-based algorithm’s 0.720. Conclusions: The study successfully developed and evaluated multiple NLP algorithms, revealing the strengths and weaknesses of each in extracting physical rehabilitation exercise information from clinical notes. The detailed ontology and the robust performance of the rule-based and gradient boosting algorithms demonstrate significant potential for enhancing precision rehabilitation. These findings contribute to the ongoing efforts to integrate advanced NLP techniques into health care, moving toward predictive models that can recommend personalized rehabilitation treatments for optimal patient outcomes. %M 38568736 %R 10.2196/52289 %U https://medinform.jmir.org/2024/1/e52289 %U https://doi.org/10.2196/52289 %U http://www.ncbi.nlm.nih.gov/pubmed/38568736 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e48862 %T Interpretable Deep Learning System for Identifying Critical Patients Through the Prediction of Triage Level, Hospitalization, and Length of Stay: Prospective Study %A Lin,Yu-Ting %A Deng,Yuan-Xiang %A Tsai,Chu-Lin %A Huang,Chien-Hua %A Fu,Li-Chen %+ Department of Computer Science and Information Engineering, National Taiwan University, CSIE Der Tian Hall No. 1, Sec. 4, Roosevelt Road, Taipei, 106319, Taiwan, 886 935545846, lichen@ntu.edu.tw %K emergency department %K triage system %K hospital admission %K length of stay %K multimodal integration %D 2024 %7 1.4.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Triage is the process of accurately assessing patients’ symptoms and providing them with proper clinical treatment in the emergency department (ED). While many countries have developed their triage process to stratify patients’ clinical severity and thus distribute medical resources, there are still some limitations of the current triage process. 
Since the triage level is mainly identified by experienced nurses based on a mix of subjective and objective criteria, mis-triage often occurs in the ED. It can not only cause adverse effects on patients, but also impose an undue burden on the health care delivery system. Objective: Our study aimed to design a prediction system based on triage information, including demographics, vital signs, and chief complaints. The proposed system can not only handle heterogeneous data, including tabular data and free-text data, but also provide interpretability for better acceptance by the ED staff in the hospital. Methods: In this study, we proposed a system comprising 3 subsystems, with each of them handling a single task, including triage level prediction, hospitalization prediction, and length of stay prediction. We used a large amount of retrospective data to pretrain the model, and then, we fine-tuned the model on a prospective data set with gold-standard labels. The proposed deep learning framework was built with TabNet and MacBERT (Chinese version of bidirectional encoder representations from transformers [BERT]). Results: The performance of our proposed model was evaluated on data collected from the National Taiwan University Hospital (901 patients were included). The model achieved promising results on the collected data set, with accuracy values of 63%, 82%, and 71% for triage level prediction, hospitalization prediction, and length of stay prediction, respectively. Conclusions: Our system improved the prediction of 3 different medical outcomes when compared with other machine learning methods. With the pretrained vital sign encoder and the MacBERT encoder re-pretrained with masked language modeling, our multimodality model can provide deeper insight into the characteristics of electronic health records. Additionally, by providing interpretability, we believe that the proposed system can assist nursing staff and physicians in making appropriate medical decisions. 
%M 38557661 %R 10.2196/48862 %U https://medinform.jmir.org/2024/1/e48862 %U https://doi.org/10.2196/48862 %U http://www.ncbi.nlm.nih.gov/pubmed/38557661 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e54857 %T Investigating the Impact of AI on Shared Decision-Making in Post-Kidney Transplant Care (PRIMA-AI): Protocol for a Randomized Controlled Trial %A Osmanodja,Bilgin %A Sassi,Zeineb %A Eickmann,Sascha %A Hansen,Carla Maria %A Roller,Roland %A Burchardt,Aljoscha %A Samhammer,David %A Dabrock,Peter %A Möller,Sebastian %A Budde,Klemens %A Herrmann,Anne %+ Department of Nephrology and Medical Intensive Care, Charité – Universitätsmedizin Berlin, Charitéplatz 1, Berlin, 10117, Germany, 49 30450614368, bilgin.osmanodja@charite.de %K shared decision-making %K SDM %K kidney transplantation %K artificial intelligence %K AI %K decision-support system %K DSS %K qualitative research %D 2024 %7 1.4.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Patients after kidney transplantation eventually face the risk of graft loss with the concomitant need for dialysis or retransplantation. Choosing the right kidney replacement therapy after graft loss is an important preference-sensitive decision for kidney transplant recipients. However, the rate of conversations about treatment options after kidney graft loss has been shown to be as low as 13% in previous studies. It is unknown whether the implementation of artificial intelligence (AI)–based risk prediction models can increase the number of conversations about treatment options after graft loss and how this might influence the associated shared decision-making (SDM). Objective: This study aims to explore the impact of AI-based risk prediction for the risk of graft loss on the frequency of conversations about the treatment options after graft loss, as well as the associated SDM process. 
Methods: This is a 2-year, prospective, randomized, 2-armed, parallel-group, single-center trial in a German kidney transplant center. All patients will receive the same routine post–kidney transplant care that usually includes follow-up visits every 3 months at the kidney transplant center. For patients in the intervention arm, physicians will be assisted by a validated and previously published AI-based risk prediction system that estimates the risk for graft loss in the next year, starting from 3 months after randomization until 24 months after randomization. The study population will consist of 122 kidney transplant recipients >12 months after transplantation, who are at least 18 years of age, are able to communicate in German, and have an estimated glomerular filtration rate <30 mL/min/1.73 m2. Patients with multi-organ transplantation, or who are not able to communicate in German, as well as underage patients, cannot participate. For the primary end point, the proportion of patients who have had a conversation about their treatment options after graft loss is compared at 12 months after randomization. Additionally, 2 different assessment tools for SDM, the CollaboRATE mean score and the Control Preference Scale, are compared between the 2 groups at 12 months and 24 months after randomization. Furthermore, recordings of patient-physician conversations, as well as semistructured interviews with patients, support persons, and physicians, are performed to support the quantitative results. Results: The enrollment for the study is ongoing. The first results are expected to be submitted for publication in 2025. Conclusions: This is the first study to examine the influence of AI-based risk prediction on physician-patient interaction in the context of kidney transplantation. 
We use a mixed methods approach by combining a randomized design with a simple quantitative end point (frequency of conversations), different quantitative measurements for SDM, and several qualitative research methods (eg, records of physician-patient conversations and semistructured interviews) to examine the implementation of AI-based risk prediction in the clinic. Trial Registration: ClinicalTrials.gov NCT06056518; https://clinicaltrials.gov/study/NCT06056518 International Registered Report Identifier (IRRID): PRR1-10.2196/54857 %M 38557315 %R 10.2196/54857 %U https://www.researchprotocols.org/2024/1/e54857 %U https://doi.org/10.2196/54857 %U http://www.ncbi.nlm.nih.gov/pubmed/38557315 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53343 %T Identification of Predictors for Clinical Deterioration in Patients With COVID-19 via Electronic Nursing Records: Retrospective Observational Study %A Sung,Sumi %A Kim,Youlim %A Kim,Su Hwan %A Jung,Hyesil %+ Department of Nursing, College of Medicine, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea, 82 32 860 8206, hsjung@inha.ac.kr %K COVID-19 %K infectious %K respiratory %K SARS-CoV-2 %K nursing records %K SNOMED CT %K random forest %K logistic regression %K EHR %K EHRs %K machine learning %K documentation %K deterioration %K health records %K health record %K patient record %K patient records %K nursing %K standardization %K standard %K standards %K standardized %K standardize %K nomenclature %K term %K terms %K terminologies %K terminology %D 2024 %7 29.3.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Few studies have used standardized nursing records with Systematized Nomenclature of Medicine–Clinical Terms (SNOMED CT) to identify predictors of clinical deterioration. 
Objective: This study aims to standardize the nursing documentation records of patients with COVID-19 using SNOMED CT and identify predictive factors of clinical deterioration in patients with COVID-19 via standardized nursing records. Methods: In this study, 57,558 nursing statements from 226 patients with COVID-19 were analyzed. Among these, 45,852 statements were from 207 patients in the stable (control) group and 11,706 from 19 patients in the exacerbated (case) group who were transferred to the intensive care unit within 7 days. The data were collected between December 2019 and June 2022. These nursing statements were standardized using the SNOMED CT International Edition released on November 30, 2022. The 260 unique nursing statements that accounted for the top 90% of 57,558 statements were selected as the mapping source and mapped into SNOMED CT concepts based on their meaning by 2 experts with more than 5 years of SNOMED CT mapping experience. To identify the main features of nursing statements associated with the exacerbation of patient condition, random forest algorithms were used, and optimal hyperparameters were selected for nursing problems or outcomes and nursing procedure–related statements. Additionally, logistic regression analysis was conducted to identify features that determine clinical deterioration in patients with COVID-19. Results: All nursing statements were semantically mapped to SNOMED CT concepts for “clinical finding,” “situation with explicit context,” and “procedure” hierarchies. The interrater reliability of the mapping results was 87.7%. 
The most important features calculated by random forest were “oxygen saturation below reference range,” “dyspnea,” “tachypnea,” and “cough” in “clinical finding,” and “oxygen therapy,” “pulse oximetry monitoring,” “temperature taking,” “notification of physician,” and “education about isolation for infection control” in “procedure.” Among these, “dyspnea” and “inadequate food diet” in “clinical finding” increased clinical deterioration risk (dyspnea: odds ratio [OR] 5.99, 95% CI 2.25-20.29; inadequate food diet: OR 10.0, 95% CI 2.71-40.84), and “oxygen therapy” and “notification of physician” in “procedure” also increased the risk of clinical deterioration in patients with COVID-19 (oxygen therapy: OR 1.89, 95% CI 1.25-3.05; notification of physician: OR 1.72, 95% CI 1.02-2.97). Conclusions: The study used SNOMED CT to express and standardize nursing statements. Further, it revealed the importance of standardized nursing records as predictive variables for clinical deterioration in patients. 
%M 38414056 %R 10.2196/53343 %U https://www.jmir.org/2024/1/e53343 %U https://doi.org/10.2196/53343 %U http://www.ncbi.nlm.nih.gov/pubmed/38414056 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54580 %T An Entity Extraction Pipeline for Medical Text Records Using Large Language Models: Analytical Study %A Wang,Lei %A Ma,Yinyao %A Bi,Wenshuai %A Lv,Hanlin %A Li,Yuxiang %+ BGI Research, 1-2F, Building 2, Wuhan Optics Valley International Biomedical Enterprise Accelerator Phase 3.1, No 388 Gaoxin Road 2, Donghu New Technology Development Zone, Wuhan, 430074, China, 86 18707190886, lvhanlin@genomics.cn %K clinical data extraction %K large language models %K feature hallucination %K modular approach %K unstructured data processing %D 2024 %7 29.3.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The study of disease progression relies on clinical data, including text data, and extracting valuable features from text data has been a research hot spot. With the rise of large language models (LLMs), semantic-based extraction pipelines are gaining acceptance in clinical research. However, the security and feature hallucination issues of LLMs require further attention. Objective: This study aimed to introduce a novel modular LLM pipeline, which could semantically extract features from textual patient admission records. Methods: The pipeline was designed to process a systematic succession of concept extraction, aggregation, question generation, corpus extraction, and question-and-answer scale extraction, which was tested via 2 low-parameter LLMs: Qwen-14B-Chat (QWEN) and Baichuan2-13B-Chat (BAICHUAN). A data set of 25,709 pregnancy cases from the People’s Hospital of Guangxi Zhuang Autonomous Region, China, was used for evaluation with the help of a local expert’s annotation. The pipeline was evaluated with the metrics of accuracy and precision, null ratio, and time consumption. 
Additionally, we evaluated its performance via a quantized version of Qwen-14B-Chat on a consumer-grade GPU. Results: The pipeline demonstrated a high level of precision in feature extraction, as evidenced by the accuracy and precision results of Qwen-14B-Chat (95.52% and 92.93%, respectively) and Baichuan2-13B-Chat (95.86% and 90.08%, respectively). Furthermore, the pipeline exhibited low null ratios and variable time consumption. The INT4-quantized version of QWEN delivered an enhanced performance with 97.28% accuracy and a 0% null ratio. Conclusions: The pipeline exhibited consistent performance across different LLMs and efficiently extracted clinical features from textual data. It also showed reliable performance on consumer-grade hardware. This approach offers a viable and effective solution for mining clinical research data from textual records. %M 38551633 %R 10.2196/54580 %U https://www.jmir.org/2024/1/e54580 %U https://doi.org/10.2196/54580 %U http://www.ncbi.nlm.nih.gov/pubmed/38551633 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e47652 %T Privacy-Preserving Federated Survival Support Vector Machines for Cross-Institutional Time-To-Event Analysis: Algorithm Development and Validation %A Späth,Julian %A Sewald,Zeno %A Probul,Niklas %A Berland,Magali %A Almeida,Mathieu %A Pons,Nicolas %A Le Chatelier,Emmanuelle %A Ginès,Pere %A Solé,Cristina %A Juanola,Adrià %A Pauling,Josch %A Baumbach,Jan %+ Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, Hamburg, 22607, Germany, 49 15750665331, julian.alexander.spaeth@uni-hamburg.de %K federated learning %K survival analysis %K support vector machine %K machine learning %K federated %K algorithm %K survival %K FeatureCloud %K predict %K predictive %K prediction %K predictions %K Implementation science %K Implementation %K centralized model %K privacy regulation %D 2024 %7 29.3.2024 %9 Original Paper %J JMIR AI %G English %X Background: Central collection of 
distributed medical patient data is problematic due to strict privacy regulations. Especially in clinical environments, such as clinical time-to-event studies, large sample sizes are critical but usually not available at a single institution. It has been shown recently that federated learning, combined with privacy-enhancing technologies, is an excellent and privacy-preserving alternative to data sharing. Objective: This study aims to develop and validate a privacy-preserving, federated survival support vector machine (SVM) and make it accessible for researchers to perform cross-institutional time-to-event analyses. Methods: We extended the survival SVM algorithm to be applicable in federated environments. We further implemented it as a FeatureCloud app, enabling it to run in the federated infrastructure provided by the FeatureCloud platform. Finally, we evaluated our algorithm on 3 benchmark data sets, a large sample size synthetic data set, and a real-world microbiome data set and compared the results to the corresponding central method. Results: Our federated survival SVM produces highly similar results to the centralized model on all data sets. The maximal difference between the model weights of the central model and the federated model was only 0.001, and the mean difference over all data sets was 0.0002. We further show that by including more data in the analysis through federated learning, predictions are more accurate even in the presence of site-dependent batch effects. Conclusions: The federated survival SVM extends the palette of federated time-to-event analysis methods by a robust machine learning approach. To our knowledge, the implemented FeatureCloud app is the first publicly available implementation of a federated survival SVM, is freely accessible for all kinds of researchers, and can be directly used within the FeatureCloud platform. 
%M 38875678 %R 10.2196/47652 %U https://ai.jmir.org/2024/1/e47652 %U https://doi.org/10.2196/47652 %U http://www.ncbi.nlm.nih.gov/pubmed/38875678 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e52566 %T AI Analysis of General Medicine in Japan: Present and Future Considerations %A Aoki,Nozomi %A Miyagami,Taiju %A Saita,Mizue %A Naito,Toshio %+ Department of General Medicine, Juntendo University Faculty of Medicine, 2-1-1Hongo, Tokyo, 113-8421, Japan, 81 3 3813 3111, miya0829gami@gmail.com %K artificial intelligence %K physicians %K hospitalists %K polypharmacy %K sexism %K Japan %K AI %K artificial intelligence %K medicine %K Japan %K gender-biased %K physicians %K physician %K medical care %K gender %K polypharmacy %K women %K Pharmacology %K older adults %K geriatric %K elderly %K Japanese %D 2024 %7 29.3.2024 %9 Viewpoint %J JMIR Form Res %G English %X This paper presents an interpretation of artificial intelligence (AI)–generated depictions of the present and future of general medicine in Japan. Using text inputs, the AI tool generated fictitious images based on neural network analyses. We believe that our study makes a significant contribution to the literature because the direction of general medicine in Japan has long been unclear, despite constant discussion. Our AI analysis shows that Japanese medicine is currently plagued by issues with polypharmacy, likely because of the aging patient population. Additionally, the analysis indicated a distressed female physician and evoked a sense of anxiety about the future of female physicians. It discusses whether the ability to encourage the success of female physicians is a turning point for the future of medicine in Japan. 
%M 38551640 %R 10.2196/52566 %U https://formative.jmir.org/2024/1/e52566 %U https://doi.org/10.2196/52566 %U http://www.ncbi.nlm.nih.gov/pubmed/38551640 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e46287 %T Augmenting K-Means Clustering With Qualitative Data to Discover the Engagement Patterns of Older Adults With Multimorbidity When Using Digital Health Technologies: Proof-of-Concept Trial %A Sheng,Yiyang %A Bond,Raymond %A Jaiswal,Rajesh %A Dinsmore,John %A Doyle,Julie %+ NetwellCASALA, Dundalk Institution of Technology, Dublin Road, PJ Carrolls Building, Dundalk Institute of Technology, Co.Louth, Ireland, Dundalk, A91 K584, Ireland, 353 894308214, shengexz@gmail.com %K aging %K digital health %K multimorbidity %K chronic disease %K engagement %K k-means clustering %D 2024 %7 28.3.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Multiple chronic conditions (multimorbidity) are becoming more prevalent among aging populations. Digital health technologies have the potential to assist in the self-management of multimorbidity, improving the awareness and monitoring of health and well-being, supporting a better understanding of the disease, and encouraging behavior change. Objective: The aim of this study was to analyze how 60 older adults (mean age 74, SD 6.4; range 65-92 years) with multimorbidity engaged with digital symptom and well-being monitoring when using a digital health platform over a period of approximately 12 months. Methods: Principal component analysis and clustering analysis were used to group participants based on their levels of engagement, and the data analysis focused on characteristics (eg, age, sex, and chronic health conditions), engagement outcomes, and symptom outcomes of the different clusters that were discovered. Results: Three clusters were identified: the typical user group, the least engaged user group, and the highly engaged user group. 
Our findings show that age, sex, and the types of chronic health conditions do not influence engagement. The 3 primary factors influencing engagement were whether the same device was used to submit different health and well-being parameters, the number of manual operations required to take a reading, and the daily routine of the participants. The findings also indicate that higher levels of engagement may improve the participants’ outcomes (eg, reduce symptom exacerbation and increase physical activity). Conclusions: The findings indicate potential factors that influence older adult engagement with digital health technologies for home-based multimorbidity self-management. The least engaged user groups showed decreased health and well-being outcomes related to multimorbidity self-management. Addressing the factors highlighted in this study in the design and implementation of home-based digital health technologies may improve symptom management and physical activity outcomes for older adults self-managing multimorbidity. 
%M 38546724 %R 10.2196/46287 %U https://www.jmir.org/2024/1/e46287 %U https://doi.org/10.2196/46287 %U http://www.ncbi.nlm.nih.gov/pubmed/38546724 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e41065 %T Development and External Validation of Machine Learning Models for Diabetic Microvascular Complications: Cross-Sectional Study With Metabolites %A He,Feng %A Ng Yin Ling,Clarissa %A Nusinovici,Simon %A Cheng,Ching-Yu %A Wong,Tien Yin %A Li,Jialiang %A Sabanayagam,Charumathi %+ Singapore Eye Research Institute, Singapore National Eye Centre, The Academia, 20 College Road, Discovery Tower Level 6, Singapore, 169856, Singapore, 65 6576 7286, charumathi.sabanayagam@seri.com.sg %K machine learning %K diabetic microvascular complication %K diabetic kidney disease %K diabetic retinopathy %K biomarkers %K metabolomics %K complication %K adult %K cardiovascular disease %K metabolites %K biomedical big data %K kidney disease %D 2024 %7 28.3.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Diabetic kidney disease (DKD) and diabetic retinopathy (DR) are major diabetic microvascular complications, contributing significantly to morbidity, disability, and mortality worldwide. The kidney and the eye, having similar microvascular structures and physiological and pathogenic features, may experience similar metabolic changes in diabetes. Objective: This study aimed to use machine learning (ML) methods integrated with metabolic data to identify biomarkers associated with DKD and DR in a multiethnic Asian population with diabetes, as well as to improve the performance of DKD and DR detection models beyond traditional risk factors. Methods: We used ML algorithms (logistic regression [LR] with Least Absolute Shrinkage and Selection Operator and gradient-boosting decision tree) to analyze 2772 adults with diabetes from the Singapore Epidemiology of Eye Diseases study, a population-based cross-sectional study conducted in Singapore (2004-2011). 
From 220 circulating metabolites and 19 risk factors, we selected the most important variables associated with DKD (defined as an estimated glomerular filtration rate <60 mL/min/1.73 m2) and DR (defined as an Early Treatment Diabetic Retinopathy Study severity level ≥20). DKD and DR detection models were developed based on the variable selection results and externally validated on a sample of 5843 participants with diabetes from the UK biobank (2007-2010). Machine-learned model performance (area under the receiver operating characteristic curve [AUC] with 95% CI, sensitivity, and specificity) was compared to that of traditional LR adjusted for age, sex, diabetes duration, hemoglobin A1c, systolic blood pressure, and BMI. Results: Singapore Epidemiology of Eye Diseases participants had a median age of 61.7 (IQR 53.5-69.4) years, with 49.1% (1361/2772) being women, 20.2% (555/2753) having DKD, and 25.4% (685/2693) having DR. UK biobank participants had a median age of 61.0 (IQR 55.0-65.0) years, with 35.8% (2090/5843) being women, 6.7% (374/5570) having DKD, and 6.1% (355/5843) having DR. The ML algorithms identified diabetes duration, insulin usage, age, and tyrosine as the most important factors of both DKD and DR. DKD was additionally associated with cardiovascular disease history, antihypertensive medication use, and 3 metabolites (lactate, citrate, and cholesterol esters to total lipids ratio in intermediate-density lipoprotein), while DR was additionally associated with hemoglobin A1c, blood glucose, pulse pressure, and alanine. Machine-learned models for DKD and DR detection outperformed traditional LR models in both internal (AUC 0.838 vs 0.743 for DKD and 0.790 vs 0.764 for DR) and external validation (AUC 0.791 vs 0.691 for DKD and 0.778 vs 0.760 for DR). Conclusions: This study highlighted diabetes duration, insulin usage, age, and circulating tyrosine as important factors in detecting DKD and DR. 
The integration of ML with biomedical big data enables biomarker discovery and improves disease detection beyond traditional risk factors. %M 38546730 %R 10.2196/41065 %U https://www.jmir.org/2024/1/e41065 %U https://doi.org/10.2196/41065 %U http://www.ncbi.nlm.nih.gov/pubmed/38546730 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e50568 %T Target Product Profile for a Machine Learning–Automated Retinal Imaging Analysis Software for Use in English Diabetic Eye Screening: Protocol for a Mixed Methods Study %A Macdonald,Trystan %A Dinnes,Jacqueline %A Maniatopoulos,Gregory %A Taylor-Phillips,Sian %A Shinkins,Bethany %A Hogg,Jeffry %A Dunbar,John Kevin %A Solebo,Ameenat Lola %A Sutton,Hannah %A Attwood,John %A Pogose,Michael %A Given-Wilson,Rosalind %A Greaves,Felix %A Macrae,Carl %A Pearson,Russell %A Bamford,Daniel %A Tufail,Adnan %A Liu,Xiaoxuan %A Denniston,Alastair K %+ Ophthalmology Department, Queen Elizabeth Hospital Birmingham, University Hospitals Birmingham National Health Service Foundation Trust, Mindelsohn Way, Edgbaston, Birmingham, B15 2TH, United Kingdom, 44 1213716905, a.denniston@bham.ac.uk %K artificial intelligence %K design %K developers %K diabetes mellitus %K diabetic eye screening %K diabetic retinopathy %K diabetic %K DM %K England %K eye screening %K imaging analysis software %K implementation %K machine learning %K retinal imaging %K study protocol %K target product profile %D 2024 %7 27.3.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Diabetic eye screening (DES) represents a significant opportunity for the application of machine learning (ML) technologies, which may improve clinical and service outcomes. However, successful integration of ML into DES requires careful product development, evaluation, and implementation. Target product profiles (TPPs) summarize the requirements necessary for successful implementation so these can guide product development and evaluation. 
Objective: This study aims to produce a TPP for an ML-automated retinal imaging analysis software (ML-ARIAS) system for use in DES in England. Methods: This work will consist of 3 phases. Phase 1 will establish the characteristics to be addressed in the TPP. A list of candidate characteristics will be generated from the following sources: an overview of systematic reviews of diagnostic test TPPs; a systematic review of digital health TPPs; and the National Institute for Health and Care Excellence’s Evidence Standards Framework for Digital Health Technologies. The list of characteristics will be refined and validated by a study advisory group (SAG) made up of representatives from key stakeholders in DES. This includes people with diabetes; health care professionals; health care managers and leaders; and regulators and policy makers. In phase 2, specifications for these characteristics will be drafted following a series of semistructured interviews with participants from these stakeholder groups. Data collected from these interviews will be analyzed using the shortlist of characteristics as a framework, after which specifications will be drafted to create a draft TPP. Following approval by the SAG, in phase 3, the draft will enter an internet-based Delphi consensus study with participants sought from the groups previously identified, as well as ML-ARIAS developers, to ensure feasibility. Participants will be invited to score characteristic and specification pairs on a scale from “definitely exclude” to “definitely include,” and suggest edits. The document will be iterated between rounds based on participants’ feedback. Feedback on the draft document will be sought from a group of ML-ARIAS developers before its final contents are agreed upon in an in-person consensus meeting. 
At this meeting, representatives from the stakeholder groups previously identified (minus ML-ARIAS developers, to avoid bias) will be presented with the Delphi results and feedback of the user group and asked to agree on the final contents by vote. Results: Phase 1 was completed in November 2023. Phase 2 is underway and expected to finish in March 2024. Phase 3 is expected to be complete in July 2024. Conclusions: The multistakeholder development of a TPP for an ML-ARIAS for use in DES in England will help developers produce tools that serve the needs of patients, health care providers, and their staff. The TPP development process will also provide methods and a template to produce similar documents in other disease areas. International Registered Report Identifier (IRRID): DERR1-10.2196/50568 %M 38536234 %R 10.2196/50568 %U https://www.researchprotocols.org/2024/1/e50568 %U https://doi.org/10.2196/50568 %U http://www.ncbi.nlm.nih.gov/pubmed/38536234 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e43554 %T The Impact of Wireless Emergency Alerts on a Floating Population in Seoul, South Korea: Panel Data Analysis %A Yoon,Sungwook %A Lim,Hyungsoo %A Park,Sungho %+ Business School, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea, 82 2 880 6949, spark104@snu.ac.kr %K COVID-19 %K empirical identification %K floating population %K social distancing %K wireless emergency alert %D 2024 %7 25.3.2024 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Wireless emergency alerts (WEAs), which deliver disaster information directly to individuals’ mobile phones, have been widely used to provide information related to COVID-19 and to encourage compliance with social distancing guidelines during the COVID-19 pandemic. 
The floating population refers to the number of people temporarily staying in a specific area, and this demographic data can be a useful indicator to understand the level of social distancing people are complying with during the COVID-19 pandemic. Objective: This study aimed to empirically analyze the impact of WEAs on the floating population where WEAs were transmitted in the early stages of the COVID-19 pandemic. As most WEA messages focus on compliance with the government’s social distancing guidelines, one of the goals of transmitting WEAs during the COVID-19 pandemic is to control the floating population at an appropriate level. Methods: We investigated the empirical impact of WEAs on the floating population across 25 districts in Seoul by estimating a panel regression model at the district-hour level with a series of fixed effects. The main independent variables were the number of instant WEAs, the daily cumulative number of WEAs, the total cumulative number of WEAs, and information extracted from WEAs by natural language processing at the district-hour level. The data set provided a highly informative empirical setting as WEAs were sent by different local governments with various identifiable district-hour–level data. Results: The estimates of the impact of WEAs on the floating population were significantly negative (–0.013, P=.02 to –0.014, P=.01) across all specifications, implying that an additional WEA issuance reduced the floating population by 1.3% (=100(1–e^–0.013)) to 1.4% (=100(1–e^–0.014)). Although the coefficients of DCN (the daily cumulative number of WEAs) were also negative (–0.0034, P=.34 to –0.0052, P=.15) across all models, they were not significant. The impact of WEAs on the floating population doubled (–0.025, P=.02 to –0.033, P=.005) when the first 82 days of observations were used as subsamples to reduce the possibility of people blocking WEAs. 
Conclusions: Our results suggest that issuing WEAs and distributing information related to COVID-19 to a specific district was associated with a decrease in the floating population of that district. Furthermore, among the various types of information in the WEAs, location information was the only significant type of information that was related to a decrease in the floating population. This study makes important contributions. First, this study measured the impact of WEAs in a highly informative empirical setting. Second, this study adds to the existing literature on the mechanisms by which WEAs can affect public response. Lastly, this study has important implications for making optimal WEAs and suggests that location information should be included. %M 38526536 %R 10.2196/43554 %U https://publichealth.jmir.org/2024/1/e43554 %U https://doi.org/10.2196/43554 %U http://www.ncbi.nlm.nih.gov/pubmed/38526536 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50652 %T Using Longitudinal Twitter Data for Digital Epidemiology of Childhood Health Outcomes: An Annotated Data Set and Deep Neural Network Classifiers %A Klein,Ari Z %A Gutiérrez Gómez,José Agustín %A Levine,Lisa D %A Gonzalez-Hernandez,Graciela %+ Department of Computational Biomedicine, Cedars-Sinai Medical Center, Pacific Design Center, Ste G549F, 700 N San Vicente Blvd, West Hollywood, CA, 90069, United States, 1 310 423 3521, Graciela.GonzalezHernandez@csmc.edu %K natural language processing %K machine learning %K data mining %K social media %K Twitter %K pregnancy %K epidemiology %K developmental disabilities %K asthma %D 2024 %7 25.3.2024 %9 Research Letter %J J Med Internet Res %G English %X We manually annotated 9734 tweets that were posted by users who reported their pregnancy on Twitter, and used them to train, evaluate, and deploy deep neural network classifiers (F1-score=0.93) to detect tweets that report having a child with attention-deficit/hyperactivity disorder (678 users), autism 
spectrum disorders (1744 users), delayed speech (902 users), or asthma (1255 users), demonstrating the potential of Twitter as a complementary resource for assessing associations between pregnancy exposures and childhood health outcomes on a large scale. %M 38526542 %R 10.2196/50652 %U https://www.jmir.org/2024/1/e50652 %U https://doi.org/10.2196/50652 %U http://www.ncbi.nlm.nih.gov/pubmed/38526542 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e52482 %T Efficient Machine Reading Comprehension for Health Care Applications: Algorithm Development and Validation of a Context Extraction Approach %A Nguyen,Duy-Anh %A Li,Minyi %A Lambert,Gavin %A Kowalczyk,Ryszard %A McDonald,Rachael %A Vo,Quoc Bao %+ School of Software and Electrical Engineering, Swinburne University of Technology, John St, Hawthorn, 3122, Australia, 61 392148444, anhngd93@gmail.com %K question answering %K machine reading comprehension %K context extraction %K covid19 %K health care %D 2024 %7 25.3.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Extractive methods for machine reading comprehension (MRC) tasks have achieved comparable or better accuracy than human performance on benchmark data sets. However, such models are not as successful when adapted to complex domains such as health care. One of the main reasons is that the context that the MRC model needs to process when operating in a complex domain can be much larger compared with an average open-domain context. This causes the MRC model to make less accurate and slower predictions. A potential solution to this problem is to reduce the input context of the MRC model by extracting only the necessary parts from the original context. Objective: This study aims to develop a method for extracting useful contexts from long articles as an additional component to the question answering task, enabling the MRC model to work more efficiently and accurately. 
Methods: Existing approaches to context extraction in MRC are based on sentence selection strategies, in which the models are trained to find the sentences containing the answer. We found that using only the sentences containing the answer was insufficient for the MRC model to predict correctly. We conducted a series of empirical studies and observed a strong relationship between the usefulness of the context and the confidence score output of the MRC model. Our investigation showed that a precise input context can boost the prediction correctness of the MRC and greatly reduce inference time. We proposed a method to estimate the utility of each sentence in a context in answering the question and then extract a new, shorter context according to these estimations. We generated a data set to train 2 models for estimating sentence utility, based on which we selected more precise contexts that improved the MRC model’s performance. Results: We demonstrated our approach on the Question Answering Data Set for COVID-19 and Biomedical Semantic Indexing and Question Answering data sets and showed that the approach benefits the downstream MRC model. First, the method substantially reduced the inference time of the entire question answering system by 6 to 7 times. Second, our approach helped the MRC model predict the answer more correctly compared with using the original context (F1-score increased from 0.724 to 0.744 for the Question Answering Data Set for COVID-19 and from 0.651 to 0.704 for the Biomedical Semantic Indexing and Question Answering). We also found a potential problem where extractive transformer MRC models predict poorly despite being given a more precise context in some cases. Conclusions: The proposed context extraction method allows the MRC model to achieve improved prediction correctness and a significantly reduced MRC inference time. This approach works technically with any MRC model and has potential in tasks involving processing long texts. 
%M 38526545 %R 10.2196/52482 %U https://formative.jmir.org/2024/1/e52482 %U https://doi.org/10.2196/52482 %U http://www.ncbi.nlm.nih.gov/pubmed/38526545 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e55615 %T Personalized AI-Driven Real-Time Models to Predict Stress-Induced Blood Pressure Spikes Using Wearable Devices: Proposal for a Prospective Cohort Study %A Kargarandehkordi,Ali %A Slade,Christopher %A Washington,Peter %+ Department of Information and Computer Sciences, University of Hawaii at Manoa, 1680 East-West Road, Honolulu, HI, 96822, United States, 1 5126800926, pyw@hawaii.edu %K stress %K hypertension %K precision health %K personalized artificial intelligence %K wearables %K ecological momentary assessments %K passive sensing %K mobile phone %D 2024 %7 25.3.2024 %9 Proposal %J JMIR Res Protoc %G English %X Background: Referred to as the “silent killer,” elevated blood pressure (BP) often goes unnoticed due to the absence of apparent symptoms, resulting in cumulative harm over time. Chronic stress has been consistently linked to increased BP. Prior studies have found that elevated BP often arises due to a stressful lifestyle, although the effect of exact stressors varies drastically between individuals. The heterogeneous nature of both the stress and BP response to a multitude of lifestyle decisions can make it difficult if not impossible to pinpoint the most deleterious behaviors using the traditional mechanism of clinical interviews. Objective: The aim of this study is to leverage machine learning (ML) algorithms for real-time predictions of stress-induced BP spikes using consumer wearable devices such as Fitbit, providing actionable insights to both patients and clinicians to improve diagnostics and enable proactive health monitoring. 
This study also seeks to address the significant challenges in identifying specific deleterious behaviors associated with stress-induced hypertension through the development of personalized artificial intelligence models for individual patients, departing from the conventional approach of using generalized models. Methods: The study proposes the development of ML algorithms to analyze biosignals obtained from these wearable devices, aiming to make real-time predictions about BP spikes. Given the longitudinal nature of the data set comprising time-series data from wearables (eg, Fitbit) and corresponding time-stamped labels representing stress levels from Ecological Momentary Assessment reports, the adoption of self-supervised learning for pretraining the network and using transformer models for fine-tuning the model on a personalized prediction task is proposed. Transformer models, with their self-attention mechanisms, dynamically weigh the importance of different time steps, enabling the model to focus on relevant temporal features and dependencies, facilitating accurate prediction. Results: Supported as a pilot project from the Robert C Perry Fund of the Hawaii Community Foundation, the study team has developed the core study app, CardioMate. CardioMate not only reminds participants to initiate BP readings using an Omron HeartGuide wearable monitor but also prompts them multiple times a day to report stress levels. Additionally, it collects other useful information including medications, environmental conditions, and daily interactions. Through the app’s messaging system, efficient contact and interaction between users and study admins ensure smooth progress. Conclusions: Personalized ML when applied to biosignals offers the potential for real-time digital health interventions for chronic stress and its symptoms. 
The project’s clinical use for Hawaiians with stress-induced high BP combined with its methodological innovation of personalized artificial intelligence models highlights its significance in advancing health care interventions. Through iterative refinement and optimization, the aim is to develop a personalized deep-learning framework capable of accurately predicting stress-induced BP spikes, thereby promoting individual well-being and health outcomes. International Registered Report Identifier (IRRID): DERR1-10.2196/55615 %M 38526539 %R 10.2196/55615 %U https://www.researchprotocols.org/2024/1/e55615 %U https://doi.org/10.2196/55615 %U http://www.ncbi.nlm.nih.gov/pubmed/38526539 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 7 %N %P e53564 %T Strategies to Mitigate Age-Related Bias in Machine Learning: Scoping Review %A Chu,Charlene %A Donato-Woodger,Simon %A Khan,Shehroz S %A Shi,Tianyu %A Leslie,Kathleen %A Abbasgholizadeh-Rahimi,Samira %A Nyrup,Rune %A Grenier,Amanda %+ Lawrence Bloomberg Faculty of Nursing, University of Toronto, 155 College Street, Toronto, ON, M5T 1P8, Canada, 1 416 946 0217, charlene.chu@utoronto.ca %K age %K ageing %K ageism %K aging %K algorithm %K algorithmic bias %K artificial intelligence %K bias %K digital ageism %K elder %K elderly %K geriatric %K gerontology %K machine learning %K older adult %K older people %K older person %K review methodology %K review methods %K scoping %K search %K searching %K synthesis %D 2024 %7 22.3.2024 %9 Review %J JMIR Aging %G English %X Background: Research suggests that digital ageism, that is, age-related bias, is present in the development and deployment of machine learning (ML) models. Despite the recognition of the importance of this problem, there is a lack of research that specifically examines the strategies used to mitigate age-related bias in ML models and the effectiveness of these strategies. 
Objective: To address this gap, we conducted a scoping review of mitigation strategies to reduce age-related bias in ML. Methods: We followed a scoping review methodology framework developed by Arksey and O’Malley. The search was developed in conjunction with an information specialist and conducted in 6 electronic databases (IEEE Xplore, Scopus, Web of Science, CINAHL, EMBASE, and the ACM digital library), as well as 2 additional gray literature databases (OpenGrey and Grey Literature Report). Results: We identified 8 publications that attempted to mitigate age-related bias in ML approaches. Age-related bias was introduced primarily due to a lack of representation of older adults in the data. Efforts to mitigate bias were categorized into one of three approaches: (1) creating a more balanced data set, (2) augmenting and supplementing their data, and (3) modifying the algorithm directly to achieve a more balanced result. Conclusions: Identifying and mitigating related biases in ML models is critical to fostering fairness, equity, inclusion, and social benefits. Our analysis underscores the ongoing need for rigorous research and the development of effective mitigation approaches to address digital ageism, ensuring that ML systems are used in a way that upholds the interests of all individuals. 
Trial Registration: Open Science Framework AMG5P; https://osf.io/amg5p %M 38517459 %R 10.2196/53564 %U https://aging.jmir.org/2024/1/e53564 %U https://doi.org/10.2196/53564 %U http://www.ncbi.nlm.nih.gov/pubmed/38517459 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e52462 %T Automated Category and Trend Analysis of Scientific Articles on Ophthalmology Using Large Language Models: Development and Usability Study %A Raja,Hina %A Munawar,Asim %A Mylonas,Nikolaos %A Delsoz,Mohammad %A Madadi,Yeganeh %A Elahi,Muhammad %A Hassan,Amr %A Abu Serhan,Hashem %A Inam,Onur %A Hernandez,Luis %A Chen,Hao %A Tran,Sang %A Munir,Wuqaas %A Abd-Alrazaq,Alaa %A Yousefi,Siamak %+ Department of Ophthalmology, University of Tennessee Health Science Center, 930 Madison Avenue, Ste. 468, Memphis, TN, 38111, United States, 1 9016595035, hinaraja65@gmail.com %K Bidirectional and Auto-Regressive Transformers %K BART %K bidirectional encoder representations from transformers %K BERT %K ophthalmology %K text classification %K large language model %K LLM %K trend analysis %D 2024 %7 22.3.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: In this paper, we present an automated method for article classification, leveraging the power of large language models (LLMs). Objective: The aim of this study is to evaluate the applicability of various LLMs based on textual content of scientific ophthalmology papers. Methods: We developed a model based on natural language processing techniques, including advanced LLMs, to process and analyze the textual content of scientific papers. Specifically, we used zero-shot learning LLMs and compared Bidirectional and Auto-Regressive Transformers (BART) and its variants with Bidirectional Encoder Representations from Transformers (BERT) and its variants, such as distilBERT, SciBERT, PubmedBERT, and BioBERT. 
To evaluate the LLMs, we compiled a data set (retinal diseases [RenD]) of 1000 ocular disease–related articles, which were expertly annotated by a panel of 6 specialists into 19 distinct categories. In addition to the classification of articles, we also performed analysis on different classified groups to find the patterns and trends in the field. Results: The classification results demonstrate the effectiveness of LLMs in categorizing a large number of ophthalmology papers without human intervention. The model achieved a mean accuracy of 0.86 and a mean F1-score of 0.85 based on the RenD data set. Conclusions: The proposed framework achieves notable improvements in both accuracy and efficiency. Its application in the domain of ophthalmology showcases its potential for knowledge organization and retrieval. We performed a trend analysis that enables researchers and clinicians to easily categorize and retrieve relevant papers, saving time and effort in literature review and information gathering as well as identification of emerging scientific trends within different disciplines. Moreover, the extendibility of the model to other scientific fields broadens its impact in facilitating research and trend analysis across diverse disciplines. 
%M 38517457 %R 10.2196/52462 %U https://formative.jmir.org/2024/1/e52462 %U https://doi.org/10.2196/52462 %U http://www.ncbi.nlm.nih.gov/pubmed/38517457 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e52073 %T Preliminary Evidence of the Use of Generative AI in Health Care Clinical Services: Systematic Narrative Review %A Yim,Dobin %A Khuntia,Jiban %A Parameswaran,Vijaya %A Meyers,Arlen %+ University of Colorado Denver, 1475 Lawrence St., Denver, CO, United States, 1 3038548024, jiban.khuntia@ucdenver.edu %K generative artificial intelligence tools and applications %K GenAI %K service %K clinical %K health care %K transformation %K digital %D 2024 %7 20.3.2024 %9 Review %J JMIR Med Inform %G English %X Background: Generative artificial intelligence tools and applications (GenAI) are being increasingly used in health care. Physicians, specialists, and other providers have started primarily using GenAI as an aid or tool to gather knowledge, provide information, train, or generate suggestive dialogue between physicians and patients or between physicians and patients’ families or friends. However, unless the use of GenAI is oriented to be helpful in clinical service encounters that can improve the accuracy of diagnosis, treatment, and patient outcomes, the expected potential will not be achieved. As adoption continues, it is essential to validate the effectiveness of the infusion of GenAI as an intelligent technology in service encounters to understand the gap in actual clinical service use of GenAI. Objective: This study synthesizes preliminary evidence on how GenAI assists, guides, and automates clinical service rendering and encounters in health care. The review scope was limited to articles published in peer-reviewed medical journals. Methods: We screened and selected 0.38% (161/42,459) of articles published between January 1, 2020, and May 31, 2023, identified from PubMed. 
We followed the protocols outlined in the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to select highly relevant studies with at least 1 element on clinical use, evaluation, and validation to provide evidence of GenAI use in clinical services. The articles were classified based on their relevance to clinical service functions or activities using the descriptive and analytical information presented in the articles. Results: Of 161 articles, 141 (87.6%) reported using GenAI to assist services through knowledge access, collation, and filtering. GenAI was used for disease detection (19/161, 11.8%), diagnosis (14/161, 8.7%), and screening processes (12/161, 7.5%) in the areas of radiology (17/161, 10.6%), cardiology (12/161, 7.5%), gastrointestinal medicine (4/161, 2.5%), and diabetes (6/161, 3.7%). The literature synthesis in this study suggests that GenAI is mainly used for diagnostic processes, improvement of diagnosis accuracy, and screening and diagnostic purposes using knowledge access. Although this solves the problem of knowledge access and may improve diagnostic accuracy, it is oriented toward higher value creation in health care. Conclusions: GenAI informs rather than assists or automates clinical service functions in health care. GenAI holds potential for clinical services, but that potential has yet to be actualized. More clinical service–level evidence is needed that GenAI streamlines some functions or provides more automated help than information retrieval alone. To transform health care as purported, more studies are needed of GenAI applications that automate and guide human-performed services, keeping pace with the optimism that forward-thinking health care organizations will take advantage of GenAI. 
%M 38506918 %R 10.2196/52073 %U https://medinform.jmir.org/2024/1/e52073 %U https://doi.org/10.2196/52073 %U http://www.ncbi.nlm.nih.gov/pubmed/38506918 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e51151 %T Incorporating ChatGPT in Medical Informatics Education: Mixed Methods Study on Student Perceptions and Experiential Integration Proposals %A Magalhães Araujo,Sabrina %A Cruz-Correia,Ricardo %+ Center for Health Technology and Services Research, Faculty of Medicine, University of Porto, Rua Dr Plácido da Costa, s/n, Porto, 4200-450, Portugal, 351 220 426 91 ext 26911, saraujo@med.up.pt %K education %K medical informatics %K artificial intelligence %K AI %K generative language model %K ChatGPT %D 2024 %7 20.3.2024 %9 Original Paper %J JMIR Med Educ %G English %X Background: The integration of artificial intelligence (AI) technologies, such as ChatGPT, in the educational landscape has the potential to enhance the learning experience of medical informatics students and prepare them for using AI in professional settings. The incorporation of AI in classes aims to develop critical thinking by encouraging students to interact with ChatGPT and critically analyze the responses generated by the chatbot. This approach also helps students develop important skills in the field of biomedical and health informatics to enhance their interaction with AI tools. Objective: The aim of the study is to explore the perceptions of students regarding the use of ChatGPT as a learning tool in their educational context and provide professors with examples of prompts for incorporating ChatGPT into their teaching and learning activities, thereby enhancing the educational experience for students in medical informatics courses. Methods: This study used a mixed methods approach to gain insights from students regarding the use of ChatGPT in education. 
To accomplish this, a structured questionnaire was administered to evaluate students’ familiarity with ChatGPT, gauge their perceptions of its use, and understand their attitudes toward its use in academic and learning tasks. Learning outcomes of 2 courses were analyzed to propose ChatGPT’s incorporation in master’s programs in medicine and medical informatics. Results: The majority of students expressed satisfaction with the use of ChatGPT in education, finding it beneficial for various purposes, including generating academic content, brainstorming ideas, and rewriting text. While some participants raised concerns about potential biases and the need for informed use, the overall perception was positive. Additionally, the study proposed integrating ChatGPT into 2 specific courses in the master’s programs in medicine and medical informatics. The incorporation of ChatGPT was envisioned to enhance student learning experiences and assist in project planning, programming code generation, examination preparation, workflow exploration, and technical interview preparation, thus advancing medical informatics education. In medical teaching, it will be used as an assistant for simplifying the explanation of concepts and solving complex problems, as well as for generating clinical narratives and patient simulators. Conclusions: The study’s valuable insights into medical faculty students’ perspectives and integration proposals for ChatGPT serve as an informative guide for professors aiming to enhance medical informatics education. The research delves into the potential of ChatGPT, emphasizes the necessity of collaboration in academic environments, identifies subject areas with discernible benefits, and underscores its transformative role in fostering innovative and engaging learning experiences. The envisaged proposals hold promise in empowering future health care professionals to work in the rapidly evolving era of digital health care. 
%M 38506920 %R 10.2196/51151 %U https://mededu.jmir.org/2024/1/e51151 %U https://doi.org/10.2196/51151 %U http://www.ncbi.nlm.nih.gov/pubmed/38506920 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 10 %N %P e52322 %T Machine Learning Approaches to Predict Symptoms in People With Cancer: Systematic Review %A Zeinali,Nahid %A Youn,Nayung %A Albashayreh,Alaa %A Fan,Weiguo %A Gilbertson White,Stéphanie %+ College of Nursing, University of Iowa, 452 CNB, 50 Newton Rd 52246, Iowa City, IA, 52246, United States, 1 319 335 7023, stephanie-gilbertson-white@uiowa.edu %K machine learning %K ML %K deep learning %K DL %K cancer symptoms %K prediction model %D 2024 %7 19.3.2024 %9 Review %J JMIR Cancer %G English %X Background: People with cancer frequently experience severe and distressing symptoms associated with cancer and its treatments. Predicting symptoms in patients with cancer continues to be a significant challenge for both clinicians and researchers. The rapid evolution of machine learning (ML) highlights the need for a current systematic review to improve cancer symptom prediction. Objective: This systematic review aims to synthesize the literature that has used ML algorithms to predict the development of cancer symptoms and to identify the predictors of these symptoms. This is essential for integrating new developments and identifying gaps in existing literature. Methods: We conducted this systematic review in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist. We conducted a systematic search of CINAHL, Embase, and PubMed for English records published from 1984 to August 11, 2023, using the following search terms: cancer, neoplasm, specific symptoms, neural networks, machine learning, specific algorithm names, and deep learning. All records that met the eligibility criteria were individually reviewed by 2 coauthors, and key findings were extracted and synthesized. 
We focused on studies using ML algorithms to predict cancer symptoms, excluding nonhuman research, technical reports, reviews, book chapters, conference proceedings, and inaccessible full texts. Results: A total of 42 studies were included, the majority of which were published after 2017. Most studies were conducted in North America (18/42, 43%) and Asia (16/42, 38%). The sample sizes in most studies (27/42, 64%) typically ranged from 100 to 1000 participants. The most prevalent category of algorithms was supervised ML, accounting for 39 (93%) of the 42 studies. Each of the methods—deep learning, ensemble classifiers, and unsupervised ML—constituted 3 (3%) of the 42 studies. The ML algorithms with the best performance were logistic regression (9/42, 17%), random forest (7/42, 13%), artificial neural networks (5/42, 9%), and decision trees (5/42, 9%). The most commonly included primary cancer sites were the head and neck (9/42, 22%) and breast (8/42, 19%), with 17 (41%) of the 42 studies not specifying the site. The most frequently studied symptoms were xerostomia (9/42, 14%), depression (8/42, 13%), pain (8/42, 13%), and fatigue (6/42, 10%). The significant predictors were age, gender, treatment type, treatment number, cancer site, cancer stage, chemotherapy, radiotherapy, chronic diseases, comorbidities, physical factors, and psychological factors. Conclusions: This review outlines the algorithms used for predicting symptoms in individuals with cancer. Given the diversity of symptoms people with cancer experience, analytic approaches that can handle complex and nonlinear relationships are critical. This knowledge can pave the way for crafting algorithms tailored to a specific symptom. In addition, to improve prediction precision, future research should compare cutting-edge ML strategies such as deep learning and ensemble methods with traditional statistical models. 
%M 38502171 %R 10.2196/52322 %U https://cancer.jmir.org/2024/1/e52322 %U https://doi.org/10.2196/52322 %U http://www.ncbi.nlm.nih.gov/pubmed/38502171 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50369 %T Development and Validation of an Interpretable Conformal Predictor to Predict Sepsis Mortality Risk: Retrospective Cohort Study %A Yang,Meicheng %A Chen,Hui %A Hu,Wenhan %A Mischi,Massimo %A Shan,Caifeng %A Li,Jianqing %A Long,Xi %A Liu,Chengyu %+ State Key Laboratory of Digital Medical Engineering, School of Instrument Science and Engineering, Southeast University, 35 Jinxianghe Road, Nanjing, 210096, China, 86 25 83793993, chengyu@seu.edu.cn %K sepsis %K critical care %K clinical decision-making %K mortality prediction %K conformal prediction %D 2024 %7 18.3.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Early and reliable identification of patients with sepsis who are at high risk of mortality is important to improve clinical outcomes. However, 3 major barriers to artificial intelligence (AI) models, including the lack of interpretability, the difficulty in generalizability, and the risk of automation bias, hinder the widespread adoption of AI models for use in clinical practice. Objective: This study aimed to develop and validate (internally and externally) a conformal predictor of sepsis mortality risk in patients who are critically ill, leveraging AI-assisted prediction modeling. The proposed approach enables explaining the model output and assessing its confidence level. Methods: We retrospectively extracted data on adult patients with sepsis from a database collected in a teaching hospital at Beth Israel Deaconess Medical Center for model training and internal validation. A large multicenter critical care database from the Philips eICU Research Institute was used for external validation. A total of 103 clinical features were extracted from the first day after admission. 
We developed an AI model using gradient-boosting machines to predict the mortality risk of sepsis and used Mondrian conformal prediction to estimate the prediction uncertainty. The Shapley additive explanation method was used to explain the model. Results: A total of 16,746 (80%) patients from Beth Israel Deaconess Medical Center were used to train the model. When tested on the internal validation population of 4187 (20%) patients, the model achieved an area under the receiver operating characteristic curve of 0.858 (95% CI 0.845-0.871), which was reduced to 0.800 (95% CI 0.789-0.811) when externally validated on 10,362 patients from the Philips eICU database. At a specified confidence level of 90% for the internal validation cohort the percentage of error predictions (n=438) out of all predictions (n=4187) was 10.5%, with 1229 (29.4%) predictions requiring clinician review. In contrast, the AI model without conformal prediction made 1449 (34.6%) errors. When externally validated, more predictions (n=4004, 38.6%) were flagged for clinician review due to interdatabase heterogeneity. Nevertheless, the model still produced significantly lower error rates compared to the point predictions by AI (n=1221, 11.8% vs n=4540, 43.8%). The most important predictors identified in this predictive model were Acute Physiology Score III, age, urine output, vasopressors, and pulmonary infection. Clinically relevant risk factors contributing to a single patient were also examined to show how the risk arose. Conclusions: By combining model explanation and conformal prediction, AI-based systems can be better translated into medical practice for clinical decision-making. 
%M 38498038 %R 10.2196/50369 %U https://www.jmir.org/2024/1/e50369 %U https://doi.org/10.2196/50369 %U http://www.ncbi.nlm.nih.gov/pubmed/38498038 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e47979 %T Predicting COVID-19 Vaccination Uptake Using a Small and Interpretable Set of Judgment and Demographic Variables: Cross-Sectional Cognitive Science Study %A Vike,Nicole L %A Bari,Sumra %A Stefanopoulos,Leandros %A Lalvani,Shamal %A Kim,Byoung Woo %A Maglaveras,Nicos %A Block,Martin %A Breiter,Hans C %A Katsaggelos,Aggelos K %+ Department of Computer Science, University of Cincinnati, 2901 Woodside Drive, Cincinnati, OH, 45219, United States, 1 617 413 0953, breitehs@ucmail.uc.edu %K reward %K aversion %K judgment %K relative preference theory %K cognitive science %K behavioral economics %K machine learning %K balanced random forest %K mediation %K moderation %K mobile phone %K smartphone %D 2024 %7 18.3.2024 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Despite COVID-19 vaccine mandates, many chose to forgo vaccination, raising questions about the psychology underlying how judgment affects these choices. Research shows that reward and aversion judgments are important for vaccination choice; however, no studies have integrated such cognitive science with machine learning to predict COVID-19 vaccine uptake. Objective: This study aims to determine the predictive power of a small but interpretable set of judgment variables using 3 machine learning algorithms to predict COVID-19 vaccine uptake and interpret what profile of judgment variables was important for prediction. Methods: We surveyed 3476 adults across the United States in December 2021. Participants answered demographic, COVID-19 vaccine uptake (ie, whether participants were fully vaccinated), and COVID-19 precaution questions. Participants also completed a picture-rating task using images from the International Affective Picture System. 
Images were rated on a Likert-type scale to calibrate the degree of liking and disliking. Ratings were computationally modeled using relative preference theory to produce a set of graphs for each participant (minimum R2>0.8). In total, 15 judgment features were extracted from these graphs, 2 being analogous to risk and loss aversion from behavioral economics. These judgment variables, along with demographics, were compared between those who were fully vaccinated and those who were not. In total, 3 machine learning approaches (random forest, balanced random forest [BRF], and logistic regression) were used to test how well judgment, demographic, and COVID-19 precaution variables predicted vaccine uptake. Mediation and moderation were implemented to assess statistical mechanisms underlying successful prediction. Results: Age, income, marital status, employment status, ethnicity, educational level, and sex differed by vaccine uptake (Wilcoxon rank sum and chi-square P<.001). Most judgment variables also differed by vaccine uptake (Wilcoxon rank sum P<.05). A similar area under the receiver operating characteristic curve (AUROC) was achieved by the 3 machine learning frameworks, although random forest and logistic regression produced specificities between 30% and 38% (vs 74.2% for BRF), indicating a lower performance in predicting unvaccinated participants. BRF achieved high precision (87.8%) and AUROC (79%) with moderate to high accuracy (70.8%) and balanced recall (69.6%) and specificity (74.2%). It should be noted that, for BRF, the negative predictive value was <50% despite good specificity. For BRF and random forest, 63% to 75% of the feature importance came from the 15 judgment variables. Furthermore, age, income, and educational level mediated relationships between judgment variables and vaccine uptake. 
Conclusions: The findings demonstrate the underlying importance of judgment variables for vaccine choice and uptake, suggesting that vaccine education and messaging might target varying judgment profiles to improve uptake. These methods could also be used to aid vaccine rollouts and health care preparedness by providing location-specific details (eg, identifying areas that may experience low vaccination and high hospitalization). %M 38315620 %R 10.2196/47979 %U https://publichealth.jmir.org/2024/1/e47979 %U https://doi.org/10.2196/47979 %U http://www.ncbi.nlm.nih.gov/pubmed/38315620 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e47923 %T Methods and Annotated Data Sets Used to Predict the Gender and Age of Twitter Users: Scoping Review %A O'Connor,Karen %A Golder,Su %A Weissenbacher,Davy %A Klein,Ari Z %A Magge,Arjun %A Gonzalez-Hernandez,Graciela %+ Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA, 19004, United States, 1 215 573 8089, karoc@pennmedicine.upenn.edu %K social media %K demographics %K Twitter %K age %K gender %K prediction %K real-world data %K neural network %K machine learning %K gender prediction %K age prediction %D 2024 %7 15.3.2024 %9 Review %J J Med Internet Res %G English %X Background: Patient health data collected from a variety of nontraditional resources, commonly referred to as real-world data, can be a key information source for health and social science research. Social media platforms, such as Twitter (Twitter, Inc), offer vast amounts of real-world data. An important aspect of incorporating social media data in scientific research is identifying the demographic characteristics of the users who posted those data. Age and gender are considered key demographics for assessing the representativeness of the sample and enable researchers to study subgroups and disparities effectively. 
However, deciphering the age and gender of social media users poses challenges. Objective: This scoping review aims to summarize the existing literature on the prediction of the age and gender of Twitter users and provide an overview of the methods used. Methods: We searched 15 electronic databases and carried out reference checking to identify relevant studies that met our inclusion criteria: studies that predicted the age or gender of Twitter users using computational methods. The screening process was performed independently by 2 researchers to ensure the accuracy and reliability of the included studies. Results: Of the initial 684 studies retrieved, 74 (10.8%) studies met our inclusion criteria. Among these 74 studies, 42 (57%) focused on predicting gender, 8 (11%) focused on predicting age, and 24 (32%) predicted a combination of both age and gender. Gender prediction was predominantly approached as a binary classification task, with the reported performance of the methods ranging from 0.58 to 0.96 F1-score or 0.51 to 0.97 accuracy. Age prediction approaches varied in terms of classification groups, with a higher range of reported performance, ranging from 0.31 to 0.94 F1-score or 0.43 to 0.86 accuracy. The heterogeneous nature of the studies and the reporting of dissimilar performance metrics made it challenging to quantitatively synthesize results and draw definitive conclusions. Conclusions: Our review found that although automated methods for predicting the age and gender of Twitter users have evolved to incorporate techniques such as deep neural networks, a significant proportion of the attempts rely on traditional machine learning methods, suggesting that there is potential to improve the performance of these tasks by using more advanced methods. Gender prediction has generally achieved a higher reported performance than age prediction. 
However, the lack of standardized reporting of performance metrics or standard annotated corpora to evaluate the methods used hinders any meaningful comparison of the approaches. Potential biases stemming from the collection and labeling of data used in the studies were identified as a problem, emphasizing the need for careful consideration and mitigation of biases in future studies. This scoping review provides valuable insights into the methods used for predicting the age and gender of Twitter users, along with the challenges and considerations associated with these methods. %M 38488839 %R 10.2196/47923 %U https://www.jmir.org/2024/1/e47923 %U https://doi.org/10.2196/47923 %U http://www.ncbi.nlm.nih.gov/pubmed/38488839 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50882 %T Quality and Dependability of ChatGPT and DingXiangYuan Forums for Remote Orthopedic Consultations: Comparative Analysis %A Xue,Zhaowen %A Zhang,Yiming %A Gan,Wenyi %A Wang,Huajun %A She,Guorong %A Zheng,Xiaofei %+ Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital, The First Affiliated Hospital of Jinan University, No. 613, Huangpu Avenue West, Tianhe District, Guangzhou, 510630, China, 86 13076855735, zhengxiaofei12@163.com %K artificial intelligence %K ChatGPT %K consultation %K musculoskeletal %K natural language processing %K remote medical consultation %K orthopaedic %K orthopaedics %D 2024 %7 14.3.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The widespread use of artificial intelligence, such as ChatGPT (OpenAI), is transforming sectors, including health care, while parallel advances in internet technology have enabled platforms such as China’s DingXiangYuan to offer remote medical services. 
Objective: This study evaluates ChatGPT-4’s responses against those of professional health care providers in telemedicine, assessing artificial intelligence’s capability to support the surge in remote medical consultations and its impact on health care delivery. Methods: We sourced remote orthopedic consultations from “Doctor DingXiang,” with responses from its certified physicians as the control and ChatGPT’s responses as the experimental group. In all, 3 blinded, experienced orthopedic surgeons assessed responses against 7 criteria: “logical reasoning,” “internal information,” “external information,” “guiding function,” “therapeutic effect,” “medical knowledge popularization education,” and “overall satisfaction.” We used Fleiss κ to measure agreement among multiple raters. Results: Initially, consultation records covering 8 conditions (800 cases in total) were gathered. We ultimately included 73 consultation records by May 2023, following primary screening and rescreening, retaining only records in which no private information, images, or voice messages had been transmitted. After statistical scoring, we discovered that ChatGPT’s “internal information” score (mean 4.61, SD 0.52 points vs mean 4.66, SD 0.49 points; P=.43) and “therapeutic effect” score (mean 4.43, SD 0.75 points vs mean 4.55, SD 0.62 points; P=.32) were lower than those of the control group, but the differences were not statistically significant. ChatGPT showed better performance with a higher “logical reasoning” score (mean 4.81, SD 0.36 points vs mean 4.75, SD 0.39 points; P=.38), “external information” score (mean 4.06, SD 0.72 points vs mean 3.92, SD 0.77 points; P=.25), and “guiding function” score (mean 4.73, SD 0.51 points vs mean 4.72, SD 0.54 points; P=.96), although the differences were not statistically significant. 
Meanwhile, the “medical knowledge popularization education” score of ChatGPT was better than that of the control group (mean 4.49, SD 0.67 points vs mean 3.87, SD 1.01 points; P<.001), and the difference was statistically significant. In terms of “overall satisfaction,” the difference was not statistically significant between the groups (mean 8.35, SD 1.38 points vs mean 8.37, SD 1.24 points; P=.92). According to how Fleiss κ values were interpreted, 6 of the control group’s score points were classified as displaying “fair agreement” (P<.001), and 1 was classified as showing “substantial agreement” (P<.001). In the experimental group, 3 points were classified as indicating “fair agreement,” while 4 suggested “moderate agreement” (P<.001). Conclusions: ChatGPT-4 matches the expertise found in DingXiangYuan forums’ paid consultations, excelling particularly in scientific education. It presents a promising alternative for remote health advice. For health care professionals, it could act as an aid in patient education, while patients may use it as a convenient tool for health inquiries. 
%M 38483451 %R 10.2196/50882 %U https://www.jmir.org/2024/1/e50882 %U https://doi.org/10.2196/50882 %U http://www.ncbi.nlm.nih.gov/pubmed/38483451 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e42904 %T Validation of 3 Computer-Aided Facial Phenotyping Tools (DeepGestalt, GestaltMatcher, and D-Score): Comparative Diagnostic Accuracy Study %A Reiter,Alisa Maria Vittoria %A Pantel,Jean Tori %A Danyel,Magdalena %A Horn,Denise %A Ott,Claus-Eric %A Mensah,Martin Atta %+ Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, Berlin, 13353, Germany, 49 30450569132, martin-atta.mensah@charite.de %K facial phenotyping %K DeepGestalt %K facial recognition %K Face2Gene %K medical genetics %K diagnostic accuracy %K genetic syndrome %K machine learning %K GestaltMatcher %K D-Score %K genetics %D 2024 %7 13.3.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: While characteristic facial features provide important clues for finding the correct diagnosis in genetic syndromes, valid assessment can be challenging. The next-generation phenotyping algorithm DeepGestalt analyzes patient images and provides syndrome suggestions. GestaltMatcher matches patient images with similar facial features. The new D-Score provides a score for the degree of facial dysmorphism. Objective: We aimed to test state-of-the-art facial phenotyping tools by benchmarking GestaltMatcher and D-Score and comparing them to DeepGestalt. 
Methods: Using a retrospective sample of 4796 images of patients with 486 different genetic syndromes (London Medical Database, GestaltMatcher Database, and literature images) and 323 inconspicuous control images, we determined the clinical use of D-Score, GestaltMatcher, and DeepGestalt, evaluating sensitivity; specificity; accuracy; the number of supported diagnoses; and potential biases such as age, sex, and ethnicity. Results: DeepGestalt suggested 340 distinct syndromes and GestaltMatcher suggested 1128 syndromes. The top-30 sensitivity was higher for DeepGestalt (88%, SD 18%) than for GestaltMatcher (76%, SD 26%). DeepGestalt generally assigned lower scores but provided higher scores for patient images than for inconspicuous control images, thus allowing the 2 cohorts to be separated with an area under the receiver operating characteristic curve (AUROC) of 0.73. GestaltMatcher could not separate the 2 classes (AUROC 0.55). Trained for this purpose, D-Score achieved the highest discriminatory power (AUROC 0.86). D-Score’s levels increased with the age of the depicted individuals. Male individuals yielded higher D-scores than female individuals. Ethnicity did not appear to influence D-scores. Conclusions: If used with caution, algorithms such as D-score could help clinicians with constrained resources or limited experience in syndromology to decide whether a patient needs further genetic evaluation. Algorithms such as DeepGestalt could support diagnosing rather common genetic syndromes with facial abnormalities, whereas algorithms such as GestaltMatcher could suggest rare diagnoses that are unknown to the clinician in patients with a characteristic, dysmorphic face. 
%M 38477981 %R 10.2196/42904 %U https://www.jmir.org/2024/1/e42904 %U https://doi.org/10.2196/42904 %U http://www.ncbi.nlm.nih.gov/pubmed/38477981 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e53656 %T What Is the Performance of ChatGPT in Determining the Gender of Individuals Based on Their First and Last Names? %A Sebo,Paul %+ University Institute for Primary Care, University of Geneva, Rue Michel-Servet 1, Geneva, 1211, Switzerland, 41 223794390, paulsebo@hotmail.com %K accuracy %K artificial intelligence %K AI %K ChatGPT %K gender %K gender detection tool %K misclassification %K name %K performance %K gender detection %K gender detection tools %K inequalities %K language model %K NamSor %K Gender API %K Switzerland %K physicians %K gender bias %K disparities %K gender disparities %K gender gap %D 2024 %7 13.3.2024 %9 Research Letter %J JMIR AI %G English %X %M 38875596 %R 10.2196/53656 %U https://ai.jmir.org/2024/1/e53656 %U https://doi.org/10.2196/53656 %U http://www.ncbi.nlm.nih.gov/pubmed/38875596 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e52211 %T The Impact of Expectation Management and Model Transparency on Radiologists’ Trust and Utilization of AI Recommendations for Lung Nodule Assessment on Computed Tomography: Simulated Use Study %A Ewals,Lotte J S %A Heesterbeek,Lynn J J %A Yu,Bin %A van der Wulp,Kasper %A Mavroeidis,Dimitrios %A Funk,Mathias %A Snijders,Chris C P %A Jacobs,Igor %A Nederend,Joost %A Pluyter,Jon R %A , %+ Catharina Cancer Institute, Catharina Hospital Eindhoven, Michelangelolaan 2, Eindhoven, 5623 EJ, Netherlands, 31 40 239 9111, lotte.ewals@catharinaziekenhuis.nl %K application %K artificial intelligence %K AI %K computer-aided detection or diagnosis %K CAD %K design %K human centered %K human computer interaction %K HCI %K interaction %K mental model %K radiologists %K trust %D 2024 %7 13.3.2024 %9 Original Paper %J JMIR AI %G English %X Background: Many promising artificial intelligence (AI) 
and computer-aided detection and diagnosis systems have been developed, but few have been successfully integrated into clinical practice. This is partially owing to a lack of user-centered design of AI-based computer-aided detection or diagnosis (AI-CAD) systems. Objective: We aimed to assess the impact of different onboarding tutorials and levels of AI model explainability on radiologists’ trust in AI and the use of AI recommendations in lung nodule assessment on computed tomography (CT) scans. Methods: In total, 20 radiologists from 7 Dutch medical centers performed lung nodule assessment on CT scans under different conditions in a simulated use study as part of a 2×2 repeated-measures quasi-experimental design. Two types of AI onboarding tutorials (reflective vs informative) and 2 levels of AI output (black box vs explainable) were designed. The radiologists first received an onboarding tutorial that was either informative or reflective. Subsequently, each radiologist assessed 7 CT scans, first without AI recommendations. AI recommendations were shown to the radiologist, and they could adjust their initial assessment. Half of the participants received the recommendations via black box AI output and half received explainable AI output. Mental model and psychological trust were measured before onboarding, after onboarding, and after assessing the 7 CT scans. We recorded whether radiologists changed their assessment on found nodules, malignancy prediction, and follow-up advice for each CT assessment. In addition, we analyzed whether radiologists’ trust in their assessments had changed based on the AI recommendations. Results: Both variations of onboarding tutorials resulted in a significantly improved mental model of the AI-CAD system (informative P=.01 and reflective P=.01). After using AI-CAD, psychological trust significantly decreased for the group with explainable AI output (P=.02). 
On the basis of the AI recommendations, radiologists changed the number of reported nodules in 27 of 140 assessments, malignancy prediction in 32 of 140 assessments, and follow-up advice in 12 of 140 assessments. The changes were mostly an increased number of reported nodules, a higher estimated probability of malignancy, and earlier follow-up. The radiologists’ confidence in the nodules they found changed in 82 of 140 assessments, in their estimated probability of malignancy in 50 of 140 assessments, and in their follow-up advice in 28 of 140 assessments. These changes were predominantly increases in confidence. The number of changed assessments and radiologists’ confidence did not significantly differ between the groups that received different onboarding tutorials and AI outputs. Conclusions: Onboarding tutorials help radiologists gain a better understanding of AI-CAD and facilitate the formation of a correct mental model. If AI explanations do not consistently substantiate the probability of malignancy across patient cases, radiologists’ trust in the AI-CAD system can be impaired. Radiologists’ confidence in their assessments was improved by using the AI recommendations. 
%M 38875574 %R 10.2196/52211 %U https://ai.jmir.org/2024/1/e52211 %U https://doi.org/10.2196/52211 %U http://www.ncbi.nlm.nih.gov/pubmed/38875574 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 7 %N %P e55508 %T Assessing the Utility of Multimodal Large Language Models (GPT-4 Vision and Large Language and Vision Assistant) in Identifying Melanoma Across Different Skin Tones %A Cirone,Katrina %A Akrout,Mohamed %A Abid,Latif %A Oakley,Amanda %+ Schulich School of Medicine and Dentistry, Western University, 1151 Richmond Street, London, ON, N6A 5C1, Canada, 1 6475324596, kcirone2024@meds.uwo.ca %K melanoma %K nevus %K skin pigmentation %K artificial intelligence %K AI %K multimodal large language models %K large language model %K large language models %K LLM %K LLMs %K machine learning %K expert systems %K natural language processing %K NLP %K GPT %K GPT-4V %K dermatology %K skin %K lesion %K lesions %K cancer %K oncology %K visual %D 2024 %7 13.3.2024 %9 Research Letter %J JMIR Dermatol %G English %X The large language models GPT-4 Vision and Large Language and Vision Assistant are capable of understanding and accurately differentiating between benign lesions and melanoma, indicating potential incorporation into dermatologic care, medical research, and education. 
%M 38477960 %R 10.2196/55508 %U https://derma.jmir.org/2024/1/e55508 %U https://doi.org/10.2196/55508 %U http://www.ncbi.nlm.nih.gov/pubmed/38477960 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e54393 %T Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study %A Nakao,Takahiro %A Miki,Soichiro %A Nakamura,Yuta %A Kikuchi,Tomohiro %A Nomura,Yukihiro %A Hanaoka,Shouhei %A Yoshikawa,Takeharu %A Abe,Osamu %+ Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan, 81 358008666, tanakao-tky@umin.ac.jp %K AI %K artificial intelligence %K LLM %K large language model %K language model %K language models %K ChatGPT %K GPT-4 %K GPT-4V %K generative pretrained transformer %K image %K images %K imaging %K response %K responses %K exam %K examination %K exams %K examinations %K answer %K answers %K NLP %K natural language processing %K chatbot %K chatbots %K conversational agent %K conversational agents %K medical education %D 2024 %7 12.3.2024 %9 Original Paper %J JMIR Med Educ %G English %X Background: Previous research applying large language models (LLMs) to medicine was focused on text-based information. Recently, multimodal variants of LLMs acquired the capability of recognizing images. Objective: We aim to evaluate the image recognition capability of generative pretrained transformer (GPT)-4V, a recent multimodal LLM developed by OpenAI, in the medical field by testing how visual information affects its performance to answer questions in the 117th Japanese National Medical Licensing Examination. Methods: We focused on 108 questions that had 1 or more images as part of a question and presented GPT-4V with the same questions under two conditions: (1) with both the question text and associated images and (2) with the question text only. 
We then compared the difference in accuracy between the 2 conditions using the exact McNemar test. Results: Among the 108 questions with images, GPT-4V’s accuracy was 68% (73/108) when presented with images and 72% (78/108) when presented without images (P=.36). For the 2 question categories, clinical and general, the accuracies with and those without images were 71% (70/98) versus 78% (76/98; P=.21) and 30% (3/10) versus 20% (2/10; P≥.99), respectively. Conclusions: The additional information from the images did not significantly improve the performance of GPT-4V in the Japanese National Medical Licensing Examination. %M 38470459 %R 10.2196/54393 %U https://mededu.jmir.org/2024/1/e54393 %U https://doi.org/10.2196/54393 %U http://www.ncbi.nlm.nih.gov/pubmed/38470459 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e48295 %T Machine Learning Methods Using Artificial Intelligence Deployed on Electronic Health Record Data for Identification and Referral of At-Risk Patients From Primary Care Physicians to Eye Care Specialists: Retrospective, Case-Controlled Study %A Young,Joshua A %A Chang,Chin-Wen %A Scales,Charles W %A Menon,Saurabh V %A Holy,Chantal E %A Blackie,Caroline Adrienne %+ Medical and Scientific Operations, Johnson & Johnson MedTech, Vision, 7500 Centurion Parkway, Jacksonville, FL, 32256, United States, 1 9044331000, cblackie@its.jnj.com %K decision support for health professionals %K tools, programs and algorithms %K electronic health record %K primary care %K artificial intelligence %K AI %K prediction accuracy %K triaging %K AI model %K eye care %K ophthalmic %D 2024 %7 12.3.2024 %9 Original Paper %J JMIR AI %G English %X Background: Identification and referral of at-risk patients from primary care practitioners (PCPs) to eye care professionals remain a challenge. Approximately 1.9 million Americans suffer from vision loss as a result of undiagnosed or untreated ophthalmic conditions. 
In ophthalmology, artificial intelligence (AI) is used to predict glaucoma progression, recognize diabetic retinopathy (DR), and classify ocular tumors; however, AI has not yet been used to triage primary care patients for ophthalmology referral. Objective: This study aimed to build and compare machine learning (ML) methods, applicable to electronic health records (EHRs) of PCPs, capable of triaging patients for referral to eye care specialists. Methods: Accessing the Optum deidentified EHR data set, 743,039 patients with 5 leading vision conditions (age-related macular degeneration [AMD], visually significant cataract, DR, glaucoma, or ocular surface disease [OSD]) were exact-matched on age and gender to 743,039 controls without eye conditions. Between 142 and 182 non-ophthalmic parameters per patient were input into 5 ML methods: generalized linear model, L1-regularized logistic regression, random forest, Extreme Gradient Boosting (XGBoost), and J48 decision tree. Model performance was compared for each pathology to select the most predictive algorithm. The area under the curve (AUC) was assessed for all algorithms for each outcome. Results: XGBoost demonstrated the best performance, showing, respectively, a prediction accuracy and an AUC of 78.6% (95% CI 78.3%-78.9%) and 0.878 for visually significant cataract, 77.4% (95% CI 76.7%-78.1%) and 0.858 for exudative AMD, 79.2% (95% CI 78.8%-79.6%) and 0.879 for nonexudative AMD, 72.2% (95% CI 69.9%-74.5%) and 0.803 for OSD requiring medication, 70.8% (95% CI 70.5%-71.1%) and 0.785 for glaucoma, 85.0% (95% CI 84.2%-85.8%) and 0.924 for type 1 nonproliferative diabetic retinopathy (NPDR), 82.2% (95% CI 80.4%-84.0%) and 0.911 for type 1 proliferative diabetic retinopathy (PDR), 81.3% (95% CI 81.0%-81.6%) and 0.891 for type 2 NPDR, and 82.1% (95% CI 81.3%-82.9%) and 0.900 for type 2 PDR. 
Conclusions: The 5 ML methods deployed were able to successfully identify patients with elevated odds ratios (ORs), thus capable of patient triage, for ocular pathology ranging from 2.4 (95% CI 2.4-2.5) for glaucoma to 5.7 (95% CI 5.0-6.4) for type 1 NPDR, with an average OR of 3.9. The application of these models could enable PCPs to better identify and triage patients at risk for treatable ophthalmic pathology. Early identification of patients with unrecognized sight-threatening conditions may lead to earlier treatment and a reduced economic burden. More importantly, such triage may improve patients’ lives. %M 38875582 %R 10.2196/48295 %U https://ai.jmir.org/2024/1/e48295 %U https://doi.org/10.2196/48295 %U http://www.ncbi.nlm.nih.gov/pubmed/38875582 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e54593 %T Improving Quality of ICD-10 (International Statistical Classification of Diseases, Tenth Revision) Coding Using AI: Protocol for a Crossover Randomized Controlled Trial %A Chomutare,Taridzo %A Lamproudis,Anastasios %A Budrionis,Andrius %A Svenning,Therese Olsen %A Hind,Lill Irene %A Ngo,Phuong Dinh %A Mikalsen,Karl Øyvind %A Dalianis,Hercules %+ Department of Computer Science, UiT The Arctic University of Norway, Realfagbygget, Hansine Hansens vei 54, Tromsø, 9019, Norway, 47 47680032, taridzo.chomutare@uit.no %K International Classification of Diseases, Tenth Revision %K ICD-10 %K International Classification of Diseases, Eleventh Revision %K ICD-11 %K Easy-ICD %K clinical coding %K artificial intelligence %K machine learning %K deep learning %D 2024 %7 12.3.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Computer-assisted clinical coding (CAC) tools are designed to help clinical coders assign standardized codes, such as the ICD-10 (International Statistical Classification of Diseases, Tenth Revision), to clinical texts, such as discharge summaries. 
Maintaining the integrity of these standardized codes is important both for the functioning of health systems and for ensuring data used for secondary purposes are of high quality. Clinical coding is an error-prone, cumbersome task, and the complexity of modern classification systems such as the ICD-11 (International Classification of Diseases, Eleventh Revision) presents significant barriers to implementation. To date, there have only been a few user studies; therefore, our understanding is still limited regarding the role CAC systems can play in reducing the burden of coding and improving the overall quality of coding. Objective: The objective of the user study is to generate both qualitative and quantitative data for measuring the usefulness of a CAC system, Easy-ICD, that was developed for recommending ICD-10 codes. Specifically, our goal is to assess whether our tool can reduce the burden on clinical coders and also improve coding quality. Methods: The user study is based on a crossover randomized controlled trial study design, where we measure the performance of clinical coders when they use our CAC tool versus when they do not. Performance is measured by the time it takes them to assign codes to both simple and complex clinical texts as well as the coding quality, that is, the accuracy of code assignment. Results: We expect the study to provide us with a measurement of the effectiveness of the CAC system compared to manual coding processes, both in terms of time use and coding quality. Positive outcomes from this study will imply that CAC tools hold the potential to reduce the burden on health care staff and will have major implications for the adoption of artificial intelligence–based CAC innovations to improve coding practice. Results are expected to be published in summer 2024. 
Conclusions: The planned user study promises a greater understanding of the impact CAC systems might have on clinical coding in real-life settings, especially with regard to coding time and quality. Further, the study may add new insights on how to meaningfully exploit current clinical text mining capabilities, with a view to reducing the burden on clinical coders, thus lowering the barriers and paving a more sustainable path to the adoption of modern coding systems, such as the new ICD-11. Trial Registration: clinicaltrials.gov NCT06286865; https://clinicaltrials.gov/study/NCT06286865 International Registered Report Identifier (IRRID): DERR1-10.2196/54593 %M 38470476 %R 10.2196/54593 %U https://www.researchprotocols.org/2024/1/e54593 %U https://doi.org/10.2196/54593 %U http://www.ncbi.nlm.nih.gov/pubmed/38470476 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e47803 %T Optimization of Using Multiple Machine Learning Approaches in Atrial Fibrillation Detection Based on a Large-Scale Data Set of 12-Lead Electrocardiograms: Cross-Sectional Study %A Chuang,Beau Bo-Sheng %A Yang,Albert C %+ Digital Medicine and Smart Healthcare Research Center, National Yang Ming Chiao Tung University, No 155, Li-Nong St, Sec.2, Beitou District, Taipei, 112304, Taiwan, 886 228267995, accyang@nycu.edu.tw %K machine learning %K atrial fibrillation %K light gradient boosting machine %K power spectral density %K digital health %K electrocardiogram %K machine learning algorithm %K atrial fibrillation detection %K real-time %K detection %K electrocardiography leads %K clinical outcome %D 2024 %7 11.3.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Atrial fibrillation (AF) represents a hazardous cardiac arrhythmia that significantly elevates the risk of stroke and heart failure. Despite its severity, its diagnosis largely relies on the proficiency of health care professionals. 
At present, the real-time identification of paroxysmal AF is hindered by the lack of automated techniques. Consequently, a highly effective machine learning algorithm specifically designed for AF detection could offer substantial clinical benefits. We hypothesized that machine learning algorithms have the potential to identify and extract features of AF with a high degree of accuracy, given the intricate and distinctive patterns present in electrocardiogram (ECG) recordings of AF. Objective: This study aims to develop a clinically valuable machine learning algorithm that can accurately detect AF and compare different leads’ performances of AF detection. Methods: We used 12-lead ECG recordings sourced from the 2020 PhysioNet Challenge data sets. The Welch method was used to extract power spectral features of the 12-lead ECGs within a frequency range of 0.083 to 24.92 Hz. Subsequently, various machine learning techniques were evaluated and optimized to classify sinus rhythm (SR) and AF based on these power spectral features. Furthermore, we compared the effects of different frequency subbands and different lead selections on machine learning performances. Results: The light gradient boosting machine (LightGBM) was found to be the most effective in classifying AF and SR, achieving an average F1-score of 0.988 across all ECG leads. Among the frequency subbands, the 0.083 to 4.92 Hz range yielded the highest F1-score of 0.985. In interlead comparisons, aVR had the highest performance (F1=0.993), with minimal differences observed between leads. Conclusions: In conclusion, this study successfully used machine learning methodologies, particularly the LightGBM model, to differentiate SR and AF based on power spectral features derived from 12-lead ECGs. The performance marked by an average F1-score of 0.988 and minimal interlead variation underscores the potential of machine learning algorithms to bolster real-time AF detection. 
This advancement could significantly improve patient care in intensive care units as well as facilitate remote monitoring through wearable devices, ultimately enhancing clinical outcomes. %M 38466973 %R 10.2196/47803 %U https://formative.jmir.org/2024/1/e47803 %U https://doi.org/10.2196/47803 %U http://www.ncbi.nlm.nih.gov/pubmed/38466973 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53008 %T Generative AI in Medical Practice: In-Depth Exploration of Privacy and Security Challenges %A Chen,Yan %A Esmaeilzadeh,Pouyan %+ Department of Information Systems and Business Analytics, College of Business, Florida International University, Modesto A Maidique Campus, 11200 SW 8th St, RB 261 B, Miami, FL, 33199, United States, 1 3053483302, pesmaeil@fiu.edu %K artificial intelligence %K AI %K generative artificial intelligence %K generative AI %K medical practices %K potential benefits %K security and privacy threats %D 2024 %7 8.3.2024 %9 Viewpoint %J J Med Internet Res %G English %X As advances in artificial intelligence (AI) continue to transform and revolutionize the field of medicine, understanding the potential uses of generative AI in health care becomes increasingly important. Generative AI, including models such as generative adversarial networks and large language models, shows promise in transforming medical diagnostics, research, treatment planning, and patient care. However, these data-intensive systems pose new threats to protected health information. This Viewpoint paper aims to explore various categories of generative AI in health care, including medical diagnostics, drug discovery, virtual health assistants, medical research, and clinical decision support, while identifying security and privacy threats within each phase of the life cycle of such systems (ie, data collection, model development, and implementation phases). 
The objectives of this study were to analyze the current state of generative AI in health care, identify opportunities and privacy and security challenges posed by integrating these technologies into existing health care infrastructure, and propose strategies for mitigating security and privacy risks. This study highlights the importance of addressing the security and privacy threats associated with generative AI in health care to ensure the safe and effective use of these systems. The findings of this study can inform the development of future generative AI systems in health care and help health care organizations better understand the potential benefits and risks associated with these systems. By examining the use cases and benefits of generative AI across diverse domains within health care, this paper contributes to theoretical discussions surrounding AI ethics, security vulnerabilities, and data privacy regulations. In addition, this study provides practical insights for stakeholders looking to adopt generative AI solutions within their organizations. 
%M 38457208 %R 10.2196/53008 %U https://www.jmir.org/2024/1/e53008 %U https://doi.org/10.2196/53008 %U http://www.ncbi.nlm.nih.gov/pubmed/38457208 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e45202 %T A Deep Learning–Based Approach for Prediction of Vancomycin Treatment Monitoring: Retrospective Study Among Patients With Critical Illness %A Kim,Dohyun %A Choi,Hyun-Soo %A Lee,DongHoon %A Kim,Minkyu %A Kim,Yoon %A Han,Seon-Sook %A Heo,Yeonjeong %A Park,Ju-Hee %A Park,Jinkyeong %+ Department of Pulmonary, Allergy and Critical Care Medicine, School of Medicine, Kyung Hee University Hospital at Gangdong, 892, Dongnam-ro, Gangdong-gu, Seoul, 05278, Republic of Korea, 82 1027747808, pjk3318@gmail.com %K critically ill %K deep learning %K inflammation %K machine learning %K pharmacokinetic %K therapeutic drug monitoring %K vancomycin %D 2024 %7 8.3.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Vancomycin pharmacokinetics are highly variable in patients with critical illnesses, and clinicians commonly use population pharmacokinetic (PPK) models based on a Bayesian approach to dose. However, these models are population-dependent, may only sometimes meet the needs of individual patients, and are only used by experienced clinicians as a reference for making treatment decisions. To assist real-world clinicians, we developed a deep learning–based decision-making system that predicts vancomycin therapeutic drug monitoring (TDM) levels in patients in intensive care unit. Objective: This study aimed to establish joint multilayer perceptron (JointMLP), a new deep-learning model for predicting vancomycin TDM levels, and compare its performance with the PPK models, extreme gradient boosting (XGBoost), and TabNet. Methods: We used a 977-case data set split into training and testing groups in a 9:1 ratio. 
We performed external validation of the model using 1429 cases from Kangwon National University Hospital and 2394 cases from the Medical Information Mart for Intensive Care–IV (MIMIC-IV). In addition, we performed 10-fold cross-validation on the internal training data set and calculated the 95% CIs using the metric. Finally, we evaluated the generalization ability of the JointMLP model using the MIMIC-IV data set. Results: Our JointMLP model outperformed other models in predicting vancomycin TDM levels in internal and external data sets. Compared to PPK, the JointMLP model improved predictive power by up to 31% (mean absolute error [MAE] 6.68 vs 5.11) on the internal data set and 81% (MAE 11.87 vs 6.56) on the external data set. In addition, the JointMLP model significantly outperforms XGBoost and TabNet, with a 13% (MAE 5.75 vs 5.11) and 14% (MAE 5.85 vs 5.11) improvement in predictive accuracy on the inner data set, respectively. On both the internal and external data sets, our JointMLP model performed well compared to XGBoost and TabNet, achieving prediction accuracy improvements of 34% and 14%, respectively. Additionally, our JointMLP model showed higher robustness to outlier data than the other models, as evidenced by its higher root mean squared error performance across all data sets. The mean errors and variances of the JointMLP model were close to zero and smaller than those of the PPK model in internal and external data sets. Conclusions: Our JointMLP approach can help optimize treatment outcomes in patients with critical illnesses in an intensive care unit setting, reducing side effects associated with suboptimal vancomycin administration. These include increased risk of bacterial resistance, extended hospital stays, and increased health care costs. In addition, the superior performance of our model compared to existing models highlights its potential to help real-world clinicians. 
%M 38152042 %R 10.2196/45202 %U https://formative.jmir.org/2024/1/e45202 %U https://doi.org/10.2196/45202 %U http://www.ncbi.nlm.nih.gov/pubmed/38152042 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e46817 %T Comparison of the Discrimination Performance of AI Scoring and the Brixia Score in Predicting COVID-19 Severity on Chest X-Ray Imaging: Diagnostic Accuracy Study %A Tenda,Eric Daniel %A Yunus,Reyhan Eddy %A Zulkarnaen,Benny %A Yugo,Muhammad Reynalzi %A Pitoyo,Ceva Wicaksono %A Asaf,Moses Mazmur %A Islamiyati,Tiara Nur %A Pujitresnani,Arierta %A Setiadharma,Andry %A Henrina,Joshua %A Rumende,Cleopas Martin %A Wulani,Vally %A Harimurti,Kuntjoro %A Lydia,Aida %A Shatri,Hamzah %A Soewondo,Pradana %A Yusuf,Prasandhya Astagiri %+ Department of Medical Physiology and Biophysics/ Medical Technology Cluster IMERI, Faculty of Medicine, Universitas Indonesia, Jalan Salemba Raya No.6, Jakarta, 10430, Indonesia, 62 812 8459 4272, prasandhya.a.yusuf@ui.ac.id %K artificial intelligence %K Brixia %K chest x-ray %K COVID-19 %K CAD4COVID %K pneumonia %K radiograph %K artificial intelligence scoring system %K AI scoring system %K prediction %K disease severity %D 2024 %7 7.3.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The artificial intelligence (AI) analysis of chest x-rays can increase the precision of binary COVID-19 diagnosis. However, it is unknown if AI-based chest x-rays can predict who will develop severe COVID-19, especially in low- and middle-income countries. Objective: The study aims to compare the performance of human radiologist Brixia scores versus 2 AI scoring systems in predicting the severity of COVID-19 pneumonia. Methods: We performed a cross-sectional study of 300 patients suspected with and with confirmed COVID-19 infection in Jakarta, Indonesia. A total of 2 AI scores were generated using CAD4COVID x-ray software. 
Results: The AI probability score had slightly lower discrimination (area under the curve [AUC] 0.787, 95% CI 0.722-0.852). The AI score for the affected lung area (AUC 0.857, 95% CI 0.809-0.905) was almost as good as the human Brixia score (AUC 0.863, 95% CI 0.818-0.908). Conclusions: The AI score for the affected lung area and the human radiologist Brixia score had similar and good discrimination performance in predicting COVID-19 severity. Our study demonstrated that using AI-based diagnostic tools is possible, even in low-resource settings. However, before it is widely adopted in daily practice, more studies with a larger scale and that are prospective in nature are needed to confirm our findings. %M 38451633 %R 10.2196/46817 %U https://formative.jmir.org/2024/1/e46817 %U https://doi.org/10.2196/46817 %U http://www.ncbi.nlm.nih.gov/pubmed/38451633 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e52885 %T Leveraging Generative AI Tools to Support the Development of Digital Solutions in Health Care Research: Case Study %A Rodriguez,Danissa V %A Lawrence,Katharine %A Gonzalez,Javier %A Brandfield-Harvey,Beatrix %A Xu,Lynn %A Tasneem,Sumaiya %A Levine,Defne L %A Mann,Devin %+ Department of Population Health, New York University Grossman School of Medicine, 227 East 30th Street, 6th Floor, New York, NY, 10016, United States, 1 646 501 2684, danissa.rodriguez@nyulangone.org %K digital health %K GenAI %K generative %K artificial intelligence %K ChatGPT %K software engineering %K mHealth %K mobile health %K app %K apps %K application %K applications %K diabetes %K diabetic %K diabetes prevention %K digital prescription %K software %K engagement %K behaviour change %K behavior change %K developer %K developers %K LLM %K LLMs %K language model %K language models %K NLP %K natural language processing %D 2024 %7 6.3.2024 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Generative artificial intelligence has the potential to revolutionize 
health technology product development by improving coding quality, efficiency, documentation, quality assessment and review, and troubleshooting. Objective: This paper explores the application of a commercially available generative artificial intelligence tool (ChatGPT) to the development of a digital health behavior change intervention designed to support patient engagement in a commercial digital diabetes prevention program. Methods: We examined the capacity, advantages, and limitations of ChatGPT to support digital product idea conceptualization, intervention content development, and the software engineering process, including software requirement generation, software design, and code production. In total, 11 evaluators, each with at least 10 years of experience in fields of study ranging from medicine and implementation science to computer science, participated in the output review process (ChatGPT vs human-generated output). All had familiarity or prior exposure to the original personalized automatic messaging system intervention. The evaluators rated the ChatGPT-produced outputs in terms of understandability, usability, novelty, relevance, completeness, and efficiency. Results: Most metrics received positive scores. We identified that ChatGPT can (1) support developers to achieve high-quality products faster and (2) facilitate nontechnical communication and system understanding between technical and nontechnical team members around the development goal of rapid and easy-to-build computational solutions for medical technologies. Conclusions: ChatGPT can serve as a usable facilitator for researchers engaging in the software development life cycle, from product conceptualization to feature identification and user story development to code generation. 
Trial Registration: ClinicalTrials.gov NCT04049500; https://clinicaltrials.gov/ct2/show/NCT04049500 %M 38446539 %R 10.2196/52885 %U https://humanfactors.jmir.org/2024/1/e52885 %U https://doi.org/10.2196/52885 %U http://www.ncbi.nlm.nih.gov/pubmed/38446539 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 7 %N %P e50163 %T Readability and Health Literacy Scores for ChatGPT-Generated Dermatology Public Education Materials: Cross-Sectional Analysis of Sunscreen and Melanoma Questions %A Roster,Katie %A Kann,Rebecca B %A Farabi,Banu %A Gronbeck,Christian %A Brownstone,Nicholas %A Lipner,Shari R %+ Department of Dermatology, Weill Cornell Medicine, 1305 York Ave 9th Floor, New York, NY, 10021, United States, 1 646 962 3376, shl9032@med.cornell.edu %K ChatGPT %K artificial intelligence %K AI %K LLM %K LLMs %K large language model %K language model %K language models %K generative %K NLP %K natural language processing %K health disparities %K health literacy %K readability %K disparities %K disparity %K dermatology %K health information %K comprehensible %K comprehensibility %K understandability %K patient education %K public education %K health education %K online information %D 2024 %7 6.3.2024 %9 Research Letter %J JMIR Dermatol %G English %X %M 38446502 %R 10.2196/50163 %U https://derma.jmir.org/2024/1/e50163 %U https://doi.org/10.2196/50163 %U http://www.ncbi.nlm.nih.gov/pubmed/38446502 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 7 %N %P e48451 %T Potential Use of ChatGPT in Responding to Patient Questions and Creating Patient Resources %A Reynolds,Kelly %A Tejasvi,Trilokraj %+ Department of Dermatology, University of Michigan, 1500 East Medical Center Drive, Ann Arbor, MI, 48109, United States, 1 7349364054, ttejasvi@med.umich.edu %K artificial intelligence %K AI %K ChatGPT %K patient resources %K patient handouts %K natural language processing software %K language model %K language models %K natural language processing %K chatbot %K chatbots %K 
conversational agent %K conversational agents %K patient education %K educational resource %K educational %D 2024 %7 6.3.2024 %9 Viewpoint %J JMIR Dermatol %G English %X ChatGPT (OpenAI) is an artificial intelligence–based free natural language processing model that generates complex responses to user-generated prompts. The advent of this tool comes at a time when physician burnout is at an all-time high, which is attributed at least in part to time spent outside of the patient encounter within the electronic medical record (documenting the encounter, responding to patient messages, etc). Although ChatGPT is not specifically designed to provide medical information, it can generate preliminary responses to patients’ questions about their medical conditions and can rapidly create educational patient resources, which do inevitably require rigorous editing and fact-checking on the part of the health care provider to ensure accuracy. In this way, this assistive technology has the potential to not only enhance a physician’s efficiency and work-life balance but also enrich the patient-physician relationship and ultimately improve patient outcomes. %M 38446541 %R 10.2196/48451 %U https://derma.jmir.org/2024/1/e48451 %U https://doi.org/10.2196/48451 %U http://www.ncbi.nlm.nih.gov/pubmed/38446541 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51837 %T What’s in a Name? 
Experimental Evidence of Gender Bias in Recommendation Letters Generated by ChatGPT %A Kaplan,Deanna M %A Palitsky,Roman %A Arconada Alvarez,Santiago J %A Pozzo,Nicole S %A Greenleaf,Morgan N %A Atkinson,Ciara A %A Lam,Wilbur A %+ Department of Family and Preventive Medicine, Emory University School of Medicine, Administrative Offices, Wesley Woods Campus, 1841 Clifton Road, NE, 5th Floor, Atlanta, GA, 30329, United States, 1 520 370 6752, deanna.m.kaplan@emory.edu %K chatbot %K generative artificial intelligence %K generative AI %K gender bias %K large language models %K letters of recommendation %K recommendation letter %K language model %K chatbots %K artificial intelligence %K AI %K gender-based language %K human written %K real-world %K scenario %D 2024 %7 5.3.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence chatbots such as ChatGPT (OpenAI) have garnered excitement about their potential for delegating writing tasks ordinarily performed by humans. Many of these tasks (eg, writing recommendation letters) have social and professional ramifications, making the potential social biases in ChatGPT’s underlying language model a serious concern. Objective: Three preregistered studies used the text analysis program Linguistic Inquiry and Word Count to investigate gender bias in recommendation letters written by ChatGPT in human-use sessions (N=1400 total letters). Methods: We conducted analyses using 22 existing Linguistic Inquiry and Word Count dictionaries, as well as 6 newly created dictionaries based on systematic reviews of gender bias in recommendation letters, to compare recommendation letters generated for the 200 most historically popular “male” and “female” names in the United States. Study 1 used 3 different letter-writing prompts intended to accentuate professional accomplishments associated with male stereotypes, female stereotypes, or neither. 
Study 2 examined whether lengthening each of the 3 prompts while holding the between-prompt word count constant modified the extent of bias. Study 3 examined the variability within letters generated for the same name and prompts. We hypothesized that when prompted with gender-stereotyped professional accomplishments, ChatGPT would evidence gender-based language differences replicating those found in systematic reviews of human-written recommendation letters (eg, more affiliative, social, and communal language for female names; more agentic and skill-based language for male names). Results: Significant differences in language between letters generated for female versus male names were observed across all prompts, including the prompt hypothesized to be neutral, and across nearly all language categories tested. Historically female names received significantly more social referents (5/6, 83% of prompts), communal or doubt-raising language (4/6, 67% of prompts), personal pronouns (4/6, 67% of prompts), and clout language (5/6, 83% of prompts). Contradicting the study hypotheses, some gender differences (eg, achievement language and agentic language) were significant in both the hypothesized and nonhypothesized directions, depending on the prompt. Heteroscedasticity between male and female names was observed in multiple linguistic categories, with greater variance for historically female names than for historically male names. Conclusions: ChatGPT reproduces many gender-based language biases that have been reliably identified in investigations of human-written reference letters, although these differences vary across prompts and language categories. Caution should be taken when using ChatGPT for tasks that have social consequences, such as reference letter writing. The methods developed in this study may be useful for ongoing bias testing among progressive generations of chatbots across a range of real-world scenarios. 
Trial Registration: OSF Registries osf.io/ztv96; https://osf.io/ztv96 %M 38441945 %R 10.2196/51837 %U https://www.jmir.org/2024/1/e51837 %U https://doi.org/10.2196/51837 %U http://www.ncbi.nlm.nih.gov/pubmed/38441945 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e49022 %T Promises, Pitfalls, and Clinical Applications of Artificial Intelligence in Pediatrics %A Bhargava,Hansa %A Salomon,Carmela %A Suresh,Srinivasan %A Chang,Anthony %A Kilian,Rachel %A Stijn,Diana van %A Oriol,Albert %A Low,Daniel %A Knebel,Ashley %A Taraman,Sharief %+ Cognoa, Inc, 2185 Park Blvd, Palo Alto, CA, 94306, United States, 1 8664264622, carmela.salomon@cognoa.com %K artificial intelligence %K pediatrics %K autism spectrum disorder %K ASD %K disparities %K pediatric %K youth %K child %K children %K autism %K autistic %K barrier %K barriers %K clinical application %K clinical applications %K professional development %K continuing education %K continuing medical education %K CME %K implementation %D 2024 %7 29.2.2024 %9 Viewpoint %J J Med Internet Res %G English %X Artificial intelligence (AI) broadly describes a branch of computer science focused on developing machines capable of performing tasks typically associated with human intelligence. Those who connect AI with the world of science fiction may meet its growing rise with hesitancy or outright skepticism. However, AI is becoming increasingly pervasive in our society, from algorithms helping to sift through airline fares to substituting words in emails and SMS text messages based on user choices. Data collection is ongoing and is being leveraged by software platforms to analyze patterns and make predictions across multiple industries. Health care is gradually becoming part of this technological transformation, as advancements in computational power and storage converge with the rapid expansion of digitized medical information. 
Given the growing and inevitable integration of AI into health care systems, it is our viewpoint that pediatricians urgently require training and orientation to the uses, promises, and pitfalls of AI in medicine. AI is unlikely to solve the full array of complex challenges confronting pediatricians today; however, if used responsibly, it holds great potential to improve many aspects of care for providers, children, and families. Our aim in this viewpoint is to provide clinicians with a targeted introduction to the field of AI in pediatrics, including key promises, pitfalls, and clinical applications, so they can play a more active role in shaping the future impact of AI in medicine. %M 38421690 %R 10.2196/49022 %U https://www.jmir.org/2024/1/e49022 %U https://doi.org/10.2196/49022 %U http://www.ncbi.nlm.nih.gov/pubmed/38421690 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e52155 %T Using AI Text-to-Image Generation to Create Novel Illustrations for Medical Education: Current Limitations as Illustrated by Hypothyroidism and Horner Syndrome %A Kumar,Ajay %A Burr,Pierce %A Young,Tim Michael %+ Queen Square Institute of Neurology, University College London, Number 7 Queen Square, London, WC1N 3BG, United Kingdom, 44 2031082781, t.young@ucl.ac.uk %K artificial intelligence %K AI %K medical illustration %K medical images %K medical education %K image %K images %K illustration %K illustrations %K photo %K photos %K photographs %K face %K facial %K paralysis %K photograph %K photography %K Horner's syndrome %K Horner syndrome %K Bernard syndrome %K Bernard's syndrome %K miosis %K oculosympathetic %K ptosis %K ophthalmoplegia %K nervous system %K autonomic %K eye %K eyes %K pupil %K pupils %K neurologic %K neurological %D 2024 %7 22.2.2024 %9 Research Letter %J JMIR Med Educ %G English %X Our research letter investigates the potential, as well as the current limitations, of widely available text-to-image tools in generating images for medical education. 
We focused on illustrations of important physical signs in the face (for which confidentiality issues in conventional patient photograph use may be a particular concern) that medics should know about, and we used facial images of hypothyroidism and Horner syndrome as examples. %M 38386400 %R 10.2196/52155 %U https://mededu.jmir.org/2024/1/e52155 %U https://doi.org/10.2196/52155 %U http://www.ncbi.nlm.nih.gov/pubmed/38386400 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 12 %N %P e44406 %T Mobile Apps for COVID-19 Detection and Diagnosis for Future Pandemic Control: Multidimensional Systematic Review %A Gheisari,Mehdi %A Ghaderzadeh,Mustafa %A Li,Huxiong %A Taami,Tania %A Fernández-Campusano,Christian %A Sadeghsalehi,Hamidreza %A Afzaal Abbasi,Aaqif %+ School of Nursing and Health Sciences of Boukan, Urmia University of Medical Sciences, Kurdistan Blv Boukan, Urmia, 5951715161, Iran, 98 9129378390, Mustaf.ghaderzadeh@sbmu.ac.ir %K COVID-19 %K detection %K diagnosis %K internet of things %K cloud computing %K mobile applications %K mobile app %K mobile apps %K artificial intelligence: AI %K mobile phone %K smartphone %D 2024 %7 22.2.2024 %9 Review %J JMIR Mhealth Uhealth %G English %X Background: In the modern world, mobile apps are essential for human advancement, and pandemic control is no exception. The use of mobile apps and technology for the detection and diagnosis of COVID-19 has been the subject of numerous investigations, although no thorough analysis of COVID-19 pandemic prevention has been conducted using mobile apps, creating a gap. Objective: With the intention of helping software companies and clinical researchers, this study provides comprehensive information regarding the different fields in which mobile apps were used to diagnose COVID-19 during the pandemic. Methods: In this systematic review, 535 studies were found after searching 5 major research databases (ScienceDirect, Scopus, PubMed, Web of Science, and IEEE). 
Of these, only 42 (7.9%) studies concerned with diagnosing and detecting COVID-19 were chosen after applying inclusion and exclusion criteria using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) protocol. Results: Mobile apps were categorized into 6 areas based on the content of these 42 studies: contact tracing, data gathering, data visualization, artificial intelligence (AI)–based diagnosis, rule- and guideline-based diagnosis, and data transformation. Patients with COVID-19 were identified via mobile apps using a variety of clinical, geographic, demographic, radiological, serological, and laboratory data. Most studies concentrated on using AI methods to identify people who might have COVID-19. Additionally, symptoms, cough sounds, and radiological images were used more frequently compared to other data types. Deep learning techniques, such as convolutional neural networks, performed comparatively better in the processing of health care data than other types of AI techniques, which improved the diagnosis of COVID-19. Conclusions: Mobile apps could soon play a significant role as a powerful tool for data collection, epidemic health data analysis, and the early identification of suspected cases. These technologies can work with the internet of things, cloud storage, 5th-generation technology, and cloud computing. Processing pipelines can be moved to mobile device processing cores using new deep learning methods, such as lightweight neural networks. In the event of future pandemics, mobile apps will play a critical role in rapid diagnosis using various image data and clinical symptoms. Consequently, the rapid diagnosis of these diseases can improve the management of their effects and obtain excellent results in treating patients. 
%M 38231538 %R 10.2196/44406 %U https://mhealth.jmir.org/2024/1/e44406 %U https://doi.org/10.2196/44406 %U http://www.ncbi.nlm.nih.gov/pubmed/38231538 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e51996 %T Comprehensive Assessment and Early Prediction of Gross Motor Performance in Toddlers With Graph Convolutional Networks–Based Deep Learning: Development and Validation Study %A Chun,Sulim %A Jang,Sooyoung %A Kim,Jin Yong %A Ko,Chanyoung %A Lee,JooHyun %A Hong,JaeSeong %A Park,Yu Rang %+ Department of Biomedical Systems Informatics, Yonsei University College of Medicine, 6th floor, 50-1 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea, 82 2 2228 2493, yurangpark@yuhs.ac %K child development %K digital health %K artificial intelligence %K gross %K motor %K movement %K development %K developmental %K machine learning %K pediatric %K pediatrics %K paediatric %K paediatrics %K toddler %K toddlers %K child %K children %K limb %K limbs %K algorithm %K algorithms %K kinesiology %K GCN %K graph convolutional networks %K convolutional network %D 2024 %7 21.2.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Accurate and timely assessment of children’s developmental status is crucial for early diagnosis and intervention. More accurate and automated developmental assessments are essential due to the lack of trained health care providers and imprecise parental reporting. In various areas of development, gross motor development in toddlers is known to be predictive of subsequent childhood developments. Objective: The purpose of this study was to develop a model to assess gross motor behavior and integrate the results to determine the overall gross motor status of toddlers. This study also aimed to identify behaviors that are important in the assessment of overall gross motor skills and detect critical moments and important body parts for the assessment of each behavior. Methods: We used behavioral videos of toddlers aged 18-35 months. 
To assess gross motor development, we selected 4 behaviors (climb up the stairs, go down the stairs, throw the ball, and stand on 1 foot) that have been validated with the Korean Developmental Screening Test for Infants and Children. In the child behavior videos, we estimated each child’s position as a bounding box and extracted human keypoints within the box. In the first stage, the videos with the extracted human keypoints of each behavior were evaluated separately using a graph convolutional networks (GCN)–based algorithm. The probability values obtained for each label in the first-stage model were used as input for the second-stage model, the extreme gradient boosting (XGBoost) algorithm, to predict the overall gross motor status. For interpretability, we used gradient-weighted class activation mapping (Grad-CAM) to identify important moments and relevant body parts during the movements. The Shapley additive explanations method was used for the assessment of variable importance, to determine the movements that contributed the most to the overall developmental assessment. Results: Behavioral videos of 4 gross motor skills were collected from 147 children, resulting in a total of 2395 videos. The stage-1 GCN model to evaluate each behavior had an area under the receiver operating characteristic curve (AUROC) of 0.79 to 0.90. Keypoint-mapping Grad-CAM visualization identified important moments in each behavior and differences in important body parts. The stage-2 XGBoost model to assess the overall gross motor status had an AUROC of 0.90. Among the 4 behaviors, “go down the stairs” contributed the most to the overall developmental assessment. Conclusions: Using movement videos of toddlers aged 18-35 months, we developed objective and automated models to evaluate each behavior and assess each child’s overall gross motor performance. 
We identified the important behaviors for assessing gross motor performance and developed methods to recognize important moments and body parts while evaluating gross motor performance. %M 38381519 %R 10.2196/51996 %U https://formative.jmir.org/2024/1/e51996 %U https://doi.org/10.2196/51996 %U http://www.ncbi.nlm.nih.gov/pubmed/38381519 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e46500 %T AI Education for Fourth-Year Medical Students: Two-Year Experience of a Web-Based, Self-Guided Curriculum and Mixed Methods Study %A Abid,Areeba %A Murugan,Avinash %A Banerjee,Imon %A Purkayastha,Saptarshi %A Trivedi,Hari %A Gichoya,Judy %+ Emory University School of Medicine, 2015 Uppergate Dr, Atlanta, GA, 30307, United States, 1 (404) 727 4018, areeba.abid@emory.edu %K medical education %K machine learning %K artificial intelligence %K elective curriculum %K medical student %K student %K students %K elective %K electives %K curricula %K curriculum %K lesson plan %K lesson plans %K educators %K educator %K teacher %K teachers %K teaching %K computer programming %K programming %K coding %K programmer %K programmers %K self guided %K self directed %D 2024 %7 20.2.2024 %9 Original Paper %J JMIR Med Educ %G English %X Background: Artificial intelligence (AI) and machine learning (ML) are poised to have a substantial impact in the health care space. While a plethora of web-based resources exist to teach programming skills and ML model development, there are few introductory curricula specifically tailored to medical students without a background in data science or programming. Programs that do exist are often restricted to a specific specialty. Objective: We hypothesized that a 1-month elective for fourth-year medical students, composed of high-quality existing web-based resources and a project-based structure, would empower students to learn about the impact of AI and ML in their chosen specialty and begin contributing to innovation in their field of interest. 
This study aims to evaluate the success of this elective in improving self-reported confidence scores in AI and ML. The authors also share our curriculum with other educators who may be interested in its adoption. Methods: This elective was offered in 2 tracks: technical (for students who were already competent programmers) and nontechnical (with no technical prerequisites, focusing on building a conceptual understanding of AI and ML). Students established a conceptual foundation of knowledge using curated web-based resources and relevant research papers, and were then tasked with completing 3 projects in their chosen specialty: a data set analysis, a literature review, and an AI project proposal. The project-based nature of the elective was designed to be self-guided and flexible to each student’s interest area and career goals. Students’ success was measured by self-reported confidence in AI and ML skills in pre and postsurveys. Qualitative feedback on students’ experiences was also collected. Results: This web-based, self-directed elective was offered on a pass-or-fail basis each month to fourth-year students at Emory University School of Medicine beginning in May 2021. As of June 2022, a total of 19 students had successfully completed the elective, representing a wide range of chosen specialties: diagnostic radiology (n=3), general surgery (n=1), internal medicine (n=5), neurology (n=2), obstetrics and gynecology (n=1), ophthalmology (n=1), orthopedic surgery (n=1), otolaryngology (n=2), pathology (n=2), and pediatrics (n=1). Students’ self-reported confidence scores for AI and ML rose by 66% after this 1-month elective. In qualitative surveys, students overwhelmingly reported enthusiasm and satisfaction with the course and commented that the self-direction and flexibility and the project-based design of the course were essential. 
Conclusions: Course participants were successful in diving deep into applications of AI in their wide-ranging specialties, produced substantial project deliverables, and generally reported satisfaction with their elective experience. The authors are hopeful that a brief, 1-month investment in AI and ML education during medical school will empower this next generation of physicians to pave the way for AI and ML innovation in health care. %M 38376896 %R 10.2196/46500 %U https://mededu.jmir.org/2024/1/e46500 %U https://doi.org/10.2196/46500 %U http://www.ncbi.nlm.nih.gov/pubmed/38376896 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e53654 %T Development of Cost-Effective Fatty Liver Disease Prediction Models in a Chinese Population: Statistical and Machine Learning Approaches %A Zhang,Liang %A Huang,Yueqing %A Huang,Min %A Zhao,Chun-Hua %A Zhang,Yan-Jun %A Wang,Yi %+ Department of General Practice, The Affiliated Suzhou Hospital of Nanjing Medical University, 16 Baitaxi Road, Gusu District, Suzhou, 215000, China, 86 13812757566, huangyq_sz@163.com %K NAFLD %K artificial intelligence %K public health %K transient elastography %K diagnosis %D 2024 %7 16.2.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The increasing prevalence of nonalcoholic fatty liver disease (NAFLD) in China presents a significant public health concern. Traditional ultrasound, commonly used for fatty liver screening, often lacks the ability to accurately quantify steatosis, leading to insufficient follow-up for patients with moderate-to-severe steatosis. Transient elastography (TE) provides a more quantitative diagnosis of steatosis and fibrosis, closely aligning with biopsy results. Moreover, machine learning (ML) technology holds promise for developing more precise diagnostic models for NAFLD using a variety of laboratory indicators. Objective: This study aims to develop a novel ML-based diagnostic model leveraging TE results for staging hepatic steatosis. 
The objective was to streamline the model’s input features, creating a cost-effective and user-friendly tool to distinguish patients with NAFLD requiring follow-up. This innovative approach merges TE and ML to enhance diagnostic accuracy and efficiency in NAFLD assessment. Methods: The study involved a comprehensive analysis of health examination records from Suzhou Municipal Hospital, spanning from March to May 2023. Patient data and questionnaire responses were meticulously inputted into Microsoft Excel 2019, followed by thorough data cleaning and model development using Python 3.7, with libraries scikit-learn and numpy to ensure data accuracy. A cohort comprising 978 residents with complete medical records and TE results was included for analysis. Various classification models, including logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost), were constructed and evaluated based on the area under the receiver operating characteristic curve (AUROC). Results: Among the 916 patients included in the study, 273 were diagnosed with moderate-to-severe NAFLD. The concordance rate between traditional ultrasound and TE for detecting moderate-to-severe NAFLD was 84.6% (231/273). The AUROC values for the RF, LightGBM, XGBoost, SVM, KNN, and LR models were 0.91, 0.86, 0.83, 0.88, 0.77, and 0.81, respectively. These models achieved accuracy rates of 84%, 81%, 78%, 81%, 76%, and 77%, respectively. Notably, the RF model exhibited the best performance. A simplified RF model was developed with an AUROC of 0.88, featuring 62% sensitivity and 90% specificity. This simplified model used 6 key features: waist circumference, BMI, fasting plasma glucose, uric acid, total bilirubin, and high-sensitivity C-reactive protein. This approach offers a cost-effective and user-friendly tool while streamlining feature acquisition for training purposes. 
Conclusions: The study introduces a groundbreaking, cost-effective ML algorithm that leverages health examination data for identifying moderate-to-severe NAFLD. This model has the potential to significantly impact public health by enabling targeted investigations and interventions for NAFLD. By integrating TE and ML technologies, the study showcases innovative approaches to advancing NAFLD diagnostics. %M 38363597 %R 10.2196/53654 %U https://formative.jmir.org/2024/1/e53654 %U https://doi.org/10.2196/53654 %U http://www.ncbi.nlm.nih.gov/pubmed/38363597 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e52164 %T Human-Written vs AI-Generated Texts in Orthopedic Academic Literature: Comparative Qualitative Analysis %A Hakam,Hassan Tarek %A Prill,Robert %A Korte,Lisa %A Lovreković,Bruno %A Ostojić,Marko %A Ramadanov,Nikolai %A Muehlensiepen,Felix %+ Center of Orthopaedics and Trauma Surgery, University Clinic of Brandenburg, Brandenburg Medical School, Hochstr 29, Brandenburg an der Havel, 14770, Germany, 49 03381 411940, hassantarek.hakam@mhb-fontane.de %K artificial intelligence %K AI %K large language model %K LLM %K research %K orthopedic surgery %K sports medicine %K orthopedics %K surgery %K orthopedic %K qualitative study %K medical database %K feedback %K detection %K tool %K scientific integrity %K study design %D 2024 %7 16.2.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: As large language models (LLMs) are becoming increasingly integrated into different aspects of health care, questions about the implications for medical academic literature have begun to emerge. Key aspects such as authenticity in academic writing are at stake with artificial intelligence (AI) generating highly linguistically accurate and grammatically sound texts. Objective: The objective of this study is to compare human-written with AI-generated scientific literature in orthopedics and sports medicine. 
Methods: Five original abstracts were selected from the PubMed database. These abstracts were subsequently rewritten with the assistance of 2 LLMs with different degrees of proficiency. Subsequently, researchers with varying degrees of expertise and with different areas of specialization were asked to rank the abstracts according to linguistic and methodological parameters. Finally, researchers had to classify the articles as AI generated or human written. Results: Neither the researchers nor the AI-detection software could successfully identify the AI-generated texts. Furthermore, the criteria previously suggested in the literature did not correlate with whether the researchers deemed a text to be AI generated or whether they judged the article correctly based on these parameters. Conclusions: The primary finding of this study was that researchers were unable to distinguish between LLM-generated and human-written texts. However, due to the small sample size, it is not possible to generalize the results of this study. As is the case with any tool used in academic research, the potential to cause harm can be mitigated by relying on the transparency and integrity of the researchers. With scientific integrity at stake, further research with a similar study design should be conducted to determine the magnitude of this issue. 
%M 38363631 %R 10.2196/52164 %U https://formative.jmir.org/2024/1/e52164 %U https://doi.org/10.2196/52164 %U http://www.ncbi.nlm.nih.gov/pubmed/38363631 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e48690 %T Sodium Intake Estimation in Hospital Patients Using AI-Based Imaging: Prospective Pilot Study %A Ryu,Jiwon %A Kim,Sejoong %A Lim,Yejee %A Ohn,Jung Hun %A Kim,Sun-wook %A Cho,Jae Ho %A Park,Hee Sun %A Lee,Jongchan %A Kim,Eun Sun %A Kim,Nak-Hyun %A Song,Ji Eun %A Kim,Su Hwan %A Suh,Eui-Chang %A Mukhtorov,Doniyorjon %A Park,Jung Hyun %A Kim,Sung Kweon %A Kim,Hye Won %+ Hospital Medicine Center, Seoul National University Bundang Hospital, Gumi-ro 173 Beon-gil 82, Bundang-gu, Seongnam-si, 13620, Republic of Korea, 82 7877638, kimhwhw@gmail.com %K artificial intelligence %K AI %K image-to-text %K smart nutrition %K eHealth %K urine %K validation %K AI image %K food AI %K hospital %K sodium intake %K pilot study %K imaging %K diet %K diet management %K sex %K age %D 2024 %7 16.2.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Measurement of sodium intake in hospitalized patients is critical for their care. In this study, artificial intelligence (AI)–based imaging was performed to determine sodium intake in these patients. Objective: The applicability of a diet management system was evaluated using AI-based imaging to assess the sodium content of diets prescribed for hospitalized patients. Methods: Based on the information on the already investigated nutrients and quantity of food, consumed sodium was analyzed through photographs obtained before and after a meal. We used a hybrid model that first leveraged the capabilities of the You Only Look Once, version 4 (YOLOv4) architecture for the detection of food and dish areas in images. Following this initial detection, 2 distinct approaches were adopted for further classification: a custom ResNet-101 model and a hyperspectral imaging-based technique. 
These methodologies focused on accurate classification and estimation of the food quantity and sodium amount, respectively. The 24-hour urine sodium (UNa) value was measured as a reference for evaluating the sodium intake. Results: Results were analyzed using complete data from 25 participants out of the total 54 enrolled individuals. The median sodium intake calculated by the AI algorithm (AI-Na) was determined to be 2022.7 mg per day/person (adjusted by administered fluids). A significant correlation was observed between AI-Na and 24-hour UNa, while there was a notable disparity between them. A regression analysis, considering patient characteristics (eg, gender, age, renal function, the use of diuretics, and administered fluids) yielded a formula accounting for the interaction between AI-Na and 24-hour UNa. Consequently, it was concluded that AI-Na holds clinical significance in estimating salt intake for hospitalized patients using images without the need for 24-hour UNa measurements. The degree of correlation between AI-Na and 24-hour UNa was found to vary depending on the use of diuretics. Conclusions: This study highlights the potential of AI-based imaging for determining sodium intake in hospitalized patients. 
%M 38363594 %R 10.2196/48690 %U https://formative.jmir.org/2024/1/e48690 %U https://doi.org/10.2196/48690 %U http://www.ncbi.nlm.nih.gov/pubmed/38363594 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 13 %N %P e54704 %T A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence–Based Models in Health Care Education and Practice: Development Study Involving a Literature Review %A Sallam,Malik %A Barakat,Muna %A Sallam,Mohammed %+ Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Queen Rania Al-Abdullah Street-Aljubeiha, Amman, 11942, Jordan, 962 0791845186, malik.sallam@ju.edu.jo %K guidelines %K evaluation %K meaningful analytics %K large language models %K decision support %D 2024 %7 15.2.2024 %9 Original Paper %J Interact J Med Res %G English %X Background: Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. Objective: This study aimed to develop a preliminary checklist to standardize the reporting of generative AI-based studies in health care education and practice. Methods: A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with “ChatGPT,” “Bing,” or “Bard” in the title were retrieved. Careful examination of the methodologies employed in the included records was conducted to identify the common pertinent themes and the possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used to evaluate the included records by 2 independent raters. 
Cohen κ was used to evaluate interrater reliability. Results: The final data set that formed the basis for pertinent theme identification and analysis comprised a total of 34 records. The finalized checklist included 9 pertinent themes collectively referred to as METRICS (Model, Evaluation, Timing/Transparency, Range/Randomization, Individual factors, Count, and Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). Interrater reliability was acceptable, with Cohen κ ranging from 0.558 to 0.962 (P<.001 for the 9 tested items). With classification per item, the highest average METRICS score was recorded for the “Model” item, followed by the “Specificity” item, while the lowest scores were recorded for the “Randomization” item (classified as suboptimal) and the “Individual factors” item (classified as satisfactory). Conclusions: The METRICS checklist can facilitate the design of studies, guiding researchers toward best practices in reporting results. The findings highlight the need for standardized reporting algorithms for generative AI-based studies in health care, considering the variability observed in methodologies and reporting. The proposed METRICS checklist could serve as a helpful preliminary basis for establishing a universally accepted approach to standardize the design and reporting of generative AI-based studies in health care, a swiftly evolving research topic. 
%M 38276872 %R 10.2196/54704 %U https://www.i-jmr.org/2024/1/e54704 %U https://doi.org/10.2196/54704 %U http://www.ncbi.nlm.nih.gov/pubmed/38276872 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e54349 %T Implementation of Chatbot Technology in Health Care: Protocol for a Bibliometric Analysis %A Ni,Zhao %A Peng,Mary L %A Balakrishnan,Vimala %A Tee,Vincent %A Azwa,Iskandar %A Saifi,Rumana %A Nelson,LaRon E %A Vlahov,David %A Altice,Frederick L %+ School of Nursing, Yale University, 400 West Campus Drive, Orange, CT, 06477, United States, 1 2037373039, zhao.ni@yale.edu %K artificial intelligence %K AI %K bibliometric analysis %K chatbots %K health care %K health promotion %D 2024 %7 15.2.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Chatbots have the potential to increase people’s access to quality health care. However, the implementation of chatbot technology in the health care system is unclear due to the scarce analysis of publications on the adoption of chatbots in health and medical settings. Objective: This paper presents a protocol of a bibliometric analysis aimed at offering the public insights into the current state and emerging trends in research related to the use of chatbot technology for promoting health. Methods: In this bibliometric analysis, we will select published papers from the databases of CINAHL, IEEE Xplore, PubMed, Scopus, and Web of Science that pertain to chatbot technology and its applications in health care. Our search strategy includes keywords such as “chatbot,” “virtual agent,” “virtual assistant,” “conversational agent,” “conversational AI,” “interactive agent,” “health,” and “healthcare.” Five researchers who are AI engineers and clinicians will independently review the titles and abstracts of selected papers to determine their eligibility for a full-text review. The corresponding author (ZN) will serve as a mediator to address any discrepancies and disputes among the 5 reviewers. 
Our analysis will encompass various publication patterns of chatbot research, including the number of annual publications, their geographic or institutional distribution, and the number of annual grants supporting chatbot research, and further summarize the methodologies used in the development of health-related chatbots, along with their features and applications in health care settings. The software tool VOSviewer (version 1.6.19; Leiden University) will be used to construct and visualize bibliometric networks. Results: The preparation for the bibliometric analysis began on December 3, 2021, when the research team started the process of familiarizing themselves with the software tools that may be used in this analysis, VOSviewer and CiteSpace, during which they consulted 3 librarians at Yale University regarding search terms and tentative results. Tentative searches on the aforementioned databases yielded a total of 2340 papers. The official search phase started on July 27, 2023. Our goal is to complete the screening of papers and the analysis by February 15, 2024. Conclusions: Artificial intelligence chatbots, such as ChatGPT (OpenAI Inc), have sparked numerous discussions within the health care industry regarding their impact on human health. Chatbot technology holds substantial promise for advancing health care systems worldwide. However, developing a sophisticated chatbot capable of precise interaction with health care consumers, delivering personalized care, and providing accurate health-related information and knowledge remain considerable challenges. This bibliometric analysis seeks to fill the knowledge gap in the existing literature on health-related chatbots, including their applications, the software used in their development, and their preferred functionalities among users. 
International Registered Report Identifier (IRRID): PRR1-10.2196/54349 %M 38228575 %R 10.2196/54349 %U https://www.researchprotocols.org/2024/1/e54349 %U https://doi.org/10.2196/54349 %U http://www.ncbi.nlm.nih.gov/pubmed/38228575 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51473 %T Machine Learning–Based Prediction of Suicidality in Adolescents With Allergic Rhinitis: Derivation and Validation in 2 Independent Nationwide Cohorts %A Lee,Hojae %A Cho,Joong Ki %A Park,Jaeyu %A Lee,Hyeri %A Fond,Guillaume %A Boyer,Laurent %A Kim,Hyeon Jin %A Park,Seoyoung %A Cho,Wonyoung %A Lee,Hayeon %A Lee,Jinseok %A Yon,Dong Keon %+ Department of Regulatory Science, Kyung Hee University, 23 Kyungheedae-ro, Dongdaemun-gu, Seoul, 02447, Republic of Korea, 82 2 6935 2476, yonkkang@gmail.com %K machine learning %K allergic rhinitis %K prediction %K random forest %K suicidality %D 2024 %7 14.2.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Given the additional risk of suicide-related behaviors in adolescents with allergic rhinitis (AR), it is important to use the growing field of machine learning (ML) to evaluate this risk. Objective: This study aims to evaluate the validity and usefulness of an ML model for predicting suicide risk in patients with AR. Methods: We used data from 2 independent survey studies, Korea Youth Risk Behavior Web-based Survey (KYRBS; n=299,468) for the original data set and Korea National Health and Nutrition Examination Survey (KNHANES; n=833) for the external validation data set, to predict suicide risks of AR in adolescents aged 13 to 18 years, with 3.45% (10,341/299,468) and 1.4% (12/833) of the patients attempting suicide in the KYRBS and KNHANES studies, respectively. The outcome of interest was the suicide attempt risks. 
We selected various ML-based models with hyperparameter tuning in the discovery data set and performed an area under the receiver operating characteristic curve (AUROC) analysis on the training, test, and external validation data. Results: The study data set included 299,468 (KYRBS; original data set) and 833 (KNHANES; external validation data set) patients with AR recruited between 2005 and 2022. The best-performing ML model was the random forest model, with a mean AUROC of 84.12% (95% CI 83.98%-84.27%) in the original data set. Applying this result to the external validation data set revealed the best performance among the models, with an AUROC of 89.87% (sensitivity 83.33%, specificity 82.58%, accuracy 82.59%, and balanced accuracy 82.96%). In terms of feature importance, the 5 most important features in predicting suicide attempts in adolescent patients with AR were depression, stress status, academic achievement, age, and alcohol consumption. Conclusions: This study emphasizes the potential of ML models in predicting suicide risks in patients with AR, encouraging further application of these models in other conditions to enhance adolescent health and decrease suicide rates. 
%M 38354043 %R 10.2196/51473 %U https://www.jmir.org/2024/1/e51473 %U https://doi.org/10.2196/51473 %U http://www.ncbi.nlm.nih.gov/pubmed/38354043 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e52660 %T Using #ActuallyAutistic on Twitter for Precision Diagnosis of Autism Spectrum Disorder: Machine Learning Study %A Jaiswal,Aditi %A Washington,Peter %+ Department of Information and Computer Sciences, University of Hawaii at Manoa, Room 312C, Pacific Ocean Science and Technology, 1680 East-West Road, Honolulu, HI, 96822, United States, 1 8088296359, ajaiswal@hawaii.edu %K autism %K autism spectrum disorder %K machine learning %K natural language processing %K public health %K sentiment analysis %K social media analysis %K Twitter %D 2024 %7 14.2.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The increasing use of social media platforms has given rise to an unprecedented surge in user-generated content, with millions of individuals publicly sharing their thoughts, experiences, and health-related information. Social media can serve as a useful means to study and understand public health. Twitter (subsequently rebranded as “X”) is one such social media platform that has proven to be a valuable source of rich information for both the general public and health officials. We conducted the first study applying Twitter data mining to autism screening. Objective: We aimed to study the feasibility of autism screening from Twitter data and discuss the ethical implications of such models. Methods: We developed a machine learning model to attempt to distinguish individuals with autism from their neurotypical peers based on the textual patterns from their public communications on Twitter. We collected 6,515,470 tweets from users’ self-identification with autism using “#ActuallyAutistic” and a separate control group. 
To construct the data set, we targeted English-language tweets using the search query “#ActuallyAutistic” posted from January 1, 2014, to December 31, 2022. We encrypted all user IDs and stripped the tweets of identifiable information, such as the associated email address, prior to analysis. From these tweets, we identified unique users who used keywords such as “autism” OR “autistic” OR “neurodiverse” in their profile description and collected all the tweets from their timelines. To build the control group data set, we formulated a search query excluding the hashtag “#ActuallyAutistic” and collected 1000 tweets per day during the same time period. We trained a word2vec model and an attention-based, bidirectional long short-term memory model to validate the performance of per-tweet and per-profile classification models. We deleted the data set and the models after our analysis. Results: Our tweet classifier reached 73% accuracy, a 0.728 area under the receiver operating characteristic curve score, and a 0.71 F1-score using word2vec representations fed into a logistic regression model, while the user profile classifier achieved a 0.78 area under the receiver operating characteristic curve score and an F1-score of 0.805 using an attention-based, bidirectional long short-term memory model. Conclusions: We have shown that it is feasible to train machine learning models using social media data to predict use of the #ActuallyAutistic hashtag, an imperfect proxy for self-reported autism. While analyzing textual differences in naturalistic text has the potential to help clinicians screen for autism, there remain ethical questions that must be addressed for such research to move forward and to translate into the real world. While machine learning has the potential to improve behavioral research, there are still a plethora of ethical issues in digital phenotyping studies using social media with respect to user consent of marginalized populations. 
Achieving this requires a more inclusive approach during the model development process that involves the autistic community directly in the ideation and consent processes. %M 38354045 %R 10.2196/52660 %U https://formative.jmir.org/2024/1/e52660 %U https://doi.org/10.2196/52660 %U http://www.ncbi.nlm.nih.gov/pubmed/38354045 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e47739 %T Identifying Functional Status Impairment in People Living With Dementia Through Natural Language Processing of Clinical Documents: Cross-Sectional Study %A Laurentiev,John %A Kim,Dae Hyun %A Mahesri,Mufaddal %A Wang,Kuan-Yuan %A Bessette,Lily G %A York,Cassandra %A Zakoul,Heidi %A Lee,Su Been %A Zhou,Li %A Lin,Kueiyu Joshua %+ Department of Medicine, Brigham and Women's Hospital, 1620 Tremont St. Suite 3030, Boston, MA, 02120, United States, 1 617 278 0930, jklin@bwh.harvard.edu %K activities of daily living %K ADLs %K clinical note %K dementia %K electronic health record %K EHR %K functional impairment %K instrumental activities of daily living %K iADLs %K machine learning %K natural language processing %K NLP %D 2024 %7 13.2.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Assessment of activities of daily living (ADLs) and instrumental ADLs (iADLs) is key to determining the severity of dementia and care needs among older adults. However, such information is often only documented in free-text clinical notes within the electronic health record and can be challenging to find. Objective: This study aims to develop and validate machine learning models to determine the status of ADL and iADL impairments based on clinical notes. Methods: This cross-sectional study leveraged electronic health record clinical notes from Mass General Brigham’s Research Patient Data Repository linked with Medicare fee-for-service claims data from 2007 to 2017 to identify individuals aged 65 years or older with at least 1 diagnosis of dementia. 
Notes for encounters both 180 days before and after the first date of dementia diagnosis were randomly sampled. Models were trained and validated using note sentences filtered by expert-curated keywords (filtered cohort) and further evaluated using unfiltered sentences (unfiltered cohort). The model’s performance was compared using area under the receiver operating characteristic curve and area under the precision-recall curve (AUPRC). Results: The study included 10,000 key-term–filtered sentences representing 441 people (n=283, 64.2% women; mean age 82.7, SD 7.9 years) and 1000 unfiltered sentences representing 80 people (n=56, 70% women; mean age 82.8, SD 7.5 years). Area under the receiver operating characteristic curve was high for the best-performing ADL and iADL models on both cohorts (>0.97). For ADL impairment identification, the random forest model achieved the best AUPRC (0.89, 95% CI 0.86-0.91) on the filtered cohort; the support vector machine model achieved the highest AUPRC (0.82, 95% CI 0.75-0.89) for the unfiltered cohort. For iADL impairment, the Bio+Clinical bidirectional encoder representations from transformers (BERT) model had the highest AUPRC (filtered: 0.76, 95% CI 0.68-0.82; unfiltered: 0.58, 95% CI 0.001-1.0). Compared with a keyword-search approach on the unfiltered cohort, machine learning reduced false-positive rates from 4.5% to 0.2% for ADL and 1.8% to 0.1% for iADL. Conclusions: In this study, we demonstrated the ability of machine learning models to accurately identify ADL and iADL impairment based on free-text clinical notes, which could be useful in determining the severity of dementia. 
%M 38349732 %R 10.2196/47739 %U https://www.jmir.org/2024/1/e47739 %U https://doi.org/10.2196/47739 %U http://www.ncbi.nlm.nih.gov/pubmed/38349732 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e51391 %T Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models %A Abdullahi,Tassallah %A Singh,Ritambhara %A Eickhoff,Carsten %+ School of Medicine, University of Tübingen, Schaffhausenstr, 77, Tübingen, 72072, Germany, 49 7071 29 843, carsten.eickhoff@uni-tuebingen.de %K clinical decision support %K rare diseases %K complex diseases %K prompt engineering %K reliability %K consistency %K natural language processing %K language model %K Bard %K ChatGPT 3.5 %K GPT-4 %K MedAlpaca %K medical education %K complex diagnosis %K artificial intelligence %K AI assistance %K medical training %K prediction model %D 2024 %7 13.2.2024 %9 Original Paper %J JMIR Med Educ %G English %X Background: Patients with rare and complex diseases often experience delayed diagnoses and misdiagnoses because comprehensive knowledge about these diseases is limited to only a few medical experts. In this context, large language models (LLMs) have emerged as powerful knowledge aggregation tools with applications in clinical decision support and education domains. Objective: This study aims to explore the potential of 3 popular LLMs, namely Bard (Google LLC), ChatGPT-3.5 (OpenAI), and GPT-4 (OpenAI), in medical education to enhance the diagnosis of rare and complex diseases while investigating the impact of prompt engineering on their performance. Methods: We conducted experiments on publicly available complex and rare cases to achieve these objectives. We implemented various prompt strategies to evaluate the performance of these models using both open-ended and multiple-choice prompts. 
In addition, we used a majority voting strategy to leverage diverse reasoning paths within language models, aiming to enhance their reliability. Furthermore, we compared their performance with the performance of human respondents and MedAlpaca, a generative LLM specifically designed for medical tasks. Results: Notably, all LLMs outperformed the average human consensus and MedAlpaca, with a minimum margin of 5% and 13%, respectively, across all 30 cases from the diagnostic case challenge collection. On the frequently misdiagnosed cases category, Bard tied with MedAlpaca but surpassed the human average consensus by 14%, whereas GPT-4 and ChatGPT-3.5 outperformed MedAlpaca and the human respondents on the moderately often misdiagnosed cases category with minimum accuracy scores of 28% and 11%, respectively. The majority voting strategy, particularly with GPT-4, demonstrated the highest overall score across all cases from the diagnostic complex case collection, surpassing that of other LLMs. On the Medical Information Mart for Intensive Care-III data sets, Bard and GPT-4 achieved the highest diagnostic accuracy scores, with multiple-choice prompts scoring 93%, whereas ChatGPT-3.5 and MedAlpaca scored 73% and 47%, respectively. Furthermore, our results demonstrate that there is no one-size-fits-all prompting approach for improving the performance of LLMs and that a single strategy does not universally apply to all LLMs. Conclusions: Our findings shed light on the diagnostic capabilities of LLMs and the challenges associated with identifying an optimal prompting strategy that aligns with each language model’s characteristics and specific task requirements. The significance of prompt engineering is highlighted, providing valuable insights for researchers and practitioners who use these language models for medical training. 
Furthermore, this study represents a crucial step toward understanding how LLMs can enhance diagnostic reasoning in rare and complex medical cases, paving the way for developing effective educational tools and accurate diagnostic aids to improve patient care and outcomes. %M 38349725 %R 10.2196/51391 %U https://mededu.jmir.org/2024/1/e51391 %U https://doi.org/10.2196/51391 %U http://www.ncbi.nlm.nih.gov/pubmed/38349725 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e48949 %T Cocreating an Automated mHealth Apps Systematic Review Process With Generative AI: Design Science Research Approach %A Giunti,Guido %A Doherty,Colin P %+ Academic Unit of Neurology, School of Medicine, Trinity College Dublin, College Green, Dublin, D02, Ireland, 353 1 896 1000, drguidogiunti@gmail.com %K generative artificial intelligence %K mHealth %K ChatGPT %K evidence-base %K apps %K qualitative study %K design science research %K eHealth %K mobile device %K AI %K language model %K mHealth intervention %K generative AI %K AI tool %K software code %K systematic review %K language model %D 2024 %7 12.2.2024 %9 Original Paper %J JMIR Med Educ %G English %X Background: The use of mobile devices for delivering health-related services (mobile health [mHealth]) has rapidly increased, leading to a demand for summarizing the state of the art and practice through systematic reviews. However, the systematic review process is a resource-intensive and time-consuming process. Generative artificial intelligence (AI) has emerged as a potential solution to automate tedious tasks. Objective: This study aimed to explore the feasibility of using generative AI tools to automate time-consuming and resource-intensive tasks in a systematic review process and assess the scope and limitations of using such tools. Methods: We used the design science research methodology. 
The solution proposed is to use cocreation with a generative AI, such as ChatGPT, to produce software code that automates the process of conducting systematic reviews. Results: A triggering prompt was generated, and assistance from the generative AI was used to guide the steps toward developing, executing, and debugging a Python script. Errors in the code were resolved through conversational exchange with ChatGPT, and a tentative script was created. The code pulled the mHealth solutions from the Google Play Store and searched their descriptions for keywords that hinted at an evidence base. The results were exported to a CSV file, which was compared with the initial outputs of other similar systematic review processes. Conclusions: This study demonstrates the potential of using generative AI to automate the time-consuming process of conducting systematic reviews of mHealth apps. This approach could be particularly useful for researchers with limited coding skills. However, the study has limitations related to the design science research methodology, subjectivity bias, and the quality of the search results used to train the language model. 
%M 38345839 %R 10.2196/48949 %U https://mededu.jmir.org/2024/1/e48949 %U https://doi.org/10.2196/48949 %U http://www.ncbi.nlm.nih.gov/pubmed/38345839 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e55368 %T Proposing a Principle-Based Approach for Teaching AI Ethics in Medical Education %A Weidener,Lukas %A Fischer,Michael %+ UMIT TIROL – Private University for Health Sciences and Health Technology, Eduard-Wallnöfer-Zentrum 1, Hall in Tirol, 6060, Austria, 43 50 8648 3930, lukas.weidener@edu.umit-tirol.at %K artificial intelligence %K AI %K ethics %K artificial intelligence ethics %K AI ethics %K medical education %K medicine %K medical artificial intelligence ethics %K medical AI ethics %K medical ethics %K public health ethics %D 2024 %7 9.2.2024 %9 Viewpoint %J JMIR Med Educ %G English %X The use of artificial intelligence (AI) in medicine, potentially leading to substantial advancements such as improved diagnostics, has been of increasing scientific and societal interest in recent years. However, the use of AI raises new ethical challenges, such as an increased risk of bias and potential discrimination against patients, as well as misdiagnoses potentially leading to over- or underdiagnosis with substantial consequences for patients. Recognizing these challenges, current research underscores the importance of integrating AI ethics into medical education. This viewpoint paper aims to introduce a comprehensive set of ethical principles for teaching AI ethics in medical education. This dynamic and principle-based approach is designed to be adaptive and comprehensive, addressing not only the current but also emerging ethical challenges associated with the use of AI in medicine. This study conducts a theoretical analysis of the current academic discourse on AI ethics in medical education, identifying potential gaps and limitations. 
The inherent interconnectivity and interdisciplinary nature of these anticipated challenges are illustrated through a focused discussion on “informed consent” in the context of AI in medicine and medical education. This paper proposes a principle-based approach to AI ethics education, building on the 4 principles of medical ethics—autonomy, beneficence, nonmaleficence, and justice—and extending them by integrating 3 public health ethics principles—efficiency, common good orientation, and proportionality. The principle-based approach to teaching AI ethics in medical education proposed in this study offers a foundational framework for addressing the anticipated ethical challenges of using AI in medicine, recommended in the current academic discourse. By incorporating the 3 principles of public health ethics, this principle-based approach ensures that medical ethics education remains relevant and responsive to the dynamic landscape of AI integration in medicine. As the advancement of AI technologies in medicine is expected to increase, medical ethics education must adapt and evolve accordingly. The proposed principle-based approach for teaching AI ethics in medical education provides an important foundation to ensure that future medical professionals are not only aware of the ethical dimensions of AI in medicine but also equipped to make informed ethical decisions in their practice. Future research is required to develop problem-based and competency-oriented learning objectives and educational content for the proposed principle-based approach to teaching AI ethics in medical education. 
%M 38285931 %R 10.2196/55368 %U https://mededu.jmir.org/2024/1/e55368 %U https://doi.org/10.2196/55368 %U http://www.ncbi.nlm.nih.gov/pubmed/38285931 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e48514 %T Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study %A Yu,Peng %A Fang,Changchang %A Liu,Xiaolin %A Fu,Wanying %A Ling,Jitao %A Yan,Zhiwei %A Jiang,Yuan %A Cao,Zhengyu %A Wu,Maoxiong %A Chen,Zhiteng %A Zhu,Wengen %A Zhang,Yuling %A Abudukeremu,Ayiguli %A Wang,Yue %A Liu,Xiao %A Wang,Jingfeng %+ Department of Cardiology, Sun Yat-sen Memorial Hospital of Sun Yat-sen University, 107 Yanjiang West Road, Guangzhou, China, 86 15083827378, liux587@mail.sysu.edu.cn %K ChatGPT %K Chinese Postgraduate Examination for Clinical Medicine %K medical student %K performance %K artificial intelligence %K medical care %K qualitative feedback %K medical education %K clinical decision-making %D 2024 %7 9.2.2024 %9 Original Paper %J JMIR Med Educ %G English %X Background: ChatGPT, an artificial intelligence (AI) based on large-scale language models, has sparked interest in the field of health care. Nonetheless, the capabilities of AI in text comprehension and generation are constrained by the quality and volume of available training data for a specific language, and the performance of AI across different languages requires further investigation. While AI harbors substantial potential in medicine, it is imperative to tackle challenges such as the formulation of clinical care standards; facilitating cultural transitions in medical education and practice; and managing ethical issues including data privacy, consent, and bias. 
Objective: The study aimed to evaluate ChatGPT’s performance in processing Chinese Postgraduate Examination for Clinical Medicine questions, assess its clinical reasoning ability, investigate potential limitations with the Chinese language, and explore its potential as a valuable tool for medical professionals in the Chinese context. Methods: A data set of Chinese Postgraduate Examination for Clinical Medicine questions was used to assess the medical knowledge of ChatGPT (version 3.5) in the Chinese language. The data set comprised 165 medical questions divided into three categories: (1) common questions (n=90) assessing basic medical knowledge, (2) case analysis questions (n=45) focusing on clinical decision-making through patient case evaluations, and (3) multichoice questions (n=30) requiring the selection of multiple correct answers. First, we assessed whether ChatGPT could meet the stringent cutoff score defined by the government agency, which requires a performance within the top 20% of candidates. Additionally, in our evaluation of ChatGPT’s performance on both original and encoded medical questions, 3 primary indicators were used: accuracy, concordance (which validates the answer), and the frequency of insights. Results: Our evaluation revealed that ChatGPT scored 153.5 out of 300 for original questions in Chinese, which signifies the minimum score set to ensure that at least 20% more candidates pass than the enrollment quota. However, ChatGPT had low accuracy in answering open-ended medical questions, with only 31.5% total accuracy. The accuracy for common questions, multichoice questions, and case analysis questions was 42%, 37%, and 17%, respectively. ChatGPT achieved a 90% concordance across all questions. Among correct responses, the concordance was 100%, significantly exceeding that of incorrect responses (n=57, 50%; P<.001). 
ChatGPT provided innovative insights for 80% (n=132) of all questions, with an average of 2.95 insights per accurate response. Conclusions: Although ChatGPT surpassed the passing threshold for the Chinese Postgraduate Examination for Clinical Medicine, its performance in answering open-ended medical questions was suboptimal. Nonetheless, ChatGPT exhibited high internal concordance and the ability to generate multiple insights in the Chinese language. Future research should investigate the language-based discrepancies in ChatGPT’s performance within the health care context. %M 38335017 %R 10.2196/48514 %U https://mededu.jmir.org/2024/1/e48514 %U https://doi.org/10.2196/48514 %U http://www.ncbi.nlm.nih.gov/pubmed/38335017 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e53216 %T Investigating the Impact of Prompt Engineering on the Performance of Large Language Models for Standardizing Obstetric Diagnosis Text: Comparative Study %A Wang,Lei %A Bi,Wenshuai %A Zhao,Suling %A Ma,Yinyao %A Lv,Longting %A Meng,Chenwei %A Fu,Jingru %A Lv,Hanlin %+ BGI Research, Building 11, Beishan Industrial Zone, Yantian District, Shenzhen, 518083, China, 86 18707190886, lvhanlin@genomics.cn %K obstetric data %K similarity embedding %K term standardization %K large language models %K LLMs %D 2024 %7 8.2.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The accumulation of vast electronic medical records (EMRs) through medical informatization creates significant research value, particularly in obstetrics. Diagnostic standardization across different health care institutions and regions is vital for medical data analysis. Large language models (LLMs) have been extensively used for various medical tasks. Prompt engineering is key to use LLMs effectively. 
Objective: This study aims to evaluate and compare the performance of LLMs with various prompt engineering techniques on the task of standardizing obstetric diagnostic terminology using real-world obstetric data. Methods: The paper describes a 4-step approach used for mapping diagnoses in electronic medical records to the International Classification of Diseases, 10th revision, observation domain. First, similarity measures were used for mapping the diagnoses. Second, candidate mapping terms were collected based on similarity scores above a threshold, to be used as the training data set. For generating optimal mapping terms, we used two LLMs (ChatGLM2 and Qwen-14B-Chat [QWEN]) for zero-shot learning in step 3. Finally, a performance comparison was conducted by using 3 pretrained bidirectional encoder representations from transformers (BERTs), including BERT, whole word masking BERT, and momentum contrastive learning with BERT (MC-BERT), for unsupervised optimal mapping term generation in the fourth step. Results: LLMs and BERT demonstrated comparable performance at their respective optimal levels. LLMs showed clear advantages in terms of performance and efficiency in unsupervised settings. Interestingly, the performance of the LLMs varied significantly across different prompt engineering setups. For instance, when applying the self-consistency approach in QWEN, the F1-score improved by 5%, with precision increasing by 7.9%, outperforming the zero-shot method. Likewise, ChatGLM2 delivered similar rates of accurately generated responses. During the analysis, the BERT series served as a comparative model with comparable results. Among the 3 models, MC-BERT demonstrated the highest level of performance. However, the differences among the versions of BERT in this study were relatively insignificant. Conclusions: After applying LLMs to standardize diagnoses and designing 4 different prompts, we compared the results to those generated by the BERT model. 
Our findings indicate that QWEN prompts largely outperformed the other prompts, with precision comparable to that of the BERT model. These results demonstrate the potential of unsupervised approaches in improving the efficiency of aligning diagnostic terms in daily research and uncovering hidden information values in patient data. %M 38329787 %R 10.2196/53216 %U https://formative.jmir.org/2024/1/e53216 %U https://doi.org/10.2196/53216 %U http://www.ncbi.nlm.nih.gov/pubmed/38329787 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e32690 %T Vision-Language Model for Generating Textual Descriptions From Clinical Images: Model Development and Validation Study %A Ji,Jia %A Hou,Yongshuai %A Chen,Xinyu %A Pan,Youcheng %A Xiang,Yang %+ Peng Cheng Laboratory, No. 2 Xingke 1st Street, Shenzhen, 518000, China, 86 18566668732, panyoucheng4@gmail.com %K clinical image %K radiology report generation %K vision-language model %K multistage fine-tuning %K prior knowledge %D 2024 %7 8.2.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The automatic generation of radiology reports, which seeks to create a free-text description from a clinical radiograph, is emerging as a pivotal intersection between clinical medicine and artificial intelligence. Leveraging natural language processing technologies can accelerate report creation, enhancing health care quality and standardization. However, most existing studies have not yet fully tapped into the combined potential of advanced language and vision models. Objective: The purpose of this study was to explore the integration of pretrained vision-language models into radiology report generation. This would enable the vision-language model to automatically convert clinical images into high-quality textual reports. Methods: In our research, we introduced a radiology report generation model named ClinicalBLIP, building upon the foundational InstructBLIP model and refining it using clinical image-to-text data sets. 
A multistage fine-tuning approach via low-rank adaptation was proposed to deepen the semantic comprehension of the visual encoder and the large language model for clinical imagery. Furthermore, prior knowledge was integrated through prompt learning to enhance the precision of the reports generated. Experiments were conducted on both the IU X-RAY and MIMIC-CXR data sets, with ClinicalBLIP compared to several leading methods. Results: Experimental results revealed that ClinicalBLIP obtained superior scores of 0.570/0.365 and 0.534/0.313 on the IU X-RAY/MIMIC-CXR test sets for the Metric for Evaluation of Translation with Explicit Ordering (METEOR) and the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) evaluations, respectively. This performance notably surpasses that of existing state-of-the-art methods. Further evaluations confirmed the effectiveness of the multistage fine-tuning and the integration of prior information, leading to substantial improvements. Conclusions: The proposed ClinicalBLIP model demonstrated robustness and effectiveness in enhancing clinical radiology report generation, suggesting significant promise for real-world clinical applications. 
%M 38329788 %R 10.2196/32690 %U https://formative.jmir.org/2024/1/e32690 %U https://doi.org/10.2196/32690 %U http://www.ncbi.nlm.nih.gov/pubmed/38329788 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e52205 %T Digitally Diagnosing Multiple Developmental Delays Using Crowdsourcing Fused With Machine Learning: Protocol for a Human-in-the-Loop Machine Learning Study %A Jaiswal,Aditi %A Kruiper,Ruben %A Rasool,Abdur %A Nandkeolyar,Aayush %A Wall,Dennis P %A Washington,Peter %+ Department of Information and Computer Sciences, University of Hawaii at Manoa, Room 312, Pacific Ocean Science and Technology (POST), 1680 East-West Road, Honolulu, HI, 96822, United States, 1 8088296359, ajaiswal@hawaii.edu %K machine learning %K crowdsourcing %K autism spectrum disorder %K ASD %K attention-deficit/hyperactivity disorder %K ADHD %K precision health %D 2024 %7 8.2.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: A considerable number of minors in the United States are diagnosed with developmental or psychiatric conditions, potentially influenced by underdiagnosis factors such as cost, distance, and clinician availability. Despite the potential of digital phenotyping tools with machine learning (ML) approaches to expedite diagnoses and enhance diagnostic services for pediatric psychiatric conditions, existing methods face limitations because they use a limited set of social features for prediction tasks and focus on a single binary prediction, resulting in uncertain accuracies. Objective: This study aims to propose the development of a gamified web system for data collection, followed by a fusion of novel crowdsourcing algorithms with ML behavioral feature extraction approaches to simultaneously predict diagnoses of autism spectrum disorder and attention-deficit/hyperactivity disorder in a precise and specific manner. 
Methods: The proposed pipeline will consist of (1) gamified web applications to curate videos of social interactions adaptively based on the needs of the diagnostic system, (2) behavioral feature extraction techniques consisting of automated ML methods and novel crowdsourcing algorithms, and (3) the development of ML models that classify several conditions simultaneously and that adaptively request additional information based on uncertainties about the data. Results: A preliminary version of the web interface has been implemented, and a prior feature selection method has highlighted a core set of behavioral features that can be targeted through the proposed gamified approach. Conclusions: The prospect for high reward stems from the possibility of creating the first artificial intelligence–powered tool that can identify complex social behaviors well enough to distinguish conditions with nuanced differentiators such as autism spectrum disorder and attention-deficit/hyperactivity disorder. 
International Registered Report Identifier (IRRID): PRR1-10.2196/52205 %M 38329783 %R 10.2196/52205 %U https://www.researchprotocols.org/2024/1/e52205 %U https://doi.org/10.2196/52205 %U http://www.ncbi.nlm.nih.gov/pubmed/38329783 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e46493 %T Personalized Deep Learning for Substance Use in Hawaii: Protocol for a Passive Sensing and Ecological Momentary Assessment Study %A Sun,Yinan %A Kargarandehkordi,Ali %A Slade,Christopher %A Jaiswal,Aditi %A Busch,Gerald %A Guerrero,Anthony %A Phillips,Kristina T %A Washington,Peter %+ Department of Information and Computer Sciences, University of Hawaii at Manoa, 1680 East-West Road, Honolulu, HI, 96822, United States, 1 5126800926, pyw@hawaii.edu %K machine learning %K precision health %K Indigenous data sovereignty %K substance use %K personalized artificial intelligence %K wearables %K ecological momentary assessments %K passive sensing %K mobile phone %D 2024 %7 7.2.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Artificial intelligence (AI)–powered digital therapies that detect methamphetamine cravings via consumer devices have the potential to reduce health care disparities by providing remote and accessible care solutions to communities with limited care solutions, such as Native Hawaiian, Filipino, and Pacific Islander communities. However, Native Hawaiian, Filipino, and Pacific Islander communities are understudied with respect to digital therapeutics and AI health sensing despite using technology at the same rates as other racial groups. Objective: In this study, we aimed to understand the feasibility of continuous remote digital monitoring and ecological momentary assessments in Native Hawaiian, Filipino, and Pacific Islander communities in Hawaii by curating a novel data set of longitudinal Fitbit (Fitbit Inc) biosignals with the corresponding craving and substance use labels. 
We also aimed to develop personalized AI models that predict methamphetamine craving events in real time using wearable sensor data. Methods: We will develop personalized AI and machine learning models for methamphetamine use and craving prediction in 40 individuals from Native Hawaiian, Filipino, and Pacific Islander communities by curating a novel data set of real-time Fitbit biosensor readings and the corresponding participant annotations (ie, raw self-reported substance use data) of their methamphetamine use and cravings. In the process of collecting this data set, we will gain insights into cultural and other human factors that can challenge the proper acquisition of precise annotations. With the resulting data set, we will use self-supervised learning AI approaches, which are a new family of machine learning methods that allows a neural network to be trained without labels by being optimized to make predictions about the data. The inputs to the proposed AI models are Fitbit biosensor readings, and the outputs are predictions of methamphetamine use or craving. This paradigm is gaining increased attention in AI for health care. Results: To date, more than 40 individuals have expressed interest in participating in the study, and we have successfully recruited our first 5 participants with minimal logistical challenges and proper compliance. Several logistical challenges that the research team has encountered so far and the related implications are discussed. Conclusions: We expect to develop models that significantly outperform traditional supervised methods by finetuning according to the data of a participant. Such methods will enable AI solutions that work with the limited data available from Native Hawaiian, Filipino, and Pacific Islander populations and that are inherently unbiased owing to their personalized nature. Such models can support future AI-powered digital therapeutics for substance abuse. 
International Registered Report Identifier (IRRID): DERR1-10.2196/46493 %M 38324375 %R 10.2196/46493 %U https://www.researchprotocols.org/2024/1/e46493 %U https://doi.org/10.2196/46493 %U http://www.ncbi.nlm.nih.gov/pubmed/38324375 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51640 %T Limitations of the Cough Sound-Based COVID-19 Diagnosis Artificial Intelligence Model and its Future Direction: Longitudinal Observation Study %A Kim,Jina %A Choi,Yong Sung %A Lee,Young Joo %A Yeo,Seung Geun %A Kim,Kyung Won %A Kim,Min Seo %A Rahmati,Masoud %A Yon,Dong Keon %A Lee,Jinseok %+ Department of Biomedical Engineering, Kyung Hee University, 1732 Deogyeong-daero, Giheung-gu, Seoul, 17104, Republic of Korea, 82 2 6935 2476, gonasago@khu.ac.kr %K COVID-19 variants %K cough sound %K artificial intelligence %K diagnosis %K human lifestyle %K SARS-CoV-2 %K AI model %K cough %K sound-based %K diagnosis %K sounds app %K development %K COVID-19 %K AI %D 2024 %7 6.2.2024 %9 Short Paper %J J Med Internet Res %G English %X Background: The outbreak of SARS-CoV-2 in 2019 has necessitated the rapid and accurate detection of COVID-19 to manage patients effectively and implement public health measures. Artificial intelligence (AI) models analyzing cough sounds have emerged as promising tools for large-scale screening and early identification of potential cases. Objective: This study aimed to investigate the efficacy of using cough sounds as a diagnostic tool for COVID-19, considering the unique acoustic features that differentiate positive and negative cases. We investigated whether an AI model trained on cough sound recordings from specific periods, especially the early stages of the COVID-19 pandemic, was applicable to the ongoing situation with persistent variants. Methods: We used cough sound recordings from 3 data sets (Cambridge, Coswara, and Virufy) representing different stages of the pandemic and variants. 
Our AI model was trained using the Cambridge data set with subsequent evaluation against all data sets. The performance was analyzed based on the area under the receiver operating characteristic curve (AUC) across different data measurement periods and COVID-19 variants. Results: The AI model demonstrated a high AUC when tested with the Cambridge data set, indicative of its initial effectiveness. However, the performance varied significantly with other data sets, particularly in detecting later variants such as Delta and Omicron, with a marked decline in AUC observed for the latter. These results highlight the challenges in maintaining the efficacy of AI models against the backdrop of an evolving virus. Conclusions: While AI models analyzing cough sounds offer a promising noninvasive and rapid screening method for COVID-19, their effectiveness is challenged by the emergence of new virus variants. Ongoing research and adaptations in AI methodologies are crucial to address these limitations. The adaptability of AI models to evolve with the virus underscores their potential as a foundational technology for not only the current pandemic but also future outbreaks, contributing to a more agile and resilient global health infrastructure. 
%M 38319694 %R 10.2196/51640 %U https://www.jmir.org/2024/1/e51640 %U https://doi.org/10.2196/51640 %U http://www.ncbi.nlm.nih.gov/pubmed/38319694 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e54369 %T Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study %A Elyoseph,Zohar %A Refoua,Elad %A Asraf,Kfir %A Lvovsky,Maya %A Shimoni,Yoav %A Hadar-Shoval,Dorit %+ Imperial College London, Fulham Palace Road, London, W6 8RF, United Kingdom, 44 547836088, zohar.j.a@gmail.com %K Reading the Mind in the Eyes Test %K RMET %K emotional awareness %K emotional comprehension %K emotional cue %K emotional cues %K ChatGPT %K large language model %K LLM %K large language models %K LLMs %K empathy %K mentalizing %K mentalization %K machine learning %K artificial intelligence %K AI %K algorithm %K algorithms %K predictive model %K predictive models %K predictive analytics %K predictive system %K practical model %K practical models %K early warning %K early detection %K mental health %K mental disease %K mental illness %K mental illnesses %K mental diseases %D 2024 %7 6.2.2024 %9 Original Paper %J JMIR Ment Health %G English %X Background: Mentalization, which is integral to human cognitive processes, pertains to the interpretation of one’s own and others’ mental states, including emotions, beliefs, and intentions. With the advent of artificial intelligence (AI) and the prominence of large language models in mental health applications, questions persist about their aptitude in emotional comprehension. The prior iteration of the large language model from OpenAI, ChatGPT-3.5, demonstrated an advanced capacity to interpret emotions from textual data, surpassing human benchmarks. Given the introduction of ChatGPT-4, with its enhanced visual processing capabilities, and considering Google Bard’s existing visual functionalities, a rigorous assessment of their proficiency in visual mentalizing is warranted. 
Objective: The aim of the research was to critically evaluate the capabilities of ChatGPT-4 and Google Bard with regard to their competence in discerning visual mentalizing indicators as contrasted with their textual-based mentalizing abilities. Methods: The Reading the Mind in the Eyes Test developed by Baron-Cohen and colleagues was used to assess the models’ proficiency in interpreting visual emotional indicators. Simultaneously, the Levels of Emotional Awareness Scale was used to evaluate the large language models’ aptitude in textual mentalizing. Collating data from both tests provided a holistic view of the mentalizing capabilities of ChatGPT-4 and Bard. Results: ChatGPT-4, displaying a pronounced ability in emotion recognition, secured scores of 26 and 27 in 2 distinct evaluations, significantly deviating from a random response paradigm (P<.001). These scores align with established benchmarks from the broader human demographic. Notably, ChatGPT-4 exhibited consistent responses, with no discernible biases pertaining to the sex of the model or the nature of the emotion. In contrast, Google Bard’s performance aligned with random response patterns, securing scores of 10 and 12 and rendering further detailed analysis redundant. In the domain of textual analysis, both ChatGPT and Bard surpassed established benchmarks from the general population, with their performances being remarkably congruent. Conclusions: ChatGPT-4 proved its efficacy in the domain of visual mentalizing, aligning closely with human performance standards. Although both models displayed commendable acumen in textual emotion interpretation, Bard’s capabilities in visual emotion interpretation necessitate further scrutiny and potential refinement. 
This study stresses the criticality of ethical AI development for emotional recognition, highlighting the need for inclusive data, collaboration with patients and mental health experts, and stringent governmental oversight to ensure transparency and protect patient privacy. %M 38319707 %R 10.2196/54369 %U https://mental.jmir.org/2024/1/e54369 %U https://doi.org/10.2196/54369 %U http://www.ncbi.nlm.nih.gov/pubmed/38319707 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 5 %N %P e52059 %T Machine Learning Models for Prediction of Maternal Hemorrhage and Transfusion: Model Development Study %A Ahmadzia,Homa Khorrami %A Dzienny,Alexa C %A Bopf,Mike %A Phillips,Jaclyn M %A Federspiel,Jerome Jeffrey %A Amdur,Richard %A Rice,Madeline Murguia %A Rodriguez,Laritza %+ Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Inova Health System, 3300 Gallows Road, Falls Church, VA, 22042, United States, 1 571 472 0920, homa.ahmadzia@inova.org %K postpartum hemorrhage %K machine learning %K prediction %K maternal %K predict %K predictive %K bleeding %K hemorrhage %K hemorrhaging %K birth %K postnatal %K blood %K transfusion %K antepartum %K obstetric %K obstetrics %K women's health %K gynecology %K gynecological %D 2024 %7 5.2.2024 %9 Original Paper %J JMIR Bioinform Biotech %G English %X Background: Current postpartum hemorrhage (PPH) risk stratification is based on traditional statistical models or expert opinion. Machine learning could optimize PPH prediction by allowing for more complex modeling. Objective: We sought to improve PPH prediction and compare machine learning and traditional statistical methods. Methods: We developed models using the Consortium for Safe Labor data set (2002-2008) from 12 US hospitals. The primary outcome was a transfusion of blood products or PPH (estimated blood loss of ≥1000 mL). The secondary outcome was a transfusion of any blood product. 
Fifty antepartum and intrapartum characteristics and hospital characteristics were included. Logistic regression, support vector machines, multilayer perceptron, random forest, and gradient boosting (GB) were used to generate prediction models. The area under the receiver operating characteristic curve (ROC-AUC) and area under the precision/recall curve (PR-AUC) were used to compare performance. Results: Among 228,438 births, 5760 (3.1%) women had a postpartum hemorrhage, 5170 (2.8%) had a transfusion, and 10,344 (5.6%) met the criteria for the transfusion-PPH composite. Models predicting the transfusion-PPH composite using antepartum and intrapartum features had the best positive predictive values, with the GB machine learning model performing best overall (ROC-AUC=0.833, 95% CI 0.828-0.838; PR-AUC=0.210, 95% CI 0.201-0.220). The most predictive features in the GB model predicting the transfusion-PPH composite were the mode of delivery, oxytocin incremental dose for labor (mU/minute), intrapartum tocolytic use, presence of anesthesia nurse, and hospital type. Conclusions: Machine learning offers higher discriminability than logistic regression in predicting PPH. The Consortium for Safe Labor data set may not be optimal for analyzing risk due to strong subgroup effects, which decreases accuracy and limits generalizability. 
%M 38935950 %R 10.2196/52059 %U https://bioinform.jmir.org/2024/1/e52059 %U https://doi.org/10.2196/52059 %U http://www.ncbi.nlm.nih.gov/pubmed/38935950 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e50705 %T Increasing Realism and Variety of Virtual Patient Dialogues for Prenatal Counseling Education Through a Novel Application of ChatGPT: Exploratory Observational Study %A Gray,Megan %A Baird,Austin %A Sawyer,Taylor %A James,Jasmine %A DeBroux,Thea %A Bartlett,Michelle %A Krick,Jeanne %A Umoren,Rachel %+ Division of Neonatology, University of Washington, M/S FA.2.113, 4800 Sand Point Way, Seattle, WA, 98105, United States, 1 206 919 5476, graym1@uw.edu %K prenatal counseling %K virtual health %K virtual patient %K simulation %K neonatology %K ChatGPT %K AI %K artificial intelligence %D 2024 %7 1.2.2024 %9 Original Paper %J JMIR Med Educ %G English %X Background: Using virtual patients, facilitated by natural language processing, provides a valuable educational experience for learners. Generating a large, varied sample of realistic and appropriate responses for virtual patients is challenging. Artificial intelligence (AI) programs can be a viable source for these responses, but their utility for this purpose has not been explored. Objective: In this study, we explored the effectiveness of generative AI (ChatGPT) in developing realistic virtual standardized patient dialogues to teach prenatal counseling skills. Methods: ChatGPT was prompted to generate a list of common areas of concern and questions that families expecting preterm delivery at 24 weeks gestation might ask during prenatal counseling. ChatGPT was then prompted to generate 2 role-plays with dialogues between a parent expecting a potential preterm delivery at 24 weeks and their counseling physician using each of the example questions. The prompt was repeated for 2 unique role-plays: one parent was characterized as anxious and the other as having low trust in the medical system. 
Role-play scripts were exported verbatim and independently reviewed by 2 neonatologists with experience in prenatal counseling, using a scale of 1-5 on realism, appropriateness, and utility for virtual standardized patient responses. Results: ChatGPT generated 7 areas of concern, with 35 example questions used to generate role-plays. The 35 role-play transcripts generated 176 unique parent responses (median 5, IQR 4-6, per role-play) with 268 unique sentences. Expert review identified 117 (65%) of the 176 responses as indicating an emotion, either directly or indirectly. Approximately half (98/176, 56%) of the responses had 2 or more sentences, and half (88/176, 50%) included at least 1 question. More than half (104/176, 58%) of the responses from role-played parent characters described a feeling, such as being scared, worried, or concerned. The role-plays of parents with low trust in the medical system generated many unique sentences (n=50). Most of the sentences in the responses were found to be reasonably realistic (214/268, 80%), appropriate for variable prenatal counseling conversation paths (233/268, 87%), and usable without more than a minimal modification in a virtual patient program (169/268, 63%). Conclusions: Generative AI programs, such as ChatGPT, may provide a viable source of training materials to expand virtual patient programs, with careful attention to the concerns and questions of patients and families. Given the potential for unrealistic or inappropriate statements and questions, an expert should review AI chat outputs before deploying them in an educational program. 
%M 38300696 %R 10.2196/50705 %U https://mededu.jmir.org/2024/1/e50705 %U https://doi.org/10.2196/50705 %U http://www.ncbi.nlm.nih.gov/pubmed/38300696 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52622 %T The Performance of Wearable AI in Detecting Stress Among Students: Systematic Review and Meta-Analysis %A Abd-alrazaq,Alaa %A Alajlani,Mohannad %A Ahmad,Reham %A AlSaad,Rawan %A Aziz,Sarah %A Ahmed,Arfan %A Alsahli,Mohammed %A Damseh,Rafat %A Sheikh,Javaid %+ AI Center for Precision Health, Weill Cornell Medicine-Qatar, Qatar Foundation, PO Box 5825, Doha Al Luqta St, Ar-Rayyan, Doha, 2442, Qatar, 974 5570845212, Aaa4027@qatar-med.cornell.edu %K stress %K artificial intelligence %K wearable devices %K machine learning %K systematic review %K students %K mobile phone %D 2024 %7 31.1.2024 %9 Review %J J Med Internet Res %G English %X Background: Students usually encounter stress throughout their academic path. Ongoing stressors may lead to chronic stress, adversely affecting their physical and mental well-being. Thus, early detection and monitoring of stress among students are crucial. Wearable artificial intelligence (AI) has emerged as a valuable tool for this purpose. It offers an objective, noninvasive, nonobtrusive, automated approach to continuously monitor biomarkers in real time, thereby addressing the limitations of traditional approaches such as self-reported questionnaires. Objective: This systematic review and meta-analysis aim to assess the performance of wearable AI in detecting and predicting stress among students. Methods: Search sources in this review included 7 electronic databases (MEDLINE, Embase, PsycINFO, ACM Digital Library, Scopus, IEEE Xplore, and Google Scholar). We also checked the reference lists of the included studies and checked studies that cited the included studies. The search was conducted on June 12, 2023. 
This review included research articles centered on the creation or application of AI algorithms for the detection or prediction of stress among students using data from wearable devices. In total, 2 independent reviewers performed study selection, data extraction, and risk-of-bias assessment. The Quality Assessment of Diagnostic Accuracy Studies–Revised tool was adapted and used to examine the risk of bias in the included studies. Evidence synthesis was conducted using narrative and statistical techniques. Results: This review included 5.8% (19/327) of the studies retrieved from the search sources. A meta-analysis of 37 accuracy estimates derived from 32% (6/19) of the studies revealed a pooled mean accuracy of 0.856 (95% CI 0.70-0.93). Subgroup analyses demonstrated that the accuracy of wearable AI was moderated by the number of stress classes (P=.02), type of wearable device (P=.049), location of the wearable device (P=.02), data set size (P=.009), and ground truth (P=.001). The average estimates of sensitivity, specificity, and F1-score were 0.755 (SD 0.181), 0.744 (SD 0.147), and 0.759 (SD 0.139), respectively. Conclusions: Wearable AI shows promise in detecting student stress but currently has suboptimal performance. The results of the subgroup analyses should be carefully interpreted given that many of these findings may be due to other confounding factors rather than the underlying grouping characteristics. Thus, wearable AI should be used alongside other assessments (eg, clinical questionnaires) until further evidence is available. Future research should explore the ability of wearable AI to differentiate types of stress, distinguish stress from other mental health issues, predict future occurrences of stress, consider factors such as the placement of the wearable device and the methods used to assess the ground truth, and report detailed results to facilitate the conduct of meta-analyses. 
Trial Registration: PROSPERO CRD42023435051; http://tinyurl.com/3fzb5rnp %M 38294846 %R 10.2196/52622 %U https://www.jmir.org/2024/1/e52622 %U https://doi.org/10.2196/52622 %U http://www.ncbi.nlm.nih.gov/pubmed/38294846 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e44185 %T Identifying Frailty in Older Adults Receiving Home Care Assessment Using Machine Learning: Longitudinal Observational Study on the Role of Classifier, Feature Selection, and Sample Size %A Pan,Cheng %A Luo,Hao %A Cheung,Gary %A Zhou,Huiquan %A Cheng,Reynold %A Cullum,Sarah %A Wu,Chuan %+ Department of Social Work and Social Administration, The University of Hong Kong, CJT 521, Jockey Club Tower, Pokfulam Road, Hong Kong, China (Hong Kong), 852 68421252, haoluo@hku.hk %K machine learning %K logistic regression %K frailty %K older adults %K home care %K sample size %K features %K data set %K model %K home care %K mortality prediction %K assessment %D 2024 %7 31.1.2024 %9 Original Paper %J JMIR AI %G English %X Background: Machine learning techniques are starting to be used in various health care data sets to identify frail persons who may benefit from interventions. However, evidence about the performance of machine learning techniques compared to conventional regression is mixed. It is also unclear what methodological and database factors are associated with performance. Objective: This study aimed to compare the mortality prediction accuracy of various machine learning classifiers for identifying frail older adults in different scenarios. Methods: We used deidentified data collected from older adults (65 years of age and older) assessed with the interRAI-Home Care instrument in New Zealand between January 1, 2012, and December 31, 2016. 
A total of 138 interRAI assessment items were used to predict 6-month and 12-month mortality, using 3 machine learning classifiers (random forest [RF], extreme gradient boosting [XGBoost], and multilayer perceptron [MLP]) and regularized logistic regression. We conducted a simulation study comparing the performance of machine learning models with logistic regression and the interRAI Home Care Frailty Scale and examined the effects of sample sizes, the number of features, and train-test split ratios. Results: A total of 95,042 older adults (median age 82.66 years, IQR 77.92-88.76; n=37,462, 39.42% male) receiving home care were analyzed. The average area under the curve (AUC) and sensitivities of 6-month mortality prediction showed that machine learning classifiers did not outperform regularized logistic regression. In terms of AUC, regularized logistic regression had better performance than XGBoost, MLP, and RF when the number of features was ≤80 and the sample size ≤16,000; MLP outperformed regularized logistic regression in terms of sensitivities when the number of features was ≥40 and the sample size ≥4000. Conversely, RF and XGBoost demonstrated higher specificities than regularized logistic regression in all scenarios. Conclusions: The study revealed that machine learning models exhibited significant variation in prediction performance when evaluated using different metrics. Regularized logistic regression was an effective model for identifying frail older adults receiving home care, as indicated by the AUC, particularly when the number of features and sample sizes were not excessively large. Conversely, MLP displayed superior sensitivity, while RF exhibited superior specificity when the number of features and sample sizes were large. 
%M 38875533 %R 10.2196/44185 %U https://ai.jmir.org/2024/1/e44185 %U https://doi.org/10.2196/44185 %U http://www.ncbi.nlm.nih.gov/pubmed/38875533 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50890 %T Machine Learning and Health Science Research: Tutorial %A Cho,Hunyong %A She,Jane %A De Marchi,Daniel %A El-Zaatari,Helal %A Barnes,Edward L %A Kahkoska,Anna R %A Kosorok,Michael R %A Virkud,Arti V %+ Department of Biostatistics, University of North Carolina at Chapel Hill, 3101 McGavran-Greenberg Hall, CB #7420, Chapel Hill, NC, 27599-7420, United States, 1 (919) 966 7250, jane.she@unc.edu %K health science researcher %K machine learning pipeline %K machine learning %K medical machine learning %K precision medicine %K reproducibility %K unsupervised learning %D 2024 %7 30.1.2024 %9 Tutorial %J J Med Internet Res %G English %X Machine learning (ML) has seen impressive growth in health science research due to its capacity for handling complex data to perform a range of tasks, including unsupervised learning, supervised learning, and reinforcement learning. To aid health science researchers in understanding the strengths and limitations of ML and to facilitate its integration into their studies, we present here a guideline for integrating ML into an analysis through a structured framework, covering steps from framing a research question to study design and analysis techniques for specialized data types. 
%M 38289657 %R 10.2196/50890 %U https://www.jmir.org/2024/1/e50890 %U https://doi.org/10.2196/50890 %U http://www.ncbi.nlm.nih.gov/pubmed/38289657 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51069 %T Efficacy of ChatGPT in Cantonese Sentiment Analysis: Comparative Study %A Fu,Ziru %A Hsu,Yu Cheng %A Chan,Christian S %A Lau,Chaak Ming %A Liu,Joyce %A Yip,Paul Siu Fai %+ The Hong Kong Jockey Club Centre for Suicide Research and Prevention, Faculty of Social Sciences, The University of Hong Kong, 2/F, The Hong Kong Jockey Club Building for Interdisciplinary Research, 5 Sassoon Road, Pokfulam, Hong Kong SAR, China (Hong Kong), 852 28315232, sfpyip@hku.hk %K Cantonese %K ChatGPT %K counseling %K natural language processing %K NLP %K sentiment analysis %D 2024 %7 30.1.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Sentiment analysis is a significant yet difficult task in natural language processing. The linguistic peculiarities of Cantonese, including its high similarity with Standard Chinese, its grammatical and lexical uniqueness, and its colloquialism and multilingualism, make it different from other languages and pose additional challenges to sentiment analysis. Recent advances in models such as ChatGPT offer potential viable solutions. Objective: This study investigated the efficacy of GPT-3.5 and GPT-4 in Cantonese sentiment analysis in the context of web-based counseling and compared their performance with other mainstream methods, including lexicon-based methods and machine learning approaches. Methods: We analyzed transcripts from a web-based, text-based counseling service in Hong Kong, including a total of 131 individual counseling sessions and 6169 messages between counselors and help-seekers. First, a codebook was developed for human annotation. A simple prompt (“Is the sentiment of this Cantonese text positive, neutral, or negative? 
Respond with the sentiment label only.”) was then given to GPT-3.5 and GPT-4 to label each message’s sentiment. GPT-3.5 and GPT-4’s performance was compared with a lexicon-based method and 3 state-of-the-art models, including linear regression, support vector machines, and long short-term memory neural networks. Results: Our findings revealed ChatGPT’s remarkable accuracy in sentiment classification, with GPT-3.5 and GPT-4, respectively, achieving 92.1% (5682/6169) and 95.3% (5880/6169) accuracy in identifying positive, neutral, and negative sentiment, thereby outperforming the traditional lexicon-based method, which had an accuracy of 37.2% (2295/6169), and the 3 machine learning models, which had accuracies ranging from 66% (4072/6169) to 70.9% (4374/6169). Conclusions: Among many text analysis techniques, ChatGPT demonstrates superior accuracy and emerges as a promising tool for Cantonese sentiment analysis. This study also highlights ChatGPT’s applicability in real-world scenarios, such as monitoring the quality of text-based counseling services and detecting message-level sentiments in vivo. The insights derived from this study pave the way for further exploration into the capabilities of ChatGPT in the context of underresourced languages and specialized domains like psychotherapy and natural language processing. 
%M 38289662 %R 10.2196/51069 %U https://www.jmir.org/2024/1/e51069 %U https://doi.org/10.2196/51069 %U http://www.ncbi.nlm.nih.gov/pubmed/38289662 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e48995 %T BERT-Based Neural Network for Inpatient Fall Detection From Electronic Medical Records: Retrospective Cohort Study %A Cheligeer,Cheligeer %A Wu,Guosong %A Lee,Seungwon %A Pan,Jie %A Southern,Danielle A %A Martin,Elliot A %A Sapiro,Natalie %A Eastwood,Cathy A %A Quan,Hude %A Xu,Yuan %+ Centre for Health Informatics, Cumming School of Medicine, University of Calgary, 3280 Hospital Dr NW, Calgary, AB, T2N 4Z6, Canada, 1 (403) 210 9554, yuxu@ucalgary.ca %K accidental falls %K electronic medical records %K data mining %K machine learning %K patient safety %K natural language processing %K adverse event %D 2024 %7 30.1.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Inpatient falls are a substantial concern for health care providers and are associated with negative outcomes for patients. Automated detection of falls using machine learning (ML) algorithms may aid in improving patient safety and reducing the occurrence of falls. Objective: This study aims to develop and evaluate an ML algorithm for inpatient fall detection using multidisciplinary progress record notes and a pretrained Bidirectional Encoder Representation from Transformers (BERT) language model. Methods: A cohort of 4323 adult patients admitted to 3 acute care hospitals in Calgary, Alberta, Canada from 2016 to 2021 were randomly sampled. Trained reviewers determined falls from patient charts, which were linked to electronic medical records and administrative data. The BERT-based language model was pretrained on clinical notes, and a fall detection algorithm was developed based on a neural network binary classification architecture. 
Results: To address various use scenarios, we developed 3 different Alberta hospital notes-specific BERT models: a high sensitivity model (sensitivity 97.7, IQR 87.7-99.9), a high positive predictive value model (positive predictive value 85.7, IQR 57.2-98.2), and the high F1-score model (F1=64.4). Our proposed method outperformed 3 classical ML algorithms and an International Classification of Diseases code–based algorithm for fall detection, showing its potential for improved performance in diverse clinical settings. Conclusions: The developed algorithm provides an automated and accurate method for inpatient fall detection using multidisciplinary progress record notes and a pretrained BERT language model. This method could be implemented in clinical practice to improve patient safety and reduce the occurrence of falls in hospitals. %M 38289643 %R 10.2196/48995 %U https://medinform.jmir.org/2024/1/e48995 %U https://doi.org/10.2196/48995 %U http://www.ncbi.nlm.nih.gov/pubmed/38289643 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e47240 %T An Environmental Uncertainty Perception Framework for Misinformation Detection and Spread Prediction in the COVID-19 Pandemic: Artificial Intelligence Approach %A Lu,Jiahui %A Zhang,Huibin %A Xiao,Yi %A Wang,Yingyu %+ School of New Media and Communication, Tianjin University, Number 92, Weijin Road, Tianjin, 300072, China, 86 15135154977, zhanghb@tju.edu.cn %K misinformation detection %K misinformation spread prediction %K uncertainty %K COVID-19 %K information environment %D 2024 %7 29.1.2024 %9 Original Paper %J JMIR AI %G English %X Background: Amidst the COVID-19 pandemic, misinformation on social media has posed significant threats to public health. Detecting and predicting the spread of misinformation are crucial for mitigating its adverse effects. 
However, prevailing frameworks for these tasks have predominantly focused on post-level signals of misinformation, neglecting features of the broader information environment where misinformation originates and proliferates. Objective: This study aims to create a novel framework that integrates the uncertainty of the information environment into misinformation features, with the goal of enhancing the model’s accuracy in tasks such as misinformation detection and predicting the scale of dissemination. The objective is to provide better support for online governance efforts during health crises. Methods: In this study, we embraced uncertainty features within the information environment and introduced a novel Environmental Uncertainty Perception (EUP) framework for the detection of misinformation and the prediction of its spread on social media. The framework encompasses uncertainty at 4 scales of the information environment: physical environment, macro-media environment, micro-communicative environment, and message framing. We assessed the effectiveness of the EUP using real-world COVID-19 misinformation data sets. Results: The experimental results demonstrated that the EUP alone achieved notably good performance, with detection accuracy at 0.753 and prediction accuracy at 0.71. These results were comparable to state-of-the-art baseline models such as bidirectional long short-term memory (BiLSTM; detection accuracy 0.733 and prediction accuracy 0.707) and bidirectional encoder representations from transformers (BERT; detection accuracy 0.755 and prediction accuracy 0.728). Additionally, when the baseline models collaborated with the EUP, their accuracy improved by an average of 1.98% for the misinformation detection task and 2.4% for the spread-prediction task. On unbalanced data sets, the EUP yielded relative improvements of 21.5% and 5.7% in macro-F1-score and area under the curve, respectively. 
Conclusions: This study makes a significant contribution to the literature by recognizing uncertainty features within information environments as a crucial factor for improving misinformation detection and spread-prediction algorithms during the pandemic. The research elaborates on the complexities of uncertain information environments for misinformation across 4 distinct scales, including the physical environment, macro-media environment, micro-communicative environment, and message framing. The findings underscore the effectiveness of incorporating uncertainty into misinformation detection and spread prediction, providing an interdisciplinary and easily implementable framework for the field. %M 38875583 %R 10.2196/47240 %U https://ai.jmir.org/2024/1/e47240 %U https://doi.org/10.2196/47240 %U http://www.ncbi.nlm.nih.gov/pubmed/38875583 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e52055 %T Testing the Feasibility and Acceptability of Using an Artificial Intelligence Chatbot to Promote HIV Testing and Pre-Exposure Prophylaxis in Malaysia: Mixed Methods Study %A Cheah,Min Hui %A Gan,Yan Nee %A Altice,Frederick L %A Wickersham,Jeffrey A %A Shrestha,Roman %A Salleh,Nur Afiqah Mohd %A Ng,Kee Seong %A Azwa,Iskandar %A Balakrishnan,Vimala %A Kamarulzaman,Adeeba %A Ni,Zhao %+ School of Nursing, Yale University, 400 West Campus Drive, Orange, CT, 06477, United States, 1 203 737 3039, zhao.ni@yale.edu %K artificial intelligence %K acceptability %K chatbot %K feasibility %K HIV prevention %K HIV testing %K men who have sex with men %K MSM %K mobile health %K mHealth %K preexposure prophylaxis %K PrEP %K mobile phone %D 2024 %7 26.1.2024 %9 Original Paper %J JMIR Hum Factors %G English %X Background: The HIV epidemic continues to grow fastest among men who have sex with men (MSM) in Malaysia in the presence of stigma and discrimination. 
Engaging MSM on the internet using chatbots supported through artificial intelligence (AI) can potentially help HIV prevention efforts. We previously identified the benefits, limitations, and preferred features of HIV prevention AI chatbots and developed an AI chatbot prototype that is now tested for feasibility and acceptability. Objective: This study aims to test the feasibility and acceptability of an AI chatbot in promoting the uptake of HIV testing and pre-exposure prophylaxis (PrEP) in MSM. Methods: We conducted beta testing with 14 MSM from February to April 2022 using Zoom (Zoom Video Communications, Inc). Beta testing involved 3 steps: a 45-minute human-chatbot interaction using the think-aloud method, a 35-minute semistructured interview, and a 10-minute web-based survey. The first 2 steps were recorded, transcribed verbatim, and analyzed using the Unified Theory of Acceptance and Use of Technology. Emerging themes from the qualitative data were mapped on the 4 domains of the Unified Theory of Acceptance and Use of Technology: performance expectancy, effort expectancy, facilitating conditions, and social influence. Results: Most participants (13/14, 93%) perceived the chatbot to be useful because it provided comprehensive information on HIV testing and PrEP (performance expectancy). All participants indicated that the chatbot was easy to use because of its simple, straightforward design and quick, friendly responses (effort expectancy). Moreover, 93% (13/14) of the participants rated the overall chatbot quality as high, and all participants perceived the chatbot as a helpful tool and would refer it to others. Approximately 79% (11/14) of the participants agreed they would continue using the chatbot. 
They suggested adding a local language (ie, Bahasa Malaysia) to customize the chatbot to the Malaysian context (facilitating condition) and suggested that the chatbot should also incorporate more information on mental health, HIV risk assessment, and consequences of HIV. In terms of social influence, all participants perceived the chatbot as helpful in avoiding stigma-inducing interactions and thus could increase the frequency of HIV testing and PrEP uptake among MSM. Conclusions: The current AI chatbot is feasible and acceptable to promote the uptake of HIV testing and PrEP. To ensure the successful implementation and dissemination of AI chatbots in Malaysia, they should be customized to communicate in Bahasa Malaysia and upgraded to provide other HIV-related information to improve usability, such as mental health support, risk assessment for sexually transmitted infections, AIDS treatment, and the consequences of contracting HIV. %M 38277206 %R 10.2196/52055 %U https://humanfactors.jmir.org/2024/1/e52055 %U https://doi.org/10.2196/52055 %U http://www.ncbi.nlm.nih.gov/pubmed/38277206 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e52200 %T Patient Phenotyping for Atopic Dermatitis With Transformers and Machine Learning: Algorithm Development and Validation Study %A Wang,Andrew %A Fulton,Rachel %A Hwang,Sy %A Margolis,David J %A Mowery,Danielle %+ University of Pennsylvania, A206 Richards Building, 3700 Hamilton Walk, Philadelphia, PA, 19104, United States, 1 2157466677, dlmowery@pennmedicine.upenn.edu %K atopic dermatitis %K classification %K classifier %K dermatitis %K dermatology %K EHR %K electronic health record %K health records %K health %K informatics %K machine learning %K natural language processing %K NLP %K patient phenotyping %K phenotype %K skin %K transformer %K transformers %D 2024 %7 26.1.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Atopic dermatitis (AD) is a chronic skin condition that millions of people 
around the world live with each day. Performing research into identifying the causes and treatment for this disease has great potential to provide benefits for these individuals. However, AD clinical trial recruitment is not a trivial task due to the variance in diagnostic precision and phenotypic definitions leveraged by different clinicians, as well as the time spent finding, recruiting, and enrolling patients by clinicians to become study participants. Thus, there is a need for automatic and effective patient phenotyping for cohort recruitment. Objective: This study aims to present an approach for identifying patients whose electronic health records suggest that they may have AD. Methods: We created a vectorized representation of each patient and trained various supervised machine learning methods to classify when a patient has AD. Each patient is represented by a vector of either probabilities or binary values, where each value indicates whether they meet a different criterion for AD diagnosis. Results: The most accurate AD classifier performed with a class-balanced accuracy of 0.8036, a precision of 0.8400, and a recall of 0.7500 when using XGBoost (Extreme Gradient Boosting). Conclusions: Creating an automated approach for identifying patient cohorts has the potential to accelerate, standardize, and automate the process of patient recruitment for AD studies, thereby reducing clinician burden and informing the discovery of better treatment options for AD. 
%M 38277207 %R 10.2196/52200 %U https://formative.jmir.org/2024/1/e52200 %U https://doi.org/10.2196/52200 %U http://www.ncbi.nlm.nih.gov/pubmed/38277207 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e48443 %T Unlocking the Secrets Behind Advanced Artificial Intelligence Language Models in Deidentifying Chinese-English Mixed Clinical Text: Development and Validation Study %A Lee,You-Qian %A Chen,Ching-Tai %A Chen,Chien-Chang %A Lee,Chung-Hong %A Chen,Peitsz %A Wu,Chi-Shin %A Dai,Hong-Jie %+ Intelligent System Laboratory, Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, No. 415, Jiangong Road, Sanmin District, Kaohsiung, 80778, Taiwan, 886 73814526 ext 15510, hjdai@nkust.edu.tw %K code mixing %K electronic health record %K deidentification %K pretrained language model %K large language model %K ChatGPT %D 2024 %7 25.1.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The widespread use of electronic health records in the clinical and biomedical fields makes the removal of protected health information (PHI) essential to maintain privacy. However, a significant portion of information is recorded in unstructured textual forms, posing a challenge for deidentification. In multilingual countries, medical records could be written in a mixture of more than one language, referred to as code mixing. Most current clinical natural language processing techniques are designed for monolingual text, and there is a need to address the deidentification of code-mixed text. Objective: The aim of this study was to investigate the effectiveness and underlying mechanism of fine-tuned pretrained language models (PLMs) in identifying PHI in the code-mixed context. Additionally, we aimed to evaluate the potential of prompting large language models (LLMs) for recognizing PHI in a zero-shot manner. 
Methods: We compiled the first clinical code-mixed deidentification data set consisting of text written in Chinese and English. We explored the effectiveness of fine-tuned PLMs for recognizing PHI in code-mixed content, with a focus on whether PLMs exploit naming regularity and mention coverage to achieve superior performance, by probing the developed models’ outputs to examine their decision-making process. Furthermore, we investigated the potential of prompt-based in-context learning of LLMs for recognizing PHI in code-mixed text. Results: The developed methods were evaluated on a code-mixed deidentification corpus of 1700 discharge summaries. We observed that different PHI types had preferences in their occurrences within the different types of language-mixed sentences, and PLMs could effectively recognize PHI by exploiting the learned name regularity. However, the models may exhibit suboptimal results when regularity is weak or mentions contain unknown words that the representations cannot generate well. We also found that the availability of code-mixed training instances is essential for the model’s performance. Furthermore, the LLM-based deidentification method was a feasible and appealing approach that can be controlled and enhanced through natural language prompts. Conclusions: The study contributes to understanding the underlying mechanism of PLMs in addressing the deidentification process in the code-mixed context and highlights the significance of incorporating code-mixed training instances into the model training phase. To support the advancement of research, we created a manipulated subset of the resynthesized data set available for research purposes. Based on the compiled data set, we found that the LLM-based deidentification method is a feasible approach, but carefully crafted prompts are essential to avoid unwanted output. However, the use of such methods in the hospital setting requires careful consideration of data security and privacy concerns. 
Further research could explore the augmentation of PLMs and LLMs with external knowledge to improve their strength in recognizing rare PHI. %M 38271060 %R 10.2196/48443 %U https://www.jmir.org/2024/1/e48443 %U https://doi.org/10.2196/48443 %U http://www.ncbi.nlm.nih.gov/pubmed/38271060 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e49575 %T Digital Transformation of Public Health for Noncommunicable Diseases: Narrative Viewpoint of Challenges and Opportunities %A Leal Neto,Onicio %A Von Wyl,Viktor %+ Department of Computer Science, ETH Zurich, Universitätstrasse 6, Zurich, 8092, Switzerland, 41 44 632 50 94, onicio@gmail.com %K digital public health %K artificial intelligence %K non-communicable diseases %K digital health %K surveillance %K well being %K technological advancement %K public health efficiency %K digital innovation %D 2024 %7 25.1.2024 %9 Viewpoint %J JMIR Public Health Surveill %G English %X The recent SARS-CoV-2 pandemic underscored the effectiveness and rapid deployment of digital public health interventions, notably the digital proximity tracing apps, leveraging Bluetooth capabilities to trace and notify users about potential infection exposures. Backed by renowned organizations such as the World Health Organization and the European Union, digital proximity tracing showcased the promise of digital public health. As the world pivots from pandemic responses, it becomes imperative to address noncommunicable diseases (NCDs) that account for a vast majority of health care expenses and premature disability-adjusted life years lost. The narrative of digital transformation in the realm of NCD public health is distinct from that of infectious diseases. 
Public health, with its multifaceted approach from disciplines such as medicine, epidemiology, and psychology, focuses on promoting healthy living and choices through functions categorized as “Assessment,” “Policy Development,” “Resource Allocation,” “Assurance,” and “Access.” The power of artificial intelligence (AI) in this digital transformation is noteworthy. AI can automate repetitive tasks, enabling health care providers to prioritize personal interactions, especially those that cannot be digitalized, such as emotional support. Moreover, AI presents tools for individuals to be proactive in their health management. However, the human touch remains irreplaceable; AI serves as a companion guiding individuals through the health care landscape. Digital evolution, while revolutionary, poses its own set of challenges. Issues of equity and access are at the forefront. Vulnerable populations, whether due to economic constraints, geographical barriers, or digital illiteracy, face the threat of being marginalized further. This transformation mandates an inclusive strategy, focusing on not amplifying existing health disparities but eliminating them. Population-level digital interventions in NCD prevention demand societal agreement. Policies, like smoking bans or sugar taxes, though effective, might affect those not directly benefiting. Hence, all involved parties, from policy makers to the public, should have a balanced perspective on the advantages, risks, and expenses of these digital shifts. For a successful digital shift in public health, especially concerning NCDs, AI’s potential to enhance efficiency, effectiveness, user experience, and equity—the “quadruple aim”—is undeniable. However, it is vital that AI-driven initiatives in public health domains remain purposeful, offering improvements without compromising other objectives. 
The broader success of digital public health hinges on transparent benchmarks and criteria, ensuring maximum benefits without sidelining minorities or vulnerable groups. Especially in population-centric decisions, like resource allocation, AI’s ability to avoid bias is paramount. Therefore, the continuous involvement of stakeholders, including patients and minority groups, remains pivotal in the progression of AI-integrated digital public health. %M 38271097 %R 10.2196/49575 %U https://publichealth.jmir.org/2024/1/e49575 %U https://doi.org/10.2196/49575 %U http://www.ncbi.nlm.nih.gov/pubmed/38271097 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e50150 %T A Comparison of ChatGPT and Fine-Tuned Open Pre-Trained Transformers (OPT) Against Widely Used Sentiment Analysis Tools: Sentiment Analysis of COVID-19 Survey Data %A Lossio-Ventura,Juan Antonio %A Weger,Rachel %A Lee,Angela Y %A Guinee,Emily P %A Chung,Joyce %A Atlas,Lauren %A Linos,Eleni %A Pereira,Francisco %+ National Institute of Mental Health, National Institutes of Health, 3D41, 10 Center Dr, Bethesda, MD, 20814, United States, 1 3018272632, juan.lossio@nih.gov %K sentiment analysis %K COVID-19 survey %K large language model %K few-shot learning %K zero-shot learning %K ChatGPT %K COVID-19 %D 2024 %7 25.1.2024 %9 Original Paper %J JMIR Ment Health %G English %X Background: Health care providers and health-related researchers face significant challenges when applying sentiment analysis tools to health-related free-text survey data. Most state-of-the-art applications were developed in domains such as social media, and their performance in the health care context remains relatively unknown. Moreover, existing studies indicate that these tools often lack accuracy and produce inconsistent results. Objective: This study aims to address the lack of comparative analysis on sentiment analysis tools applied to health-related free-text survey data in the context of COVID-19. 
The objective was to automatically predict sentence sentiment for 2 independent COVID-19 survey data sets from the National Institutes of Health and Stanford University. Methods: Gold standard labels were created for a subset of each data set using a panel of human raters. We compared 8 state-of-the-art sentiment analysis tools on both data sets to evaluate variability and disagreement across tools. In addition, few-shot learning was explored by fine-tuning Open Pre-Trained Transformers (OPT; a large language model [LLM] with publicly available weights) using a small annotated subset, and zero-shot learning was explored using ChatGPT (an LLM without available weights). Results: The comparison of sentiment analysis tools revealed high variability and disagreement across the evaluated tools when applied to health-related survey data. OPT and ChatGPT demonstrated superior performance, outperforming all other sentiment analysis tools. Moreover, ChatGPT outperformed OPT, exhibiting 6% higher accuracy and a 4% to 7% higher F-measure. Conclusions: This study demonstrates the effectiveness of LLMs, particularly the few-shot learning and zero-shot learning approaches, in the sentiment analysis of health-related survey data. These results have implications for saving human labor and improving efficiency in sentiment analysis tasks, contributing to advancements in the field of automated sentiment analysis. 
%M 38271138 %R 10.2196/50150 %U https://mental.jmir.org/2024/1/e50150 %U https://doi.org/10.2196/50150 %U http://www.ncbi.nlm.nih.gov/pubmed/38271138 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e53378 %T A Machine Learning Approach with Human-AI Collaboration for Automated Classification of Patient Safety Event Reports: Algorithm Development and Validation Study %A Chen,Hongbo %A Cohen,Eldan %A Wilson,Dulaney %A Alfred,Myrtede %+ Department of Mechanical & Industrial Engineering, Faculty of Applied Science & Engineering, University of Toronto, 27 King's College Cir, Toronto, ON, M5S 1A1, Canada, 1 4372154739, myrtede.alfred@utoronto.ca %K accident %K accidents %K black box %K classification %K classifier %K collaboration %K design %K document %K documentation %K documents %K explainability %K explainable %K human-AI collaboration %K human-AI %K human-computer %K human-machine %K incident reporting %K interface design %K interface %K interpretable %K LIME %K machine learning %K patient safety %K predict %K prediction %K predictions %K predictive %K report %K reporting %K safety %K text %K texts %K textual %K artificial intelligence %D 2024 %7 25.1.2024 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Adverse events refer to incidents with potential or actual harm to patients in hospitals. These events are typically documented through patient safety event (PSE) reports, which consist of detailed narratives providing contextual information on the occurrences. Accurate classification of PSE reports is crucial for patient safety monitoring. However, this process faces challenges due to inconsistencies in classifications and the sheer volume of reports. Recent advancements in text representation, particularly contextual text representation derived from transformer-based language models, offer a promising solution for more precise PSE report classification. 
Integrating the machine learning (ML) classifier necessitates a balance between human expertise and artificial intelligence (AI). Central to this integration is the concept of explainability, which is crucial for building trust and ensuring effective human-AI collaboration. Objective: This study aims to investigate the efficacy of ML classifiers trained using contextual text representation in automatically classifying PSE reports. Furthermore, the study presents an interface that integrates the ML classifier with the explainability technique to facilitate human-AI collaboration for PSE report classification. Methods: This study used a data set of 861 PSE reports from a large academic hospital’s maternity units in the Southeastern United States. Various ML classifiers were trained with both static and contextual text representations of PSE reports. The trained ML classifiers were evaluated with multiclass classification metrics and the confusion matrix. The local interpretable model-agnostic explanations (LIME) technique was used to provide the rationale for the ML classifier’s predictions. An interface that integrates the ML classifier with the LIME technique was designed for incident reporting systems. Results: The top-performing classifier using contextual representation was able to obtain an accuracy of 75.4% (95/126) compared to an accuracy of 66.7% (84/126) by the top-performing classifier trained using static text representation. A PSE reporting interface has been designed to facilitate human-AI collaboration in PSE report classification. In this design, the ML classifier recommends the top 2 most probable event types, along with the explanations for the prediction, enabling PSE reporters and patient safety analysts to choose the most suitable one. The LIME technique showed that the classifier occasionally relies on arbitrary words for classification, emphasizing the necessity of human oversight. 
Conclusions: This study demonstrates that training ML classifiers with contextual text representations can significantly enhance the accuracy of PSE report classification. The interface designed in this study lays the foundation for human-AI collaboration in the classification of PSE reports. The insights gained from this research enhance the decision-making process in PSE report classification, enabling hospitals to more efficiently identify potential risks and hazards and enabling patient safety analysts to take timely actions to prevent patient harm. %M 38271086 %R 10.2196/53378 %U https://humanfactors.jmir.org/2024/1/e53378 %U https://doi.org/10.2196/53378 %U http://www.ncbi.nlm.nih.gov/pubmed/38271086 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e49784 %T Sepsis Prediction at Emergency Department Triage Using Natural Language Processing: Retrospective Cohort Study %A Brann,Felix %A Sterling,Nicholas William %A Frisch,Stephanie O %A Schrager,Justin D %+ Department of Emergency Medicine, Emory University School of Medicine, 531 Asbury Circle, Annex Building N340, Atlanta, GA, 30322, United States, 1 404 778 5975, justin@vitaler.com %K natural language processing %K machine learning %K sepsis %K emergency department %K triage %D 2024 %7 25.1.2024 %9 Original Paper %J JMIR AI %G English %X Background: Despite its high lethality, sepsis can be difficult to detect on initial presentation to the emergency department (ED). Machine learning–based tools may provide avenues for earlier detection and lifesaving intervention. Objective: The study aimed to predict sepsis at the time of ED triage using natural language processing of nursing triage notes and available clinical data. Methods: We constructed a retrospective cohort of all 1,234,434 consecutive ED encounters in 2015-2021 from 4 separate clinically heterogeneous academically affiliated EDs. After exclusion criteria were applied, the final cohort included 1,059,386 adult ED encounters. 
The primary outcome criteria for sepsis were presumed severe infection and acute organ dysfunction. After vectorization and dimensional reduction of triage notes and clinical data available at triage, a decision tree–based ensemble (time-of-triage) model was trained to predict sepsis using the training subset (n=950,921). A separate (comprehensive) model was trained using these data and laboratory data, as they became available at 1-hour intervals, after triage. Model performances were evaluated using the test (n=108,465) subset. Results: Sepsis occurred in 35,318 encounters (incidence 3.45%). For sepsis prediction at the time of patient triage, using the primary definition, the area under the receiver operating characteristic curve (AUC) and macro F1-score for sepsis were 0.94 and 0.61, respectively. Sensitivity, specificity, and false positive rate were 0.87, 0.85, and 0.15, respectively. The time-of-triage model accurately predicted sepsis in 76% (1635/2150) of sepsis cases where sepsis screening was not initiated at triage and 97.5% (1630/1671) of cases where sepsis screening was initiated at triage. Positive and negative predictive values were 0.18 and 0.99, respectively. For sepsis prediction using laboratory data available each hour after ED arrival, the AUC peaked at 0.97 at 12 hours. Similar results were obtained when stratifying by hospital and when Centers for Disease Control and Prevention hospital toolkit for adult sepsis surveillance criteria were used to define sepsis. Among septic cases, sepsis was predicted in 36.1% (1375/3814), 49.9% (1902/3814), and 68.3% (2604/3814) of encounters, respectively, at 3, 2, and 1 hours prior to the first intravenous antibiotic order or where antibiotics were not ordered within the first 12 hours. Conclusions: Sepsis can accurately be predicted at ED presentation using nursing triage notes and clinical information available at the time of triage. 
This indicates that machine learning can facilitate timely and reliable alerting for intervention. Free-text data can improve the performance of predictive modeling at the time of triage and throughout the ED course. %M 38875594 %R 10.2196/49784 %U https://ai.jmir.org/2024/1/e49784 %U https://doi.org/10.2196/49784 %U http://www.ncbi.nlm.nih.gov/pubmed/38875594 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52113 %T Assessing ChatGPT’s Mastery of Bloom’s Taxonomy Using Psychosomatic Medicine Exam Questions: Mixed-Methods Study %A Herrmann-Werner,Anne %A Festl-Wietek,Teresa %A Holderried,Friederike %A Herschbach,Lea %A Griewatz,Jan %A Masters,Ken %A Zipfel,Stephan %A Mahling,Moritz %+ Tübingen Institute for Medical Education, Faculty of Medicine, University of Tübingen, Elfriede-Aulhorn-Strasse 10, Tübingen, 72076 Tübingen, Germany, 49 7071 29 73715, teresa.festl-wietek@med.uni-tuebingen.de %K answer %K artificial intelligence %K assessment %K Bloom’s taxonomy %K ChatGPT %K classification %K error %K exam %K examination %K generative %K GPT-4 %K Generative Pre-trained Transformer 4 %K language model %K learning outcome %K LLM %K MCQ %K medical education %K medical exam %K multiple-choice question %K natural language processing %K NLP %K psychosomatic %K question %K response %K taxonomy %D 2024 %7 23.1.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Large language models such as GPT-4 (Generative Pre-trained Transformer 4) are being increasingly used in medicine and medical education. However, these models are prone to “hallucinations” (ie, outputs that seem convincing while being factually incorrect). It is currently unknown how these errors by large language models relate to the different cognitive levels defined in Bloom’s taxonomy. Objective: This study aims to explore how GPT-4 performs in terms of Bloom’s taxonomy using psychosomatic medicine exam questions. 
Methods: We used a large data set of psychosomatic medicine multiple-choice questions (N=307) with real-world results derived from medical school exams. GPT-4 answered the multiple-choice questions using 2 distinct prompt versions: detailed and short. The answers were analyzed using a quantitative approach and a qualitative approach. Focusing on incorrectly answered questions, we categorized reasoning errors according to the hierarchical framework of Bloom’s taxonomy. Results: GPT-4’s performance in answering exam questions yielded a high success rate: 93% (284/307) for the detailed prompt and 91% (278/307) for the short prompt. Questions answered correctly by GPT-4 had a statistically significant higher difficulty than questions answered incorrectly (P=.002 for the detailed prompt and P<.001 for the short prompt). Independent of the prompt, GPT-4’s lowest exam performance was 78.9% (15/19), thereby always surpassing the “pass” threshold. Our qualitative analysis of incorrect answers, based on Bloom’s taxonomy, showed that errors were primarily in the “remember” (29/68) and “understand” (23/68) cognitive levels; specific issues arose in recalling details, understanding conceptual relationships, and adhering to standardized guidelines. Conclusions: GPT-4 demonstrated a remarkable success rate when confronted with psychosomatic medicine multiple-choice exam questions, aligning with previous findings. When evaluated through Bloom’s taxonomy, our data revealed that GPT-4 occasionally ignored specific facts (remember), provided illogical reasoning (understand), or failed to apply concepts to a new situation (apply). These errors, which were confidently presented, could be attributed to inherent model biases and the tendency to generate outputs that maximize likelihood. 
%M 38261378 %R 10.2196/52113 %U https://www.jmir.org/2024/1/e52113 %U https://doi.org/10.2196/52113 %U http://www.ncbi.nlm.nih.gov/pubmed/38261378 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e49577 %T Health Care Professionals’ Views on the Use of Passive Sensing, AI, and Machine Learning in Mental Health Care: Systematic Review With Meta-Synthesis %A Rogan,Jessica %A Bucci,Sandra %A Firth,Joseph %+ Division of Psychology and Mental Health, School of Health Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Sciences, The University of Manchester, 2nd Floor, Zochonis Building, Brunswick Street, Manchester, M13 9PL, United Kingdom, 44 161 306 0422, sandra.bucci@manchester.ac.uk %K artificial intelligence %K machine learning %K passive sensing %K mental health care %K clinicians %K views %K meta-synthesis %K review %K mental health %K health care %K health care professionals %K psychology %K psychiatry %K mental health professionals %K mobile phone %D 2024 %7 23.1.2024 %9 Review %J JMIR Ment Health %G English %X Background: Mental health difficulties are highly prevalent worldwide. Passive sensing technologies and applied artificial intelligence (AI) methods can provide an innovative means of supporting the management of mental health problems and enhancing the quality of care. However, the views of stakeholders are important in understanding the potential barriers to and facilitators of their implementation. Objective: This study aims to review, critically appraise, and synthesize qualitative findings relating to the views of mental health care professionals on the use of passive sensing and AI in mental health care. Methods: A systematic search of qualitative studies was performed using 4 databases. A meta-synthesis approach was used, whereby studies were analyzed using an inductive thematic analysis approach within a critical realist epistemological framework. Results: Overall, 10 studies met the eligibility criteria. 
The 3 main themes were uses of passive sensing and AI in clinical practice, barriers to and facilitators of use in practice, and consequences for service users. A total of 5 subthemes were identified: barriers, facilitators, empowerment, risk to well-being, and data privacy and protection issues. Conclusions: Although clinicians are open-minded about the use of passive sensing and AI in mental health care, important factors to consider are service user well-being, clinician workloads, and therapeutic relationships. Service users and clinicians must be involved in the development of digital technologies and systems to ensure ease of use. The development of, and training in, clear policies and guidelines on the use of passive sensing and AI in mental health care, including risk management and data security procedures, will also be key to facilitating clinician engagement. The means for clinicians and service users to provide feedback on how the use of passive sensing and AI in practice is being received should also be considered. 
Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42022331698; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=331698 %M 38261403 %R 10.2196/49577 %U https://mental.jmir.org/2024/1/e49577 %U https://doi.org/10.2196/49577 %U http://www.ncbi.nlm.nih.gov/pubmed/38261403 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 7 %N %P e49415 %T Promoting Personalized Reminiscence Among Cognitively Intact Older Adults Through an AI-Driven Interactive Multimodal Photo Album: Development and Usability Study %A Wang,Xin %A Li,Juan %A Liang,Tianyi %A Hasan,Wordh Ul %A Zaman,Kimia Tuz %A Du,Yang %A Xie,Bo %A Tao,Cui %+ Department of Computer Science, North Dakota State University, Quentin Burdick Building Room 258, 1320 Albrecht Boulevard, Fargo, ND, 58105, United States, 1 7012318562, J.Li@ndsu.edu %K aging %K knowledge graph %K machine learning %K reminiscence %K voice assistant %D 2024 %7 23.1.2024 %9 Original Paper %J JMIR Aging %G English %X Background: Reminiscence, a therapy that uses stimulating materials such as old photos and videos to stimulate long-term memory, can improve the emotional well-being and life satisfaction of older adults, including those who are cognitively intact. However, providing personalized reminiscence therapy can be challenging for caregivers and family members. 
Objective: This study aimed to achieve three objectives: (1) design and develop the GoodTimes app, an interactive multimodal photo album that uses artificial intelligence (AI) to engage users in personalized conversations and storytelling about their pictures, encompassing family, friends, and special moments; (2) examine the app’s functionalities in various scenarios using use-case studies and assess the app’s usability and user experience through the user study; and (3) investigate the app’s potential as a supplementary tool for reminiscence therapy among cognitively intact older adults, aiming to enhance their psychological well-being by facilitating the recollection of past experiences. Methods: We used state-of-the-art AI technologies, including image recognition, natural language processing, knowledge graph, logic, and machine learning, to develop GoodTimes. First, we constructed a comprehensive knowledge graph that models the information required for effective communication, including photos, people, locations, time, and stories related to the photos. Next, we developed a voice assistant that interacts with users by leveraging the knowledge graph and machine learning techniques. Then, we created various use cases to examine the functions of the system in different scenarios. Finally, to evaluate GoodTimes’ usability, we conducted a study with older adults (N=13; age range 58-84, mean 65.8 years). The study ran from January to March 2023. Results: The use-case tests demonstrated the performance of GoodTimes in handling a variety of scenarios, highlighting its versatility and adaptability. For the user study, the feedback from our participants was highly positive, with 92% (12/13) reporting a positive experience conversing with GoodTimes. All participants mentioned that the app invoked pleasant memories and aided in recollecting loved ones, resulting in a sense of happiness for the majority (11/13, 85%). 
Additionally, a significant majority found GoodTimes to be helpful (11/13, 85%) and user-friendly (12/13, 92%). Most participants (9/13, 69%) expressed a desire to use the app frequently, although some (4/13, 31%) indicated a need for technical support to navigate the system effectively. Conclusions: Our AI-based interactive photo album, GoodTimes, was able to engage users in browsing their photos and conversing about them. Preliminary evidence supports GoodTimes’ usability and benefits cognitively intact older adults. Future work is needed to explore its potential positive effects among older adults with cognitive impairment. %M 38261365 %R 10.2196/49415 %U https://aging.jmir.org/2024/1/e49415 %U https://doi.org/10.2196/49415 %U http://www.ncbi.nlm.nih.gov/pubmed/38261365 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51926 %T Uncovering Language Disparity of ChatGPT on Retinal Vascular Disease Classification: Cross-Sectional Study %A Liu,Xiaocong %A Wu,Jiageng %A Shao,An %A Shen,Wenyue %A Ye,Panpan %A Wang,Yao %A Ye,Juan %A Jin,Kai %A Yang,Jie %+ Eye Center, The Second Affiliated Hospital, Zhejiang University, 88 Jiefang Road, Hangzhou, Zhejiang, 310009, China, 86 571 87783907, jinkai@zju.edu.cn %K large language models %K ChatGPT %K clinical decision support %K retinal vascular disease %K artificial intelligence %D 2024 %7 22.1.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Benefiting from rich knowledge and the exceptional ability to understand text, large language models like ChatGPT have shown great potential in English clinical environments. However, the performance of ChatGPT in non-English clinical settings, as well as its reasoning, have not been explored in depth. Objective: This study aimed to evaluate ChatGPT’s diagnostic performance and inference abilities for retinal vascular diseases in a non-English clinical environment. 
Methods: In this cross-sectional study, we collected 1226 fundus fluorescein angiography reports and corresponding diagnoses written in Chinese and tested ChatGPT with 4 prompting strategies (direct diagnosis or diagnosis with a step-by-step reasoning process, and in Chinese or English). Results: Compared with ChatGPT using Chinese prompts for direct diagnosis, which achieved an F1-score of 70.47%, ChatGPT using English prompts for direct diagnosis achieved the best diagnostic performance (80.05%), which was inferior to ophthalmologists (89.35%) but close to ophthalmologist interns (82.69%). As for its inference abilities, although ChatGPT can derive a reasoning process with a low error rate (0.4 per report) for both Chinese and English prompts, ophthalmologists identified that the latter produced more reasoning steps with less incompleteness (44.31%), misinformation (1.96%), and hallucinations (0.59%) (all P<.001). Also, analysis of the robustness of ChatGPT with different language prompts indicated significant differences in the recall (P=.03) and F1-score (P=.04) between Chinese and English prompts. In short, when prompted in English, ChatGPT exhibited enhanced diagnostic and inference capabilities for retinal vascular disease classification based on Chinese fundus fluorescein angiography reports. Conclusions: ChatGPT can serve as a helpful medical assistant to provide diagnoses in non-English clinical environments, but there are still performance gaps, language disparities, and errors compared to professionals, which demonstrate the potential limitations and the need to continually explore more robust large language models in ophthalmology practice. 
%M 38252483 %R 10.2196/51926 %U https://www.jmir.org/2024/1/e51926 %U https://doi.org/10.2196/51926 %U http://www.ncbi.nlm.nih.gov/pubmed/38252483 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e49082 %T Beyond the Hype—The Actual Role and Risks of AI in Today’s Medical Practice: Comparative-Approach Study %A Hansen,Steffan %A Brandt,Carl Joakim %A Søndergaard,Jens %+ Research Unit of General Practice, Institution of Public Health, University of Southern Denmark, J.B. Winsløws Vej 9, Odense, 5000, Denmark, 45 65 50 36 19, sholsthansen@health.sdu.dk %K AI %K artificial intelligence %K ChatGPT-4 %K Microsoft Bing %K general practice %K ChatGPT %K chatbot %K chatbots %K writing %K academic %K academia %K Bing %D 2024 %7 22.1.2024 %9 Original Paper %J JMIR AI %G English %X Background: The evolution of artificial intelligence (AI) has significantly impacted various sectors, with health care witnessing some of its most groundbreaking contributions. Contemporary models, such as ChatGPT-4 and Microsoft Bing, have showcased capabilities beyond just generating text, aiding in complex tasks like literature searches and refining web-based queries. Objective: This study explores a compelling query: can AI author an academic paper independently? Our assessment focuses on four core dimensions: relevance (to ensure that AI’s response directly addresses the prompt), accuracy (to ascertain that AI’s information is both factually correct and current), clarity (to examine AI’s ability to present coherent and logical ideas), and tone and style (to evaluate whether AI can align with the formality expected in academic writings). Additionally, we will consider the ethical implications and practicality of integrating AI into academic writing. Methods: To assess the capabilities of ChatGPT-4 and Microsoft Bing in the context of academic paper assistance in general practice, we used a systematic approach. 
ChatGPT-4, an advanced AI language model by OpenAI, excels in generating human-like text and adapting responses based on user interactions, though it has a knowledge cut-off in September 2021. Microsoft Bing’s AI chatbot facilitates user navigation on the Bing search engine, offering tailored search results. Results: In terms of relevance, ChatGPT-4 delved deeply into AI’s health care role, citing academic sources and discussing diverse applications and concerns, while Microsoft Bing provided a concise, less detailed overview. In terms of accuracy, ChatGPT-4 correctly cited 72% (23/32) of its peer-reviewed articles but included some nonexistent references. Microsoft Bing’s accuracy stood at 46% (6/13), supplemented by relevant non–peer-reviewed articles. In terms of clarity, both models conveyed clear, coherent text. ChatGPT-4 was particularly adept at detailing technical concepts, while Microsoft Bing was more general. In terms of tone, both models maintained an academic tone, but ChatGPT-4 exhibited superior depth and breadth in content delivery. Conclusions: Comparing ChatGPT-4 and Microsoft Bing for academic assistance revealed strengths and limitations. ChatGPT-4 excels in depth and relevance but falters in citation accuracy. Microsoft Bing is concise but lacks robust detail. Though both models have potential, neither can independently handle comprehensive academic tasks. As AI evolves, combining ChatGPT-4’s depth with Microsoft Bing’s up-to-date referencing could optimize academic support. Researchers should critically assess AI outputs to maintain academic credibility. 
%M 38875597 %R 10.2196/49082 %U https://ai.jmir.org/2024/1/e49082 %U https://doi.org/10.2196/49082 %U http://www.ncbi.nlm.nih.gov/pubmed/38875597 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e47430 %T The Reporting Quality of Machine Learning Studies on Pediatric Diabetes Mellitus: Systematic Review %A Zrubka,Zsombor %A Kertész,Gábor %A Gulácsi,László %A Czere,János %A Hölgyesi,Áron %A Nezhad,Hossein Motahari %A Mosavi,Amir %A Kovács,Levente %A Butte,Atul J %A Péntek,Márta %+ HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Bécsi út 96/b, Budapest, 1034, Hungary, 36 302029415, zrubka.zsombor@uni-obuda.hu %K diabetes mellitus %K children %K adolescent %K pediatric %K machine learning %K Minimum Information About Clinical Artificial Intelligence Modelling %K MI-CLAIM %K reporting quality %D 2024 %7 19.1.2024 %9 Review %J J Med Internet Res %G English %X Background: Diabetes mellitus (DM) is a major health concern among children with the widespread adoption of advanced technologies. However, concerns are growing about the transparency, replicability, biasedness, and overall validity of artificial intelligence studies in medicine. Objective: We aimed to systematically review the reporting quality of machine learning (ML) studies of pediatric DM using the Minimum Information About Clinical Artificial Intelligence Modelling (MI-CLAIM) checklist, a general reporting guideline for medical artificial intelligence studies. Methods: We searched the PubMed and Web of Science databases from 2016 to 2020. Studies were included if the use of ML was reported in children with DM aged 2 to 18 years, including studies on complications, screening studies, and in silico samples. In studies following the ML workflow of training, validation, and testing of results, reporting quality was assessed via MI-CLAIM by consensus judgments of independent reviewer pairs. 
Positive answers to the 17 binary items regarding sufficient reporting were qualitatively summarized and counted as a proxy measure of reporting quality. The synthesis of results included testing the association of reporting quality with publication and data type, participants (human or in silico), research goals, level of code sharing, and the scientific field of publication (medical or engineering), as well as with expert judgments of clinical impact and reproducibility. Results: After screening 1043 records, 28 studies were included. The sample size of the training cohort ranged from 5 to 561. Six studies featured only in silico patients. The reporting quality was low, with great variation among the 21 studies assessed using MI-CLAIM. The number of items with sufficient reporting ranged from 4 to 12 (mean 7.43, SD 2.62). The items on research questions and data characterization were reported adequately most often, whereas items on patient characteristics and model examination were reported adequately least often. The representativeness of the training and test cohorts to real-world settings and the adequacy of model performance evaluation were the most difficult to judge. Reporting quality improved over time (r=0.50; P=.02); it was higher than average in prognostic biomarker and risk factor studies (P=.04) and lower in noninvasive hypoglycemia detection studies (P=.006), higher in studies published in medical versus engineering journals (P=.004), and higher in studies sharing any code of the ML pipeline versus not sharing (P=.003). The association between expert judgments and MI-CLAIM ratings was not significant. Conclusions: The reporting quality of ML studies in the pediatric population with DM was generally low. Important details for clinicians, such as patient characteristics; comparison with the state-of-the-art solution; and model examination for valid, unbiased, and robust results, were often the weak points of reporting. 
To assess their clinical utility, the reporting standards of ML studies must evolve, and algorithms for this challenging population must become more transparent and replicable. %M 38241075 %R 10.2196/47430 %U https://www.jmir.org/2024/1/e47430 %U https://doi.org/10.2196/47430 %U http://www.ncbi.nlm.nih.gov/pubmed/38241075 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e51925 %T Predicting Depression Risk in Patients With Cancer Using Multimodal Data: Algorithm Development Study %A de Hond,Anne %A van Buchem,Marieke %A Fanconi,Claudio %A Roy,Mohana %A Blayney,Douglas %A Kant,Ilse %A Steyerberg,Ewout %A Hernandez-Boussard,Tina %+ Department of Medicine (Biomedical Informatics), Stanford Medicine, Stanford University, 1265 Welch Road, Stanford, CA, 94305, United States, 1 650 725 5507, boussard@stanford.edu %K natural language processing %K machine learning %K artificial intelligence %K oncology %K depression %K clinical decision support %K decision support %K cancer %K patients with cancer %K chemotherapy %K mental health %K prediction model %K depression risk %K cancer treatment %K radiotherapy %K diagnosis %K validation %K cancer care %K care %D 2024 %7 18.1.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Patients with cancer starting systemic treatment programs, such as chemotherapy, often develop depression. A prediction model may assist physicians and health care workers in the early identification of these vulnerable patients. Objective: This study aimed to develop a prediction model for depression risk within the first month of cancer treatment. Methods: We included 16,159 patients diagnosed with cancer starting chemo- or radiotherapy treatment between 2008 and 2021. 
Machine learning models (eg, least absolute shrinkage and selection operator [LASSO] logistic regression) and natural language processing models (Bidirectional Encoder Representations from Transformers [BERT]) were used to develop multimodal prediction models using both electronic health record data and unstructured text (patient emails and clinician notes). Model performance was assessed in an independent test set (n=5387, 33%) using the area under the receiver operating characteristic curve (AUROC), calibration curves, and decision curve analysis to assess initial clinical impact. Results: Among 16,159 patients, 437 (2.7%) received a depression diagnosis within the first month of treatment. The LASSO logistic regression models based on the structured data (AUROC 0.74, 95% CI 0.71-0.78) and structured data with email classification scores (AUROC 0.74, 95% CI 0.71-0.78) had the best discriminative performance. The BERT models based on clinician notes and structured data with email classification scores had AUROCs around 0.71. The logistic regression model based on email classification scores alone performed poorly (AUROC 0.54, 95% CI 0.52-0.56), and the model based solely on clinician notes had the worst performance (AUROC 0.50, 95% CI 0.49-0.52). Calibration was good for the logistic regression models, whereas the BERT models produced overly extreme risk estimates even after recalibration. There was a small range of decision thresholds for which the best-performing model showed promising clinical effectiveness. The risks were underestimated for female and Black patients. Conclusions: The results demonstrated the potential and limitations of machine learning and multimodal models for predicting depression risk in patients with cancer. Future research is needed to further validate these models, refine the outcome label and predictors related to mental health, and address biases across subgroups. 
%M 38236635 %R 10.2196/51925 %U https://medinform.jmir.org/2024/1/e51925 %U https://doi.org/10.2196/51925 %U http://www.ncbi.nlm.nih.gov/pubmed/38236635 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e50842 %T Performance of ChatGPT on Ophthalmology-Related Questions Across Various Examination Levels: Observational Study %A Haddad,Firas %A Saade,Joanna S %+ Department of Ophthalmology, American University of Beirut Medical Center, Bliss Street, Beirut, 1107 2020, Lebanon, 961 1350000 ext 8031, js62@aub.edu.lb %K ChatGPT %K artificial intelligence %K AI %K board examinations %K ophthalmology %K testing %D 2024 %7 18.1.2024 %9 Original Paper %J JMIR Med Educ %G English %X Background: ChatGPT and large language models have gained attention recently for their ability to answer questions on various examinations across various disciplines. The question of whether ChatGPT could be used to aid in medical education is yet to be answered, particularly in the field of ophthalmology. Objective: The aim of this study is to assess the ability of ChatGPT-3.5 (GPT-3.5) and ChatGPT-4.0 (GPT-4.0) to answer ophthalmology-related questions across different levels of ophthalmology training. Methods: Questions from the United States Medical Licensing Examination (USMLE) steps 1 (n=44), 2 (n=60), and 3 (n=28) were extracted from AMBOSS, and 248 questions (64 easy, 122 medium, and 62 difficult questions) were extracted from the book, Ophthalmology Board Review Q&A, for the Ophthalmic Knowledge Assessment Program and the Board of Ophthalmology (OB) Written Qualifying Examination (WQE). Questions were prompted identically and input to GPT-3.5 and GPT-4.0. Results: GPT-3.5 achieved a total of 55% (n=210) of correct answers, while GPT-4.0 achieved a total of 70% (n=270) of correct answers. GPT-3.5 answered 75% (n=33) of questions correctly in USMLE step 1, 73.33% (n=44) in USMLE step 2, 60.71% (n=17) in USMLE step 3, and 46.77% (n=116) in the OB-WQE. 
GPT-4.0 answered 70.45% (n=31) of questions correctly in USMLE step 1, 90.32% (n=56) in USMLE step 2, 96.43% (n=27) in USMLE step 3, and 62.90% (n=156) in the OB-WQE. GPT-3.5 performed more poorly as examination levels advanced (P<.001), while GPT-4.0 performed better on USMLE steps 2 and 3 and worse on USMLE step 1 and the OB-WQE (P<.001). The coefficient of correlation (r) between ChatGPT answering correctly and human users answering correctly was 0.21 (P=.01) for GPT-3.5 as compared to –0.31 (P<.001) for GPT-4.0. GPT-3.5 performed similarly across difficulty levels, while GPT-4.0 performed more poorly with an increase in the difficulty level. Both GPT models performed significantly better on certain topics than on others. Conclusions: ChatGPT is far from being considered a part of mainstream medical education. Future models with higher accuracy are needed for the platform to be effective in medical education. %M 38236632 %R 10.2196/50842 %U https://mededu.jmir.org/2024/1/e50842 %U https://doi.org/10.2196/50842 %U http://www.ncbi.nlm.nih.gov/pubmed/38236632 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e50174 %T ChatGPT in Medical Education: A Precursor for Automation Bias? %A Nguyen,Tina %+ The University of Texas Medical Branch, 301 University Blvd, Galveston, TX, 77551, United States, 1 4097721118, nguy.t921@gmail.com %K ChatGPT %K artificial intelligence %K AI %K medical students %K residents %K medical school curriculum %K medical education %K automation bias %K large language models %K LLMs %K bias %D 2024 %7 17.1.2024 %9 Editorial %J JMIR Med Educ %G English %X Artificial intelligence (AI) in health care has the promise of providing accurate and efficient results. However, AI can also be a black box, where the logic behind its results is nonrational. There are concerns if these questionable results are used in patient care. 
As physicians have the duty to provide care based on their clinical judgment in addition to their patients’ values and preferences, it is crucial that physicians validate the results from AI. Yet, there are some physicians who exhibit a phenomenon known as automation bias, where there is an assumption from the user that AI is always right. This is a dangerous mindset, as users exhibiting automation bias will not validate the results, given their trust in AI systems. Several factors impact a user’s susceptibility to automation bias, such as inexperience or being born in the digital age. In this editorial, I argue that these factors and a lack of AI education in the medical school curriculum cause automation bias. I also explore the harms of automation bias and why prospective physicians need to be vigilant when using AI. Furthermore, it is important to consider what attitudes are being taught to students when introducing ChatGPT, which could be some students’ first time using AI, prior to their use of AI in the clinical setting. Therefore, to avoid the problem of automation bias in the long term, in addition to incorporating AI education into the curriculum, as is necessary, the use of ChatGPT in medical education should be limited to certain tasks. Otherwise, having no constraints on what ChatGPT should be used for could lead to automation bias. 
%M 38231545 %R 10.2196/50174 %U https://mededu.jmir.org/2024/1/e50174 %U https://doi.org/10.2196/50174 %U http://www.ncbi.nlm.nih.gov/pubmed/38231545 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e47031 %T Trust in and Acceptance of Artificial Intelligence Applications in Medicine: Mixed Methods Study %A Shevtsova,Daria %A Ahmed,Anam %A Boot,Iris W A %A Sanges,Carmen %A Hudecek,Michael %A Jacobs,John J L %A Hort,Simon %A Vrijhoef,Hubertus J M %+ Panaxea bv, Pettelaarpark 84, Den Bosch, 5216 PP, Netherlands, 31 639421854, anam.ahmed@panaxea.eu %K trust %K acceptance %K artificial intelligence %K medicine %K mixed methods %K rapid review %K survey %D 2024 %7 17.1.2024 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Artificial intelligence (AI)–powered technologies are being increasingly used in almost all fields, including medicine. However, to successfully implement medical AI applications, ensuring trust and acceptance toward such technologies is crucial for their successful spread and timely adoption worldwide. Although AI applications in medicine provide advantages to the current health care system, there are also various associated challenges regarding, for instance, data privacy, accountability, and equity and fairness, which could hinder medical AI application implementation. Objective: The aim of this study was to identify factors related to trust in and acceptance of novel AI-powered medical technologies and to assess the relevance of those factors among relevant stakeholders. Methods: This study used a mixed methods design. First, a rapid review of the existing literature was conducted, aiming to identify various factors related to trust in and acceptance of novel AI applications in medicine. Next, an electronic survey including the rapid review–derived factors was disseminated among key stakeholder groups. 
Participants (N=22) were asked to assess on a 5-point Likert scale (1=irrelevant to 5=relevant) to what extent they thought the various factors (N=19) were relevant to trust in and acceptance of novel AI applications in medicine. Results: The rapid review (N=32 papers) yielded 110 factors related to trust and 77 factors related to acceptance toward AI technology in medicine. Closely related factors were assigned to 1 of the 19 overarching umbrella factors, which were further grouped into 4 categories: human-related (ie, the type of institution AI professionals originate from), technology-related (ie, the explainability and transparency of AI application processes and outcomes), ethical and legal (ie, data use transparency), and additional factors (ie, AI applications being environment friendly). The categorized 19 umbrella factors were presented as survey statements, which were evaluated by relevant stakeholders. Survey participants (N=22) represented researchers (n=18, 82%), technology providers (n=5, 23%), hospital staff (n=3, 14%), and policy makers (n=3, 14%). Of the 19 factors, 16 (84%) human-related, technology-related, ethical and legal, and additional factors were considered to be of high relevance to trust in and acceptance of novel AI applications in medicine. The patient’s gender, age, and education level were found to be of low relevance (3/19, 16%). Conclusions: The results of this study could help the implementers of medical AI applications to understand what drives trust and acceptance toward AI-powered technologies among key stakeholders in medicine. Consequently, this would allow the implementers to identify strategies that facilitate trust in and acceptance of medical AI applications among key stakeholders and potential users. 
%M 38231544 %R 10.2196/47031 %U https://humanfactors.jmir.org/2024/1/e47031 %U https://doi.org/10.2196/47031 %U http://www.ncbi.nlm.nih.gov/pubmed/38231544 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e53961 %T A Generative Pretrained Transformer (GPT)–Powered Chatbot as a Simulated Patient to Practice History Taking: Prospective, Mixed Methods Study %A Holderried,Friederike %A Stegemann–Philipps,Christian %A Herschbach,Lea %A Moldt,Julia-Astrid %A Nevins,Andrew %A Griewatz,Jan %A Holderried,Martin %A Herrmann-Werner,Anne %A Festl-Wietek,Teresa %A Mahling,Moritz %+ Tübingen Institute for Medical Education, Eberhard Karls University, Elfriede-Aulhorn-Str 10, Tübingen, 72076, Germany, 49 7071 2973715, friederike.holderried@med.uni-tuebingen.de %K simulated patient %K GPT %K generative pretrained transformer %K ChatGPT %K history taking %K medical education %K documentation %K history %K simulated %K simulation %K simulations %K NLP %K natural language processing %K artificial intelligence %K interactive %K chatbot %K chatbots %K conversational agent %K conversational agents %K answer %K answers %K response %K responses %K human computer %K human machine %K usability %K satisfaction %D 2024 %7 16.1.2024 %9 Original Paper %J JMIR Med Educ %G English %X Background: Communication is a core competency of medical professionals and of utmost importance for patient safety. Although medical curricula emphasize communication training, traditional formats, such as real or simulated patient interactions, can present psychological stress and are limited in repetition. The recent emergence of large language models (LLMs), such as generative pretrained transformer (GPT), offers an opportunity to overcome these restrictions. Objective: The aim of this study was to explore the feasibility of a GPT-driven chatbot to practice history taking, one of the core competencies of communication. 
Methods: We developed an interactive chatbot interface using GPT-3.5 and a specific prompt including a chatbot-optimized illness script and a behavioral component. Following a mixed methods approach, we invited medical students to voluntarily practice history taking. To determine whether GPT provides suitable answers as a simulated patient, the conversations were recorded and analyzed using quantitative and qualitative approaches. We analyzed the extent to which the questions and answers aligned with the provided script, as well as the medical plausibility of the answers. Finally, the students filled out the Chatbot Usability Questionnaire (CUQ). Results: A total of 28 students practiced with our chatbot (mean age 23.4, SD 2.9 years). We recorded a total of 826 question-answer pairs (QAPs), with a median of 27.5 QAPs per conversation and 94.7% (n=782) pertaining to history taking. When questions were explicitly covered by the script (n=502, 60.3%), the GPT-provided answers were mostly based on explicit script information (n=471, 94.4%). For questions not covered by the script (n=195, 23.4%), the GPT answers contained fictitious information in 56.4% (n=110) of cases. Regarding plausibility, 842 (97.9%) of 860 QAPs were rated as plausible. The 14 (2.1%) implausible answers fell into the categories of social desirability, leaving the role identity, ignoring script information, illogical reasoning, and calculation errors. Despite these results, the CUQ revealed an overall positive user experience (77/100 points). Conclusions: Our data showed that LLMs, such as GPT, can provide a simulated patient experience and yield a good user experience and a majority of plausible answers. Our analysis revealed that GPT-provided answers use either explicit script information or are based on available information, which can be understood as abductive reasoning. 
In rare instances, however, the GPT-based chatbot provided implausible information, with the major tendency being toward socially desirable rather than medically plausible information. %M 38227363 %R 10.2196/53961 %U https://mededu.jmir.org/2024/1/e53961 %U https://doi.org/10.2196/53961 %U http://www.ncbi.nlm.nih.gov/pubmed/38227363 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e51388 %T Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project %A Kuo,Nicholas I-Hsien %A Perez-Concha,Oscar %A Hanly,Mark %A Mnatzaganian,Emmanuel %A Hao,Brandon %A Di Sipio,Marcus %A Yu,Guolin %A Vanjara,Jash %A Valerie,Ivy Cerelia %A de Oliveira Costa,Juliana %A Churches,Timothy %A Lujic,Sanja %A Hegarty,Jo %A Jorm,Louisa %A Barbieri,Sebastiano %+ Centre for Big Data Research in Health, The University of New South Wales, Level 2, AGSM Building (G27), Botany St, Kensington NSW, Sydney, 2052, Australia, 61 0293850645, n.kuo@unsw.edu.au %K medical education %K generative model %K generative adversarial networks %K privacy %K antiretroviral therapy (ART) %K human immunodeficiency virus (HIV) %K data science %K educational purposes %K accessibility %K data privacy %K data sets %K sepsis %K hypotension %K HIV %K science education %K health care AI %D 2024 %7 16.1.2024 %9 Viewpoint %J JMIR Med Educ %G English %X Large-scale medical data sets are vital for hands-on education in health data science but are often inaccessible due to privacy concerns. Addressing this gap, we developed the Health Gym project, a free and open-source platform designed to generate synthetic health data sets applicable to various areas of data science education, including machine learning, data visualization, and traditional statistical models. Initially, we generated 3 synthetic data sets for sepsis, acute hypotension, and antiretroviral therapy for HIV infection. 
This paper discusses the educational applications of Health Gym’s synthetic data sets. We illustrate this through their use in postgraduate health data science courses delivered by the University of New South Wales, Australia, and a Datathon event, involving academics, students, clinicians, and local health district professionals. We also include adaptable worked examples using our synthetic data sets, designed to enrich hands-on tutorial and workshop experiences. Although we highlight the potential of these data sets in advancing data science education and health care artificial intelligence, we also emphasize the need for continued research into the inherent limitations of synthetic data. %M 38227356 %R 10.2196/51388 %U https://mededu.jmir.org/2024/1/e51388 %U https://doi.org/10.2196/51388 %U http://www.ncbi.nlm.nih.gov/pubmed/38227356 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e49970 %T A Novel Evaluation Model for Assessing ChatGPT on Otolaryngology–Head and Neck Surgery Certification Examinations: Performance Study %A Long,Cai %A Lowe,Kayle %A Zhang,Jessica %A Santos,André dos %A Alanazi,Alaa %A O'Brien,Daniel %A Wright,Erin D %A Cote,David %+ Division of Otolaryngology–Head and Neck Surgery, University of Alberta, 8440-112 Street, Edmonton, AB, T6G 2B7, Canada, 1 (780) 407 8822, cai.long.med@gmail.com %K medical licensing %K otolaryngology %K otology %K laryngology %K ear %K nose %K throat %K ENT %K surgery %K surgical %K exam %K exams %K response %K responses %K answer %K answers %K chatbot %K chatbots %K examination %K examinations %K medical education %K otolaryngology/head and neck surgery %K OHNS %K artificial intelligence %K AI %K ChatGPT %K medical examination %K large language models %K language model %K LLM %K LLMs %K wide range information %K patient safety %K clinical implementation %K safety %K machine learning %K NLP %K natural language processing %D 2024 %7 16.1.2024 %9 Original Paper %J JMIR Med Educ %G English %X 
Background: ChatGPT is among the most popular large language models (LLMs), exhibiting proficiency in various standardized tests, including multiple-choice medical board examinations. However, its performance on otolaryngology–head and neck surgery (OHNS) certification examinations and open-ended medical board certification examinations has not been reported. Objective: We aimed to evaluate the performance of ChatGPT on OHNS board examinations and propose a novel method to assess an AI model’s performance on open-ended medical board examination questions. Methods: Twenty-one open-ended questions were adapted from the Royal College of Physicians and Surgeons of Canada’s sample examination to query ChatGPT on April 11, 2023, with and without prompts. A new model, named Concordance, Validity, Safety, Competency (CVSC), was developed to evaluate its performance. Results: In an open-ended question assessment, ChatGPT achieved a passing mark in its attempts (an average of 75% across 3 trials) and demonstrated higher accuracy with prompts. The model demonstrated high concordance (92.06%) and satisfactory validity. While demonstrating considerable consistency in regenerating answers, it often provided only partially correct responses. Notably, concerning features such as hallucinations and self-conflicting answers were observed. Conclusions: ChatGPT achieved a passing score in the sample examination and demonstrated the potential to pass the OHNS certification examination of the Royal College of Physicians and Surgeons of Canada. Some concerns remain due to its hallucinations, which could pose risks to patient safety. Further adjustments are necessary to yield safer and more accurate answers for clinical implementation. 
%M 38227351 %R 10.2196/49970 %U https://mededu.jmir.org/2024/1/e49970 %U https://doi.org/10.2196/49970 %U http://www.ncbi.nlm.nih.gov/pubmed/38227351 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e51732 %T Enabling Personalization for Digital Cognitive Stimulation to Support Communication With People With Dementia: Pilot Intervention Study as a Prelude to AI Development %A Hird,Nick %A Osaki,Tohmi %A Ghosh,Samik %A Palaniappan,Sucheendra K %A Maeda,Kiyoshi %+ Aikomi Ltd Co, Yokohama Blue Avenue 12th Floor, 4-4-2 Minatomirai, Yokohama, Kanagawa, 220-0012, Japan, 81 70 4538 2854, nick.hird@aikomi.co.jp %K dementia %K digital technology %K communication %K engagement %K cognitive stimulation %K artificial intelligence %K AI %D 2024 %7 16.1.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Maintaining good communication and engagement between people with dementia and their caregivers is a major challenge in dementia care. Cognitive stimulation is a psychosocial intervention that supports communication and engagement, and several digital applications for cognitive stimulation have been developed. Personalization is an important factor for obtaining sustainable benefits, but the time and effort required to personalize and optimize applications often makes them difficult for routine use by nonspecialist caregivers and families. Although artificial intelligence (AI) has great potential to support automation of the personalization process, its use is largely unexplored because of the lack of suitable data from which to develop and train machine learning models. Objective: This pilot study aims to evaluate a digital application called Aikomi in Japanese care homes for its potential to (1) create and deliver personalized cognitive stimulation programs to promote communication and engagement between people with dementia and usual care staff and (2) capture meaningful personalized data suitable for the development of AI systems. 
Methods: A modular technology platform was developed and used to create personalized programs for 15 people with dementia living in 4 residential care facilities in Japan with the cooperation of a family member or care staff. A single intervention with the program was conducted with the person with dementia together with a care staff member, and for some participants, smell stimulation was provided using selected smell sticks in conjunction with the digital program. All sessions were recorded using a video camera, and the combined personalized data obtained by the platform were analyzed. Results: Most people with dementia (10/15, 67%) showed high levels of engagement (>40 on Engagement of a Person with Dementia Scale), and there were no incidences of negative reactions toward the programs. Care staff reported that some participants showed extended concentration and spontaneous communication while using Aikomi, which was not their usual behavior. Smell stimulation promoted engagement for some participants even when they were unable to identify the smell. No changes in well-being were observed following the intervention according to the Mental Function Impairment Scale. The level of response to each type of content in the stimulation program varied greatly according to the person with dementia, and personalized data captured by the Aikomi platform enabled understanding of correlations between stimulation content and responses for each participant. Conclusions: This study suggests that the Aikomi digital application is acceptable for use by persons with dementia and care staff and may have the potential to promote communication and engagement. The platform captures personalized data, which can provide suitable input for machine learning. Further investigation of Aikomi will be conducted to develop AI systems and create personalized digital cognitive stimulation applications that can be easily used by nonspecialist caregivers. 
%M 38227357 %R 10.2196/51732 %U https://formative.jmir.org/2024/1/e51732 %U https://doi.org/10.2196/51732 %U http://www.ncbi.nlm.nih.gov/pubmed/38227357 %0 Journal Article %@ 2291-9279 %I JMIR Publications %V 12 %N %P e48258 %T The Role of AI in Serious Games and Gamification for Health: Scoping Review %A Tolks,Daniel %A Schmidt,Johannes Jeremy %A Kuhn,Sebastian %+ Institute for Digital Medicine, University Clinic of Gießen und Marburg, Philipps University Marburg, Baldingerstraße, Marburg, 35042, Germany, 49 15120053577, sebastian.kuhn@uni-marburg.de %K artificial intelligence %K AI %K games %K serious games %K gamification %K health care %K review %D 2024 %7 15.1.2024 %9 Review %J JMIR Serious Games %G English %X Background: Artificial intelligence (AI) and game-based methods such as serious games or gamification are both emerging technologies and methodologies in health care. The merging of the two could provide greater advantages, particularly in the field of therapeutic interventions in medicine. Objective: This scoping review sought to generate an overview of the currently existing literature on the connection of AI and game-based approaches in health care. The primary objectives were to cluster studies by disease and health topic addressed, level of care, and AI or games technology. Methods: For this scoping review, the databases PubMed, Scopus, IEEE Xplore, Cochrane Library, and PubPsych were comprehensively searched on February 2, 2022. Two independent authors conducted the screening process using Rayyan software (Rayyan Systems Inc). Only original studies published in English since 1992 were eligible for inclusion. The studies had to involve aspects of therapy or education in medicine and the use of AI in combination with game-based approaches. 
Each publication was coded for basic characteristics, including the population, intervention, comparison, and outcomes (PICO) criteria; the level of evidence; the disease and health issue; the level of care; the game variant; the AI technology; and the function type. Inductive coding was used to identify the patterns, themes, and categories in the data. Individual codings were analyzed and summarized narratively. Results: A total of 16 papers met all inclusion criteria. Most of the studies (10/16, 63%) were conducted in disease rehabilitation, tackling motion impairment (eg, after stroke or trauma). Another cluster of studies (3/16, 19%) was found in the detection and rehabilitation of cognitive impairment. Machine learning was the main AI technology applied and serious games the main game-based approach used. However, direct interaction between the technologies occurred only in 3 (19%) of the 16 studies. The included studies all provide evidence of very limited quality. From the patients’ and healthy individuals’ perspective, generally high usability, motivation, and satisfaction were found. Conclusions: The review shows limited quality of evidence for the combination of AI and games in health care. Most of the included studies were nonrandomized pilot studies with few participants (14/16, 88%). This leads to a high risk for a range of biases and limits overall conclusions. However, the first results present a broad scope of possible applications, especially in motion and cognitive impairment, as well as positive perceptions by patients. In the future, the development of adaptive game designs with direct interaction between AI and games seems promising and should be a topic for future reviews. 
%M 38224472 %R 10.2196/48258 %U https://games.jmir.org/2024/1/e48258 %U https://doi.org/10.2196/48258 %U http://www.ncbi.nlm.nih.gov/pubmed/38224472 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e51240 %T Identifying Symptoms Prior to Pancreatic Ductal Adenocarcinoma Diagnosis in Real-World Care Settings: Natural Language Processing Approach %A Xie,Fagen %A Chang,Jenny %A Luong,Tiffany %A Wu,Bechien %A Lustigova,Eva %A Shrader,Eva %A Chen,Wansu %+ Department of Research and Evaluation, Kaiser Permanente Southern California, 100 S Los Robles Avenue, Pasadena, CA, 91101, United States, 1 6265643294, fagen.xie@kp.org %K cancer %K pancreatic ductal adenocarcinoma %K symptom %K clinical note %K electronic health record %K natural language processing %K computerized algorithm %K pancreatic cancer %K cancer death %K abdominal pain %K pain %K validation %K detection %K pancreas %D 2024 %7 15.1.2024 %9 Original Paper %J JMIR AI %G English %X Background: Pancreatic cancer is the third leading cause of cancer deaths in the United States. Pancreatic ductal adenocarcinoma (PDAC) is the most common form of pancreatic cancer, accounting for up to 90% of all cases. Patient-reported symptoms are often the triggers of cancer diagnosis and therefore, understanding the PDAC-associated symptoms and the timing of symptom onset could facilitate early detection of PDAC. Objective: This paper aims to develop a natural language processing (NLP) algorithm to capture symptoms associated with PDAC from clinical notes within a large integrated health care system. Methods: We used unstructured data within 2 years prior to PDAC diagnosis between 2010 and 2019 and among matched patients without PDAC to identify 17 PDAC-related symptoms. Related terms and phrases were first compiled from publicly available resources and then recursively reviewed and enriched with input from clinicians and chart review. 
A computerized NLP algorithm was iteratively developed and refined via multiple rounds of chart review followed by adjudication. Finally, the developed algorithm was applied to the validation data set to assess performance and then to the study implementation notes. Results: A total of 408,147 and 709,789 notes were retrieved from 2611 patients with PDAC and 10,085 matched patients without PDAC, respectively. In descending order, the symptom distribution of the study implementation notes ranged from 4.98% for abdominal or epigastric pain to 0.05% for upper extremity deep vein thrombosis in the PDAC group, and from 1.75% for back pain to 0.01% for pale stool in the non-PDAC group. Validation of the NLP algorithm against adjudicated chart review results of 1000 notes showed that precision ranged from 98.9% (jaundice) to 84% (upper extremity deep vein thrombosis), recall ranged from 98.1% (weight loss) to 82.8% (epigastric bloating), and F1-scores ranged from 0.97 (jaundice) to 0.86 (depression). Conclusions: The developed and validated NLP algorithm could be used for the early detection of PDAC. %M 38875566 %R 10.2196/51240 %U https://ai.jmir.org/2024/1/e51240 %U https://doi.org/10.2196/51240 %U http://www.ncbi.nlm.nih.gov/pubmed/38875566 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e45391 %T Clinical Needs Assessment of a Machine Learning–Based Asthma Management Tool: User-Centered Design Approach %A Zheng,Lu %A Ohde,Joshua W %A Overgaard,Shauna M %A Brereton,Tracey A %A Jose,Kristelle %A Wi,Chung-Il %A Peterson,Kevin J %A Juhn,Young J %+ Center for Digital Health, Mayo Clinic, 200 1st Street South West, Rochester, MN, United States, 1 480 758 0664, zheng.lu@mayo.edu %K asthma %K formative research %K user-centered design %K machine learning (ML) %K artificial intelligence (AI) %K qualitative %K user needs 
%D 2024 %7 15.1.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Personalized asthma management depends on a clinician’s ability to efficiently review patient’s data and make timely clinical decisions. Unfortunately, efficient and effective review of these data is impeded by the varied format, location, and workflow of data acquisition, storage, and processing in the electronic health record. While machine learning (ML) and clinical decision support tools are well-positioned as potential solutions, the translation of such frameworks requires that barriers to implementation be addressed in the formative research stages. Objective: We aimed to use a structured user-centered design approach (double-diamond design framework) to (1) qualitatively explore clinicians’ experience with the current asthma management system, (2) identify user requirements to improve algorithm explainability and Asthma Guidance and Prediction System prototype, and (3) identify potential barriers to ML-based clinical decision support system use. Methods: At the “discovery” phase, we first shadowed to understand the practice context. Then, semistructured interviews were conducted digitally with 14 clinicians who encountered pediatric asthma patients at 2 outpatient facilities. Participants were asked about their current difficulties in gathering information for patients with pediatric asthma, their expectations of ideal workflows and tools, and suggestions on user-centered interfaces and features. At the “define” phase, a synthesis analysis was conducted to converge key results from interviewees’ insights into themes, eventually forming critical “how might we” research questions to guide model development and implementation. Results: We identified user requirements and potential barriers associated with three overarching themes: (1) usability and workflow aspects of the ML system, (2) user expectations and algorithm explainability, and (3) barriers to implementation in context. 
Even though the responsibilities and workflows vary among different roles, the core asthma-related information and functions they requested were highly cohesive, which allows for a shared information view of the tool. Clinicians wanted a usable model that notes patients’ high risks and lets them take proactive actions to manage asthma efficiently and effectively. For optimal ML algorithm explainability, requirements included documentation to support the validity of algorithm development and output logic, and a request for increased transparency to build trust and validate how the algorithm arrived at the decision. Acceptability, adoption, and sustainability of the asthma management tool are implementation outcomes that are reliant on the proper design and training as suggested by participants. Conclusions: As part of our comprehensive informatics-based process centered on clinical usability, we approach the problem using a theoretical framework grounded in user experience research leveraging semistructured interviews. Our focus on meeting the needs of the practice with ML technology is emphasized by a user-centered approach to clinician engagement through upstream technology design. 
%M 38224482 %R 10.2196/45391 %U https://formative.jmir.org/2024/1/e45391 %U https://doi.org/10.2196/45391 %U http://www.ncbi.nlm.nih.gov/pubmed/38224482 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e47339 %T The Use of ChatGPT for Education Modules on Integrated Pharmacotherapy of Infectious Disease: Educators' Perspectives %A Al-Worafi,Yaser Mohammed %A Goh,Khang Wen %A Hermansyah,Andi %A Tan,Ching Siang %A Ming,Long Chiau %+ School of Pharmacy, KPJ Healthcare University, Lot PT 17010 Persiaran Seriemas, Kota Seriemas, Nilai, 71800, Malaysia, 60 67942692, tcsiang@kpju.edu.my %K innovation and technology %K quality education %K sustainable communities %K innovation and infrastructure %K partnerships for the goals %K sustainable education %K social justice %K ChatGPT %K artificial intelligence %K feasibility %D 2024 %7 12.1.2024 %9 Original Paper %J JMIR Med Educ %G English %X Background: Artificial Intelligence (AI) plays an important role in many fields, including medical education, practice, and research. Many medical educators started using ChatGPT at the end of 2022 for many purposes. Objective: The aim of this study was to explore the potential uses, benefits, and risks of using ChatGPT in education modules on integrated pharmacotherapy of infectious disease. Methods: A content analysis was conducted to investigate the applications of ChatGPT in education modules on integrated pharmacotherapy of infectious disease. Questions pertaining to curriculum development, syllabus design, lecture note preparation, and examination construction were posed during data collection. Three experienced professors rated the appropriateness and precision of the answers provided by ChatGPT. The consensus rating was considered. The professors also discussed the prospective applications, benefits, and risks of ChatGPT in this educational setting. 
Results: ChatGPT demonstrated the ability to contribute to various aspects of curriculum design, with ratings ranging from 50% to 92% for appropriateness and accuracy. However, there were limitations and risks associated with its use, including incomplete syllabi, the absence of essential learning objectives, and the inability to design valid questionnaires and qualitative studies. It was suggested that educators use ChatGPT as a resource rather than relying primarily on its output. There are recommendations for effectively incorporating ChatGPT into the curriculum of the education modules on integrated pharmacotherapy of infectious disease. Conclusions: Medical and health sciences educators can use ChatGPT as a guide in many aspects related to the development of the curriculum of the education modules on integrated pharmacotherapy of infectious disease, syllabus design, lecture notes preparation, and examination preparation with caution. %M 38214967 %R 10.2196/47339 %U https://mededu.jmir.org/2024/1/e47339 %U https://doi.org/10.2196/47339 %U http://www.ncbi.nlm.nih.gov/pubmed/38214967 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e48273 %T Enhancing Health Equity by Predicting Missed Appointments in Health Care: Machine Learning Study %A Yang,Yi %A Madanian,Samaneh %A Parry,David %+ Auckland University of Technology, 6 St Paul Street, AUT WZ Building, Auckland, 1010, New Zealand, 64 99219999 ext 6539, sam.madanian@aut.ac.nz %K Did Not Show %K Did Not Attend %K machine learning %K prediction %K decision support system %K health care operation %K data analytics %K patients no-show %K predictive modeling %K appointment nonadherence %K health equity %D 2024 %7 12.1.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: The phenomenon of patients missing booked appointments without canceling them—known as Did Not Show (DNS), Did Not Attend (DNA), or Failed To Attend (FTA)—has a detrimental effect on patients’ health and results in 
massive health care resource wastage. Objective: Our objective was to develop machine learning (ML) models and evaluate their performance in predicting the likelihood of DNS for hospital outpatient appointments at the MidCentral District Health Board (MDHB) in New Zealand. Methods: We sourced 5 years of MDHB outpatient records (a total of 1,080,566 outpatient visits) to build the ML prediction models. We developed 3 ML models using logistic regression, random forest, and Extreme Gradient Boosting (XGBoost). Subsequently, 10-fold cross-validation and hyperparameter tuning were deployed to minimize model bias and boost the algorithms’ prediction strength. All models were evaluated against accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC). Results: Based on 5 years of MDHB data, the best prediction classifier was XGBoost, with an area under the curve (AUC) of 0.92, sensitivity of 0.83, and specificity of 0.85. The patients’ DNS history, age, ethnicity, and appointment lead time significantly contributed to DNS prediction. An ML system trained on a large data set can produce useful levels of DNS prediction. Conclusions: This research is one of the first published studies that use ML technologies to assist with DNS management in New Zealand. It is a proof of concept and could be used to benchmark DNS predictions for the MDHB and other district health boards. We encourage conducting additional qualitative research to investigate the root cause of DNS issues and potential solutions. Addressing DNS using better strategies can potentially result in better utilization of health care resources and improved health equity. 
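The metrics reported for the no-show classifiers above (accuracy, sensitivity, specificity) all derive from the binary confusion matrix. As a reminder of how they relate, here is a minimal, stdlib-only Python sketch; the labels below are invented for illustration and this is not the authors' code:

```python
# Illustrative only: deriving accuracy, sensitivity, and specificity
# from confusion-matrix counts for a binary classifier (1 = DNS event).

def confusion_counts(y_true, y_pred):
    """Count true/false positives/negatives for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # recall for the DNS class
        "specificity": tn / (tn + fp),  # true-negative rate
    }
```

In practice these would be computed on held-out folds of the cross-validation, not on training data.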
%M 38214974 %R 10.2196/48273 %U https://medinform.jmir.org/2024/1/e48273 %U https://doi.org/10.2196/48273 %U http://www.ncbi.nlm.nih.gov/pubmed/38214974 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e48996 %T Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study %A Guo,Eddie %A Gupta,Mehul %A Deng,Jiawen %A Park,Ye-Jean %A Paget,Michael %A Naugler,Christopher %+ Cumming School of Medicine, University of Calgary, 3330 University Dr NW, Calgary, AB, T2N 1N4, Canada, 1 5879880292, eddie.guo@ucalgary.ca %K abstract screening %K ChatGPT %K classification %K extract %K extraction %K free text %K GPT %K GPT-4 %K language model %K large language models %K LLM %K natural language processing %K NLP %K nonopioid analgesia %K review methodology %K review methods %K screening %K systematic review %K systematic %K unstructured data %D 2024 %7 12.1.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The systematic review of clinical research papers is a labor-intensive and time-consuming process that often involves the screening of thousands of titles and abstracts. The accuracy and efficiency of this process are critical for the quality of the review and subsequent health care decisions. Traditional methods rely heavily on human reviewers, often requiring a significant investment of time and resources. Objective: This study aims to assess the performance of the OpenAI generative pretrained transformer (GPT) and GPT-4 application programming interfaces (APIs) in accurately and efficiently identifying relevant titles and abstracts from real-world clinical review data sets and comparing their performance against ground truth labeling by 2 independent human reviewers. Methods: We introduce a novel workflow using the ChatGPT and GPT-4 APIs for screening titles and abstracts in clinical reviews. 
A Python script was created to make calls to the API with the screening criteria in natural language and a corpus of title and abstract data sets filtered by a minimum of 2 human reviewers. We compared the performance of our model against human-reviewed papers across 6 review papers, screening over 24,000 titles and abstracts. Results: Our results show an accuracy of 0.91, a macro F1-score of 0.60, a sensitivity of 0.91 for excluded papers, and a sensitivity of 0.76 for included papers. The interrater variability between 2 independent human screeners was κ=0.46, and the prevalence and bias-adjusted κ between our proposed methods and the consensus-based human decisions was 0.96. On a randomly selected subset of papers, the GPT models demonstrated the ability to provide reasoning for their decisions and corrected their initial decisions upon being asked to explain their reasoning for incorrect classifications. Conclusions: Large language models have the potential to streamline the clinical review process, save valuable time and effort for researchers, and contribute to the overall quality of clinical reviews. By prioritizing the workflow and acting as an aid rather than a replacement for researchers and reviewers, models such as GPT-4 can enhance efficiency and lead to more accurate and reliable conclusions in medical research. 
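The agreement figures above are kappa statistics. A minimal, stdlib-only sketch of Cohen's kappa, the unadjusted chance-corrected agreement measure used for the two human screeners (the prevalence- and bias-adjusted variant reported for model-versus-consensus agreement is computed differently), with invented ratings:

```python
# Illustrative sketch of Cohen's kappa between two raters; not the
# authors' analysis code. kappa = (p_observed - p_expected) / (1 - p_expected).
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(rater_a)
    assert n == len(rater_b) and n > 0
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Expected agreement from the raters' marginal category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1:  # both raters constant and identical
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, two screeners agreeing on 3 of 4 include/exclude decisions with the marginals shown yield κ=0.5, well above the chance-level 0.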
%M 38214966 %R 10.2196/48996 %U https://www.jmir.org/2024/1/e48996 %U https://doi.org/10.2196/48996 %U http://www.ncbi.nlm.nih.gov/pubmed/38214966 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e46402 %T Acceptance of Medical Artificial Intelligence in Skin Cancer Screening: Choice-Based Conjoint Survey %A Jagemann,Inga %A Wensing,Ole %A Stegemann,Manuel %A Hirschfeld,Gerrit %+ School of Business, University of Applied Sciences and Arts Bielefeld, Interaktion 1, Bielefeld, 33619, Germany, 49 521106 ext 70508, inga.jagemann@hsbi.de %K artificial intelligence %K skin cancer screening %K choice experiment %K melanoma %K conjoint analysis %K technology acceptance %K adoption %K technology use %K dermatology %K skin cancer %K oncology %K screening %K choice based %K trust %D 2024 %7 12.1.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: There is great interest in using artificial intelligence (AI) to screen for skin cancer. This is fueled by a rising incidence of skin cancer and an increasing scarcity of trained dermatologists. AI systems capable of identifying melanoma could save lives, enable immediate access to screenings, and reduce unnecessary care and health care costs. While such AI-based systems are useful from a public health perspective, past research has shown that individual patients are very hesitant about being examined by an AI system. Objective: The aim of this study was two-fold: (1) to determine the relative importance of the provider (in-person physician, physician via teledermatology, AI, personalized AI), costs of screening (free, 10€, 25€, 40€; 1€=US $1.09), and waiting time (immediate, 1 day, 1 week, 4 weeks) as attributes contributing to patients’ choices of a particular mode of skin cancer screening; and (2) to investigate whether sociodemographic characteristics, especially age, were systematically related to participants’ individual choices. 
Methods: A choice-based conjoint analysis was used to examine the acceptance of medical AI for a skin cancer screening from the patient’s perspective. Participants responded to 12 choice sets, each containing three screening variants, where each variant was described through the attributes of provider, costs, and waiting time. Furthermore, the impacts of sociodemographic characteristics (age, gender, income, job status, and educational background) on the choices were assessed. Results: Among the 383 clicks on the survey link, a total of 126 (32.9%) respondents completed the online survey. The conjoint analysis showed that the three attributes had more or less equal importance in contributing to the participants’ choices, with provider being the most important attribute. Inspecting the individual part-worths of conjoint attributes showed that treatment by a physician was the most preferred modality, followed by electronic consultation with a physician and personalized AI; the lowest scores were found for the three AI levels. Concerning the relationship between sociodemographic characteristics and relative importance, only age showed a significant positive association to the importance of the attribute provider (r=0.21, P=.02), in which younger participants put less importance on the provider than older participants. All other correlations were not significant. Conclusions: This study adds to the growing body of research using choice-based experiments to investigate the acceptance of AI in health contexts. Future studies are needed to explore the reasons why AI is accepted or rejected and whether sociodemographic characteristics are associated with this decision. 
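In choice-based conjoint analysis, the "relative importance" of an attribute is conventionally computed as the range of its estimated part-worth utilities divided by the sum of ranges across all attributes. A small, stdlib-only illustration of that convention; the part-worth values below are invented for the example, not the study's estimates:

```python
# Illustrative sketch: attribute importance from part-worth utilities.
# importance(attribute) = utility range of attribute / sum of all ranges.

def relative_importance(part_worths):
    """part_worths: dict mapping attribute -> {level: utility}."""
    ranges = {
        attr: max(levels.values()) - min(levels.values())
        for attr, levels in part_worths.items()
    }
    total = sum(ranges.values())
    return {attr: r / total for attr, r in ranges.items()}

# Hypothetical part-worths for the three attributes in the survey.
example = {
    "provider": {"in-person physician": 1.2, "teledermatology": 0.4, "AI": -1.6},
    "cost": {"free": 0.9, "40 euros": -0.9},
    "waiting time": {"immediate": 0.6, "4 weeks": -0.6},
}
```

With these invented utilities, provider has the widest range and therefore the highest relative importance, mirroring the pattern reported above.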
%M 38214959 %R 10.2196/46402 %U https://formative.jmir.org/2024/1/e46402 %U https://doi.org/10.2196/46402 %U http://www.ncbi.nlm.nih.gov/pubmed/38214959 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52134 %T Development and Validation of a Robust and Interpretable Early Triaging Support System for Patients Hospitalized With COVID-19: Predictive Algorithm Modeling and Interpretation Study %A Baek,Sangwon %A Jeong,Yeon joo %A Kim,Yun-Hyeon %A Kim,Jin Young %A Kim,Jin Hwan %A Kim,Eun Young %A Lim,Jae-Kwang %A Kim,Jungok %A Kim,Zero %A Kim,Kyunga %A Chung,Myung Jin %+ Biomedical Statistics Center, Research Institute for Future Medicine, Samsung Medical Center, 81 Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea, 82 2 3410 6745, kyunga.j.kim@gmail.com %K COVID-19 %K prognosis %K prognostic %K prognostics %K prediction model %K early triaging %K interpretability %K machine learning %K predict %K prediction %K predictive %K triage %K triaging %K emergency %K severity %K biomarker %K biomarkers %K SHAP %K Shapley %K clustering %K hospital admission %K hospital admissions %K hospitalize %K hospitalization %K hospitalizations %K neural network %K neural networks %K deep learning %K Omicron %K SARS-CoV-2 %K coronavirus %D 2024 %7 11.1.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Robust and accurate prediction of severity for patients with COVID-19 is crucial for patient triaging decisions. Many proposed models were prone to either high bias risk or low-to-moderate discrimination. Some also suffered from a lack of clinical interpretability and were developed based on early pandemic period data. Hence, there has been a compelling need for advancements in prediction models for better clinical applicability. 
Objective: The primary objective of this study was to develop and validate a machine learning–based Robust and Interpretable Early Triaging Support (RIETS) system that predicts severity progression (involving any of the following events: intensive care unit admission, in-hospital death, mechanical ventilation required, or extracorporeal membrane oxygenation required) within 15 days of hospitalization based on routinely available clinical and laboratory biomarkers. Methods: We included data from 5945 hospitalized patients with COVID-19 from 19 hospitals in South Korea collected between January 2020 and August 2022. For model development and external validation, the whole data set was partitioned into 2 independent cohorts by stratified random cluster sampling according to hospital type (general and tertiary care) and geographical location (metropolitan and nonmetropolitan). Machine learning models were trained and internally validated through a cross-validation technique on the development cohort. They were externally validated using a bootstrapped sampling technique on the external validation cohort. The best-performing model was selected primarily based on the area under the receiver operating characteristic curve (AUROC), and its robustness was evaluated using bias risk assessment. For model interpretability, we used Shapley and patient clustering methods. Results: Our final model, RIETS, was developed based on a deep neural network of 11 clinical and laboratory biomarkers that are readily available within the first day of hospitalization. The features predictive of severity included lactate dehydrogenase, age, absolute lymphocyte count, dyspnea, respiratory rate, diabetes mellitus, C-reactive protein, absolute neutrophil count, platelet count, white blood cell count, and saturation of peripheral oxygen. 
RIETS demonstrated excellent discrimination (AUROC=0.937; 95% CI 0.935-0.938) with high calibration (integrated calibration index=0.041), satisfied all the criteria of low bias risk in a risk assessment tool, and provided detailed interpretations of model parameters and patient clusters. In addition, RIETS showed potential for transportability across variant periods with sustained predictive performance on Omicron cases (AUROC=0.903, 95% CI 0.897-0.910). Conclusions: RIETS was developed and validated to assist early triaging by promptly predicting the severity of hospitalized patients with COVID-19. Its high performance with low bias risk ensures considerably reliable prediction. The use of a nationwide multicenter cohort in the model development and validation supports generalizability. The use of routinely collected features may enable wide adaptability. Interpretations of model parameters and patients can promote clinical applicability. Together, we anticipate that RIETS will facilitate the patient triaging workflow and efficient resource allocation when incorporated into routine clinical practice. 
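The AUROC reported throughout these model evaluations can be computed directly from its rank-statistic definition: the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, counting ties as one half. A stdlib-only sketch with hypothetical labels and scores (not the study's data or code):

```python
# Illustrative sketch: AUROC via its Mann-Whitney U equivalence.
# O(n_pos * n_neg) pairwise comparison; fine for small illustrations.

def auroc(labels, scores):
    """labels: 0/1 outcomes; scores: model risk scores, higher = more severe."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0    # positive ranked above negative
            elif p == q:
                wins += 0.5    # tie counts half
    return wins / (len(pos) * len(neg))
```

A bootstrap CI like the one quoted (95% CI 0.935-0.938) would be obtained by recomputing this statistic over resampled cohorts.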
%M 38206673 %R 10.2196/52134 %U https://www.jmir.org/2024/1/e52134 %U https://doi.org/10.2196/52134 %U http://www.ncbi.nlm.nih.gov/pubmed/38206673 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e50738 %T Identification of Predictors of Mood Disorder Misdiagnosis and Subsequent Help-Seeking Behavior in Individuals With Depressive Symptoms: Gradient-Boosted Tree Machine Learning Approach %A Benacek,Jiri %A Lawal,Nimotalai %A Ong,Tommy %A Tomasik,Jakub %A Martin-Key,Nayra A %A Funnell,Erin L %A Barton-Owen,Giles %A Olmert,Tony %A Cowell,Dan %A Bahn,Sabine %+ Department of Chemical Engineering and Biotechnology, Cambridge Centre for Neuropsychiatric Research, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, United Kingdom, 44 1223334151, sb209@cam.ac.uk %K misdiagnosis %K help-seeking %K gradient-boosted trees %K machine learning %K depression %K bipolar disorder %K diagnose %K diagnosis %K mood %K mental health %K mental disorder %K mental disorders %K depressive %K predict %K predictive %K prediction %K depressed %K algorithm %K algorithms %D 2024 %7 11.1.2024 %9 Original Paper %J JMIR Ment Health %G English %X Background: Misdiagnosis and delayed help-seeking cause significant burden for individuals with mood disorders such as major depressive disorder and bipolar disorder. Misdiagnosis can lead to inappropriate treatment, while delayed help-seeking can result in more severe symptoms, functional impairment, and poor treatment response. Such challenges are common in individuals with major depressive disorder and bipolar disorder due to the overlap of symptoms with other mental and physical health conditions, as well as stigma and insufficient understanding of these disorders. Objective: In this study, we aimed to identify factors that may contribute to mood disorder misdiagnosis and delayed help-seeking. 
Methods: Participants with current depressive symptoms were recruited online and data were collected using an extensive digital mental health questionnaire, with the World Health Organization World Mental Health Composite International Diagnostic Interview delivered via telephone. A series of predictive gradient-boosted tree algorithms were trained and validated to identify the most important predictors of misdiagnosis and subsequent help-seeking in misdiagnosed individuals. Results: The analysis included data from 924 symptomatic individuals for predicting misdiagnosis and from a subset of 379 misdiagnosed participants who provided follow-up information when predicting help-seeking. Models achieved good predictive power, with area under the receiver operating characteristic curve of 0.75 and 0.71 for misdiagnosis and help-seeking, respectively. The most predictive features with respect to misdiagnosis were high severity of depressed mood, instability of self-image, the involvement of a psychiatrist in diagnosing depression, higher age at depression diagnosis, and reckless spending. Regarding help-seeking behavior, the strongest predictors included shorter time elapsed since last speaking to a general practitioner about mental health, sleep problems disrupting daily tasks, taking antidepressant medication, and being diagnosed with depression at younger ages. Conclusions: This study provides a novel, machine learning–based approach to understand the interplay of factors that may contribute to the misdiagnosis and subsequent help-seeking in patients experiencing low mood. The present findings can inform the development of targeted interventions to improve early detection and appropriate treatment of individuals with mood disorders. 
%M 38206660 %R 10.2196/50738 %U https://mental.jmir.org/2024/1/e50738 %U https://doi.org/10.2196/50738 %U http://www.ncbi.nlm.nih.gov/pubmed/38206660 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e49331 %T A Closed-Loop Falls Monitoring and Prevention App for Multiple Sclerosis Clinical Practice: Human-Centered Design of the Multiple Sclerosis Falls InsightTrack %A Block,Valerie J %A Koshal,Kanishka %A Wijangco,Jaeleene %A Miller,Nicolette %A Sara,Narender %A Henderson,Kyra %A Reihm,Jennifer %A Gopal,Arpita %A Mohan,Sonam D %A Gelfand,Jeffrey M %A Guo,Chu-Yueh %A Oommen,Lauren %A Nylander,Alyssa %A Rowson,James A %A Brown,Ethan %A Sanders,Stephen %A Rankin,Katherine %A Lyles,Courtney R %A Sim,Ida %A Bove,Riley %+ Department of Neurology, University of California San Francisco Weill Institute, University of California San Francisco, Box 3126 1651 4th St, Room 612A, San Francisco, CA, 94143, United States, 1 (415) 353 2069, riley.bove@ucsf.edu %K digital health %K mobile tools %K falls %K prevention %K behavioral medicine %K implementation science %K closed-loop monitoring %K multiple sclerosis %K mobile phone %D 2024 %7 11.1.2024 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Falls are common in people with multiple sclerosis (MS), causing injuries, fear of falling, and loss of independence. Although targeted interventions (physical therapy) can help, patients underreport and clinicians undertreat this issue. Patient-generated data, combined with clinical data, can support the prediction of falls and lead to timely intervention (including referral to specialized physical therapy). To be actionable, such data must be efficiently delivered to clinicians, with care customized to the patient’s specific context. 
Objective: This study aims to describe the iterative process of the design and development of Multiple Sclerosis Falls InsightTrack (MS-FIT), identifying the clinical and technological features of this closed-loop app designed to support streamlined falls reporting, timely falls evaluation, and comprehensive and sustained falls prevention efforts. Methods: Stakeholders were engaged in a double diamond process of human-centered design to ensure that technological features aligned with users’ needs. Patient and clinician interviews were designed to elicit insight around ability blockers and boosters using the capability, opportunity, motivation, and behavior (COM-B) framework to facilitate subsequent mapping to the Behavior Change Wheel. To support generalizability, patients and experts from other clinical conditions associated with falls (geriatrics, orthopedics, and Parkinson disease) were also engaged. Designs were iterated based on each round of feedback, and final mock-ups were tested during routine clinical visits. Results: A sample of 30 patients and 14 clinicians provided at least 1 round of feedback. To support falls reporting, patients favored a simple biweekly survey built using REDCap (Research Electronic Data Capture; Vanderbilt University) to support bring-your-own-device accessibility—with optional additional context (the severity and location of falls). To support the evaluation and prevention of falls, clinicians favored a clinical dashboard featuring several key visualization widgets: a longitudinal falls display coded by the time of data capture, severity, and context; a comprehensive, multidisciplinary, and evidence-based checklist of actions intended to evaluate and prevent falls; and MS resources local to a patient’s community. In-basket messaging alerts clinicians of severe falls. The tool scored highly for usability, likability, usefulness, and perceived effectiveness (based on the Health IT Usability Evaluation Model scoring). 
Conclusions: To our knowledge, this is the first falls app designed using human-centered design to prioritize behavior change and, while being accessible at home for patients, to deliver actionable data to clinicians at the point of care. MS-FIT streamlines data delivery to clinicians via an electronic health record–embedded window, aligning with the 5 rights approach. Leveraging MS-FIT for data processing and algorithms minimizes clinician load while boosting care quality. Our innovation seamlessly integrates real-world patient-generated data as well as clinical and community-level factors, empowering self-care and addressing the impact of falls in people with MS. Preliminary findings indicate wider relevance, extending to other neurological conditions associated with falls and their consequences. %M 38206662 %R 10.2196/49331 %U https://humanfactors.jmir.org/2024/1/e49331 %U https://doi.org/10.2196/49331 %U http://www.ncbi.nlm.nih.gov/pubmed/38206662 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e47134 %T Embodied Conversational Agents for Chronic Diseases: Scoping Review %A Jiang,Zhili %A Huang,Xiting %A Wang,Zhiqian %A Liu,Yang %A Huang,Lihua %A Luo,Xiaolin %+ Department of Nursing, The First Affiliated Hospital, Zhejiang University School of Medicine, Building 17, 3rd Floor, 79 Qingchun Road, Hangzhou, 310003, China, 86 13867129329, lihuahuang818@zju.edu.cn %K embodied conversational agent %K ECA %K chronic diseases %K eHealth %K health care %K mobile phone %D 2024 %7 9.1.2024 %9 Review %J J Med Internet Res %G English %X Background: Embodied conversational agents (ECAs) are computer-generated animated humanlike characters that interact with users through verbal and nonverbal behavioral cues. They are increasingly used in a range of fields, including health care. Objective: This scoping review aims to identify the current practice in the development and evaluation of ECAs for chronic diseases. 
Methods: We applied a methodological framework in this review. A total of 6 databases (ie, PubMed, Embase, CINAHL, ACM Digital Library, IEEE Xplore Digital Library, and Web of Science) were searched using a combination of terms related to ECAs and health in October 2023. Two independent reviewers selected the studies and extracted the data. This review followed the PRISMA-ScR (Preferred Reporting Items of Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) statement. Results: The literature search found 6332 papers, of which 36 (0.57%) met the inclusion criteria. Among the 36 studies, 27 (75%) originated from the United States, and 28 (78%) were published from 2020 onward. The reported ECAs covered a wide range of chronic diseases, with a focus on cancers, atrial fibrillation, and type 2 diabetes, primarily to promote screening and self-management. Most ECAs were depicted as middle-aged women based on screenshots and communicated with users through voice and nonverbal behavior. The most frequently reported evaluation outcomes were acceptability and effectiveness. Conclusions: This scoping review provides valuable insights for technology developers and health care professionals regarding the development and implementation of ECAs. It emphasizes the importance of technological advances in the embodiment, personalized strategy, and communication modality and requires in-depth knowledge of user preferences regarding appearance, animation, and intervention content. Future studies should incorporate measures of cost, efficiency, and productivity to provide a comprehensive evaluation of the benefits of using ECAs in health care. 
%M 38194260 %R 10.2196/47134 %U https://www.jmir.org/2024/1/e47134 %U https://doi.org/10.2196/47134 %U http://www.ncbi.nlm.nih.gov/pubmed/38194260 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e43112 %T General Characteristics and Design Taxonomy of Chatbots for COVID-19: Systematic Review %A Lim,Wendell Adrian %A Custodio,Razel %A Sunga,Monica %A Amoranto,Abegail Jayne %A Sarmiento,Raymond Francis %+ National Telehealth Center, National Institutes of Health, University of the Philippines Manila, 670 Padre Faura Street, Ermita, Manila, 1000, Philippines, 63 9269819254, wolim@up.edu.ph %K COVID-19 %K health chatbot %K conversational agent in health care %K artificial intelligence %K systematic review %K mobile phone %D 2024 %7 5.1.2024 %9 Review %J J Med Internet Res %G English %X Background: A conversational agent powered by artificial intelligence, commonly known as a chatbot, is one of the most recent innovations used to provide information and services during the COVID-19 pandemic. However, the multitude of conversational agents explicitly designed during the COVID-19 pandemic calls for characterization and analysis using rigorous technological frameworks and extensive systematic reviews. Objective: This study aims to describe the general characteristics of COVID-19 chatbots and examine their system designs using an adapted design taxonomy framework. Methods: We conducted a systematic review of the general characteristics and design taxonomy of COVID-19 chatbots, with 56 studies included in the final analysis. This review followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to select papers published between March 2020 and April 2022 from various databases and search engines. Results: Most studies on COVID-19 chatbot design and development were conducted in Asia and Europe. 
Most chatbots are also accessible on websites, internet messaging apps, and Android devices. The COVID-19 chatbots are further classified according to their temporal profiles, appearance, intelligence, interaction, and context for system design trends. From the temporal profile perspective, almost half of the COVID-19 chatbots interact with users more than once over a period of several weeks and can remember information from previous user interactions. From the appearance perspective, most COVID-19 chatbots assume the expert role, are task oriented, and have no visual or avatar representation. From the intelligence perspective, almost half of the COVID-19 chatbots are artificially intelligent and can respond to textual inputs based on a set of rules. In addition, more than half of these chatbots operate on a structured flow and do not portray any socioemotional behavior. Most chatbots can also process external data and broadcast resources. Regarding their interaction with users, most COVID-19 chatbots are adaptive, can communicate through text, can react to user input, are not gamified, and do not require additional human support. From the context perspective, all COVID-19 chatbots are goal oriented, with most falling under the health care application domain and designed to provide information to the user. Conclusions: The conceptualization, development, implementation, and use of COVID-19 chatbots emerged to mitigate the effects of a global pandemic in societies worldwide. This study summarized the current system design trends of COVID-19 chatbots based on 5 design perspectives, which may help developers conveniently choose a future-proof chatbot archetype that will meet the needs of the public in the face of growing demand for a better pandemic response. 
%M 38064638 %R 10.2196/43112 %U https://www.jmir.org/2024/1/e43112 %U https://doi.org/10.2196/43112 %U http://www.ncbi.nlm.nih.gov/pubmed/38064638 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e51247 %T Artificial Intelligence in Medicine: Cross-Sectional Study Among Medical Students on Application, Education, and Ethical Aspects %A Weidener,Lukas %A Fischer,Michael %+ Research Unit for Quality and Ethics in Health Care, UMIT TIROL – Private University for Health Sciences and Health Technology, Eduard-Wallnöfer-Zentrum 1, Hall in Tirol, 6060, Austria, 43 17670491594, lukas.weidener@edu.umit-tirol.at %K artificial intelligence %K AI technology %K medicine %K medical education %K medical curriculum %K medical school %K AI ethics %K ethics %D 2024 %7 5.1.2024 %9 Original Paper %J JMIR Med Educ %G English %X Background: The use of artificial intelligence (AI) in medicine not only directly impacts the medical profession but is also increasingly associated with various potential ethical aspects. In addition, the expanding use of AI and AI-based applications such as ChatGPT demands a corresponding shift in medical education to adequately prepare future practitioners for the effective use of these tools and address the associated ethical challenges they present. Objective: This study aims to explore how medical students from Germany, Austria, and Switzerland perceive the use of AI in medicine and the teaching of AI and AI ethics in medical education in accordance with their use of AI-based chat applications, such as ChatGPT. Methods: This cross-sectional study, conducted from June 15 to July 15, 2023, surveyed medical students across Germany, Austria, and Switzerland using a web-based survey. This study aimed to assess students’ perceptions of AI in medicine and the integration of AI and AI ethics into medical education. The survey, which included 53 items across 6 sections, was developed and pretested. 
Data analysis used descriptive statistics (median, mode, IQR, total number, and percentages) and either the chi-square or Mann-Whitney U tests, as appropriate. Results: Surveying 487 medical students across Germany, Austria, and Switzerland revealed limited formal education on AI or AI ethics within medical curricula, although 38.8% (189/487) had prior experience with AI-based chat applications, such as ChatGPT. Despite varied prior exposures, 71.7% (349/487) anticipated a positive impact of AI on medicine. There was widespread consensus (385/487, 74.9%) on the need for AI and AI ethics instruction in medical education, although the current offerings were deemed inadequate. Regarding the AI ethics education content, all proposed topics were rated as highly relevant. Conclusions: This study revealed a pronounced discrepancy between the use of AI-based (chat) applications, such as ChatGPT, among medical students in Germany, Austria, and Switzerland and the teaching of AI in medical education. To adequately prepare future medical professionals, there is an urgent need to integrate the teaching of AI and AI ethics into the medical curricula. %M 38180787 %R 10.2196/51247 %U https://mededu.jmir.org/2024/1/e51247 %U https://doi.org/10.2196/51247 %U http://www.ncbi.nlm.nih.gov/pubmed/38180787 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e51148 %T Pure Wisdom or Potemkin Villages? 
A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis %A Knoedler,Leonard %A Alfertshofer,Michael %A Knoedler,Samuel %A Hoch,Cosima C %A Funk,Paul F %A Cotofana,Sebastian %A Maheta,Bhagvat %A Frank,Konstantin %A Brébant,Vanessa %A Prantl,Lukas %A Lamby,Philipp %+ Department of Plastic, Hand and Reconstructive Surgery, University Hospital Regensburg, Franz-Josef-Strauß-Allee 11, Regensburg, 93053, Germany, 49 151 44824958, leonardknoedler@t-online.de %K ChatGPT %K United States Medical Licensing Examination %K artificial intelligence %K USMLE %K USMLE Step 1 %K OpenAI %K medical education %K clinical decision-making %D 2024 %7 5.1.2024 %9 Original Paper %J JMIR Med Educ %G English %X Background: The United States Medical Licensing Examination (USMLE) has been critical in medical education since 1992, testing various aspects of a medical student’s knowledge and skills through different steps, based on their training level. Artificial intelligence (AI) tools, including chatbots like ChatGPT, are emerging technologies with potential applications in medicine. However, comprehensive studies analyzing ChatGPT’s performance on USMLE Step 3 in large-scale scenarios and comparing different versions of ChatGPT are limited. Objective: This paper aimed to analyze ChatGPT’s performance on USMLE Step 3 practice test questions to better elucidate the strengths and weaknesses of AI use in medical education and deduce evidence-based strategies to counteract AI cheating. Methods: A total of 2069 USMLE Step 3 practice questions were extracted from the AMBOSS study platform. After excluding 229 image-based questions, a total of 1840 text-based questions were further categorized and entered into ChatGPT 3.5, while a subset of 229 questions was entered into ChatGPT 4. 
Responses were recorded, and the accuracy of ChatGPT answers as well as its performance in different test question categories and for different difficulty levels were compared between both versions. Results: Overall, ChatGPT 4 demonstrated a statistically significant superior performance compared to ChatGPT 3.5, achieving an accuracy of 84.7% (194/229) and 56.9% (1047/1840), respectively. A noteworthy correlation was observed between the length of test questions and the performance of ChatGPT 3.5 (ρ=–0.069; P=.003), which was absent in ChatGPT 4 (P=.87). Additionally, the difficulty of test questions, as categorized by AMBOSS hammer ratings, showed a statistically significant correlation with performance for both ChatGPT versions, with ρ=–0.289 for ChatGPT 3.5 and ρ=–0.344 for ChatGPT 4. ChatGPT 4 surpassed ChatGPT 3.5 in all levels of test question difficulty, except for the 2 highest difficulty tiers (4 and 5 hammers), where statistical significance was not reached. Conclusions: In this study, ChatGPT 4 demonstrated remarkable proficiency in taking the USMLE Step 3, with an accuracy rate of 84.7% (194/229), outshining ChatGPT 3.5 with an accuracy rate of 56.9% (1047/1840). Although ChatGPT 4 performed exceptionally, it encountered difficulties in questions requiring the application of theoretical concepts, particularly in cardiology and neurology. These insights are pivotal for the development of examination strategies that are resilient to AI and underline the promising role of AI in the realm of medical education and diagnostics. 
%M 38180782 %R 10.2196/51148 %U https://mededu.jmir.org/2024/1/e51148 %U https://doi.org/10.2196/51148 %U http://www.ncbi.nlm.nih.gov/pubmed/38180782 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 3 %N %P e51168 %T Learning From International Comparators of National Medical Imaging Initiatives for AI Development: Multiphase Qualitative Study %A Karpathakis,Kassandra %A Pencheon,Emma %A Cushnan,Dominic %+ Decimal.health, 50 Milk Street, Boston, MA, 02109, United States, 1 6086285988, kass.karpathakis@gmail.com %K digital health %K mobile health %K mHealth %K medical imaging %K artificial intelligence %K health policy %D 2024 %7 4.1.2024 %9 Original Paper %J JMIR AI %G English %X Background: The COVID-19 pandemic drove investment and research into medical imaging platforms to provide data to create artificial intelligence (AI) algorithms for the management of patients with COVID-19. Building on the success of England’s National COVID-19 Chest Imaging Database, the national digital policy body (NHSX) sought to create a generalized national medical imaging platform for the development, validation, and deployment of algorithms. Objective: This study aims to understand international use cases of medical imaging platforms for the development and implementation of algorithms to inform the creation of England’s national imaging platform. Methods: The National Health Service (NHS) AI Lab Policy and Strategy Team adopted a multiphased approach: (1) identification and prioritization of national AI imaging platforms; (2) Political, Economic, Social, Technological, Legal, and Environmental (PESTLE) factor analysis deep dive into national AI imaging platforms; (3) semistructured interviews with key stakeholders; (4) workshop on emerging themes and insights with the internal NHSX team; and (5) formulation of policy recommendations. Results: International use cases of national AI imaging platforms (n=7) were prioritized for PESTLE factor analysis. 
Stakeholders (n=13) from the international use cases were interviewed. Themes (n=8) from the semistructured interviews, including interview quotes, were analyzed with workshop participants (n=5). The outputs of the deep dives, interviews, and workshop were synthesized thematically into 8 categories with 17 subcategories. On the basis of the insights from the international use cases, policy recommendations (n=12) were developed to support the NHS AI Lab in the design and development of the English national medical imaging platform. Conclusions: The creation of AI algorithms supporting technology and infrastructure such as platforms often occurs in isolation within countries, let alone between countries. This novel policy research project sought to bridge the gap by learning from the challenges, successes, and experience of England’s international counterparts. Policy recommendations based on international learnings focused on the demonstrable benefits of the platform to secure sustainable funding, validation of algorithms and infrastructure to support in situ deployment, and creating wraparound tools for nontechnical participants such as clinicians to engage with algorithm creation. As health care organizations increasingly adopt technological solutions, policy makers have a responsibility to ensure that initiatives are informed by learnings from both national and international initiatives as well as disseminating the outcomes of their work. %M 38875584 %R 10.2196/51168 %U https://ai.jmir.org/2024/1/e51168 %U https://doi.org/10.2196/51168 %U http://www.ncbi.nlm.nih.gov/pubmed/38875584 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e51501 %T Text Dialogue Analysis for Primary Screening of Mild Cognitive Impairment: Development and Validation Study %A Wang,Changyu %A Liu,Siru %A Li,Aiqing %A Liu,Jialin %+ Information Center, West China Hospital, Sichuan University, No. 
37 Guo Xue Xiang, Chengdu, 610041, China, 86 28 85422306, DLJL8@163.com %K artificial intelligence %K AI %K AI models %K ChatGPT %K primary screening %K mild cognitive impairment %K standardization %K prompt design %K design %K cognitive impairment %K screening %K model %K clinician %K diagnosis %D 2023 %7 29.12.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence models tailored to diagnose cognitive impairment have shown excellent results. However, it is unclear whether large language models can rival specialized models using text alone. Objective: In this study, we explored the performance of ChatGPT for primary screening of mild cognitive impairment (MCI) and standardized the design steps and components of the prompts. Methods: We gathered a total of 174 participants from the DementiaBank screening and classified 70% of them into the training set and 30% of them into the test set. Only text dialogues were kept. Sentences were cleaned using a macro code, followed by a manual check. The prompt consisted of 5 main parts, including character setting, scoring system setting, indicator setting, output setting, and explanatory information setting. Three dimensions of variables from published studies were included: vocabulary (ie, word frequency and word ratio, phrase frequency and phrase ratio, and lexical complexity), syntax and grammar (ie, syntactic complexity and grammatical components), and semantics (ie, semantic density and semantic coherence). We used R 4.3.0 for the analysis of variables and diagnostic indicators. Results: Three additional indicators related to the severity of MCI were incorporated into the final prompt for the model. These indicators were effective in discriminating between MCI and cognitively normal participants: tip-of-the-tongue phenomenon (P<.001), difficulty with complex ideas (P<.001), and memory issues (P<.001). 
The final GPT-4 model achieved a sensitivity of 0.8636, a specificity of 0.9487, and an area under the curve of 0.9062 on the training set; on the test set, the sensitivity, specificity, and area under the curve reached 0.7727, 0.8333, and 0.8030, respectively. Conclusions: ChatGPT was effective in the primary screening of participants with possible MCI. Improved standardization of prompts by clinicians would also improve the performance of the model. It is important to note that ChatGPT is not a substitute for a clinician making a diagnosis. %M 38157230 %R 10.2196/51501 %U https://www.jmir.org/2023/1/e51501 %U https://doi.org/10.2196/51501 %U http://www.ncbi.nlm.nih.gov/pubmed/38157230 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48834 %T Developing a Machine Learning Algorithm to Predict the Probability of Medical Staff Work Mode Using Human-Smartphone Interaction Patterns: Algorithm Development and Validation Study %A Chen,Hung-Hsun %A Lu,Henry Horng-Shing %A Weng,Wei-Hung %A Lin,Yu-Hsuan %+ Institute of Population Health Sciences, National Health Research Institutes, 35 Keyan Road Zhunan, Miaoli County, 35053, Taiwan, 886 37 206 166 ext 36383, yuhsuanlin@nhri.edu.tw %K human-smartphone interaction %K digital phenotyping %K work hours %K machine learning %K deep learning %K probability in work mode %K one-dimensional convolutional neural network %K extreme gradient-boosted trees %D 2023 %7 29.12.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Traditional methods for investigating work hours rely on an employee’s physical presence at the worksite. However, accurately identifying break times at the worksite and distinguishing remote work outside the worksite poses challenges in work hour estimations. Machine learning has the potential to differentiate between human-smartphone interactions at work and off work. 
Objective: In this study, we aimed to develop a novel approach called “probability in work mode,” which leverages human-smartphone interaction patterns and corresponding GPS location data to estimate work hours. Methods: To capture human-smartphone interactions and GPS locations, we used the “Staff Hours” app, developed by our team, to passively and continuously record participants’ screen events, including timestamps of notifications, screen on or off occurrences, and app usage patterns. Extreme gradient boosted trees were used to transform these interaction patterns into a probability, while 1-dimensional convolutional neural networks generated successive probabilities based on previous sequence probabilities. The resulting probability in work mode allowed us to discern periods of office work, off-work, breaks at the worksite, and remote work. Results: Our study included 121 participants, contributing to a total of 5503 person-days (person-days represent the cumulative number of days across all participants on which data were collected and analyzed). The developed machine learning model exhibited an average prediction performance, measured by the area under the receiver operating characteristic curve, of 0.915 (SD 0.064). Work hours estimated using the probability in work mode (higher than 0.5) were significantly longer (mean 11.2, SD 2.8 hours per day) than the GPS-defined counterparts (mean 10.2, SD 2.3 hours per day; P<.001). This discrepancy was attributed to the higher remote work time of 111.6 (SD 106.4) minutes compared to the break time of 54.7 (SD 74.5) minutes. Conclusions: Our novel approach, the probability in work mode, harnessed human-smartphone interaction patterns and machine learning models to enhance the precision and accuracy of work hour investigation. 
By integrating human-smartphone interactions and GPS data, our method provides valuable insights into work patterns, including remote work and breaks, offering potential applications in optimizing work productivity and well-being. %M 38157232 %R 10.2196/48834 %U https://www.jmir.org/2023/1/e48834 %U https://doi.org/10.2196/48834 %U http://www.ncbi.nlm.nih.gov/pubmed/38157232 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e51199 %T Empathy and Equity: Key Considerations for Large Language Model Adoption in Health Care %A Koranteng,Erica %A Rao,Arya %A Flores,Efren %A Lev,Michael %A Landman,Adam %A Dreyer,Keith %A Succi,Marc %+ Massachusetts General Hospital, 55 Fruit St, Boston, 02114, United States, 1 617 935 9144, msucci@mgh.harvard.edu %K ChatGPT %K AI %K artificial intelligence %K large language models %K LLMs %K ethics %K empathy %K equity %K bias %K language model %K health care application %K patient care %K care %K development %K framework %K model %K ethical implication %D 2023 %7 28.12.2023 %9 Viewpoint %J JMIR Med Educ %G English %X The growing presence of large language models (LLMs) in health care applications holds significant promise for innovative advancements in patient care. However, concerns about ethical implications and potential biases have been raised by various stakeholders. Here, we evaluate the ethics of LLMs in medicine along 2 key axes: empathy and equity. We outline the importance of these factors in novel models of care and develop frameworks for addressing these alongside LLM deployment. 
%M 38153778 %R 10.2196/51199 %U https://mededu.jmir.org/2023/1/e51199 %U https://doi.org/10.2196/51199 %U http://www.ncbi.nlm.nih.gov/pubmed/38153778 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e48904 %T Differentiating ChatGPT-Generated and Human-Written Medical Texts: Quantitative Study %A Liao,Wenxiong %A Liu,Zhengliang %A Dai,Haixing %A Xu,Shaochen %A Wu,Zihao %A Zhang,Yiyang %A Huang,Xiaoke %A Zhu,Dajiang %A Cai,Hongmin %A Li,Quanzheng %A Liu,Tianming %A Li,Xiang %+ Department of Radiology, Massachusetts General Hospital, 55 Fruit St, Boston, MA, 02114, United States, 1 7062480264, xli60@mgh.harvard.edu %K ChatGPT %K medical ethics %K linguistic analysis %K text classification %K artificial intelligence %K medical texts %K machine learning %D 2023 %7 28.12.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: Large language models, such as ChatGPT, are capable of generating grammatically perfect and human-like text content, and a large number of ChatGPT-generated texts have appeared on the internet. However, medical texts, such as clinical notes and diagnoses, require rigorous validation, and erroneous medical content generated by ChatGPT could potentially lead to disinformation that poses significant harm to health care and the general public. Objective: This study is among the first on responsible artificial intelligence–generated content in medicine. We focus on analyzing the differences between medical texts written by human experts and those generated by ChatGPT and designing machine learning workflows to effectively detect and differentiate medical texts generated by ChatGPT. Methods: We first constructed a suite of data sets containing medical texts written by human experts and generated by ChatGPT. We analyzed the linguistic features of these 2 types of content and uncovered differences in vocabulary, parts-of-speech, dependency, sentiment, perplexity, and other aspects. 
Finally, we designed and implemented machine learning methods to detect medical text generated by ChatGPT. The data and code used in this paper are published on GitHub. Results: Medical texts written by humans were more concrete, more diverse, and typically contained more useful information, while medical texts generated by ChatGPT paid more attention to fluency and logic and usually expressed general terminologies rather than effective information specific to the context of the problem. A bidirectional encoder representations from transformers–based model effectively detected medical texts generated by ChatGPT, and the F1 score exceeded 95%. Conclusions: Although text generated by ChatGPT is grammatically perfect and human-like, the linguistic characteristics of generated medical texts were different from those written by human experts. Medical text generated by ChatGPT could be effectively detected by the proposed machine learning algorithms. This study provides a pathway toward trustworthy and accountable use of large language models in medicine. 
%M 38153785 %R 10.2196/48904 %U https://mededu.jmir.org/2023/1/e48904 %U https://doi.org/10.2196/48904 %U http://www.ncbi.nlm.nih.gov/pubmed/38153785 %0 Journal Article %@ 1947-2579 %I JMIR Publications %V 15 %N %P e52782 %T Machine Learning Model for Predicting Mortality Risk in Patients With Complex Chronic Conditions: Retrospective Analysis %A Hernández Guillamet,Guillem %A Morancho Pallaruelo,Ariadna Ning %A Miró Mezquita,Laura %A Miralles,Ramón %A Mas,Miquel Àngel %A Ulldemolins Papaseit,María José %A Estrada Cuxart,Oriol %A López Seguí,Francesc %+ Chair in ICT and Health, Centre for Health and Social Care Research (CESS), University of Vic - Central University of Catalonia (UVic-UCC), Carrer Miquel Martí i Pol, 1, Vic, 08500, Spain, 1 938863342, francesc.lopez.segui@gmail.com %K machine learning %K mortality prediction %K chronicity %K chromic %K complex %K artificial intelligence %K complexity %K health data %K predict %K prediction %K predictive %K mortality %K death %K classification %K algorithm %K algorithms %K mortality risk %K risk prediction %D 2023 %7 28.12.2023 %9 Original Paper %J Online J Public Health Inform %G English %X Background: The health care system is undergoing a shift toward a more patient-centered approach for individuals with chronic and complex conditions, which presents a series of challenges, such as predicting hospital needs and optimizing resources. At the same time, the exponential increase in health data availability has made it possible to apply advanced statistics and artificial intelligence techniques to develop decision-support systems and improve resource planning, diagnosis, and patient screening. These methods are key to automating the analysis of large volumes of medical data and reducing professional workloads. Objective: This article aims to present a machine learning model and a case study in a cohort of patients with highly complex conditions. 
The objective was to predict mortality within the following 4 years and early mortality within 6 months following diagnosis. The method used easily accessible variables and health care resource utilization information. Methods: A classification algorithm was selected from among 6 models, implemented and evaluated using a stratified cross-validation strategy with k=10 and a 70/30 train-test split. The evaluation metrics used included accuracy, recall, precision, F1-score, and area under the receiver operating characteristic (AUROC) curve. Results: The model predicted patient death with an 87% accuracy, recall of 87%, precision of 82%, F1-score of 84%, and area under the curve (AUC) of 0.88 using the best model, the Extreme Gradient Boosting (XGBoost) classifier. The results were worse when predicting premature deaths (within 6 months) with an 83% accuracy (recall=55%, precision=64%, F1-score=57%, and AUC=0.88) using the Gradient Boosting (GRBoost) classifier. Conclusions: This study showcases encouraging outcomes in forecasting mortality among patients with intricate and persistent health conditions. The employed variables are conveniently accessible, and the incorporation of health care resource utilization information of the patient, which has not been employed by current state-of-the-art approaches, displays promising predictive power. The proposed prediction model is designed to efficiently identify cases that need customized care and proactively anticipate the demand for critical resources by health care providers. 
%M 38223690 %R 10.2196/52782 %U https://ojphi.jmir.org/2023/1/e52782 %U https://doi.org/10.2196/52782 %U http://www.ncbi.nlm.nih.gov/pubmed/38223690 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e51798 %T Exploring the Potential of ChatGPT-4 in Predicting Refractive Surgery Categorizations: Comparative Study %A Ćirković,Aleksandar %A Katz,Toam %+ Care Vision Germany, Ltd, Zeltnerstraße 1-3, Nuremberg, 90443, Germany, 49 9119564950, aleksandar.cirkovic@mailbox.org %K artificial intelligence %K machine learning %K decision support systems %K clinical %K refractive surgical procedures %K risk assessment %K ophthalmology %K health informatics %K predictive modeling %K data analysis %K medical decision-making %K eHealth %K ChatGPT-4 %K ChatGPT %K refractive surgery %K categorization %K AI-powered algorithm %K large language model %K decision-making %D 2023 %7 28.12.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Refractive surgery research aims to optimally precategorize patients by their suitability for various types of surgery. Recent advances have led to the development of artificial intelligence–powered algorithms, including machine learning approaches, to assess risks and enhance workflow. Large language models (LLMs) like ChatGPT-4 (OpenAI LP) have emerged as potential general artificial intelligence tools that can assist across various disciplines, possibly including refractive surgery decision-making. However, their actual capabilities in precategorizing refractive surgery patients based on real-world parameters remain unexplored. Objective: This exploratory study aimed to validate ChatGPT-4’s capabilities in precategorizing refractive surgery patients based on commonly used clinical parameters. The goal was to assess whether ChatGPT-4’s performance when categorizing batch inputs is comparable to those made by a refractive surgeon. 
A simple binary set of categories (patient suitable for laser refractive surgery or not) as well as a more detailed set were compared. Methods: Data from 100 consecutive patients from a refractive clinic were anonymized and analyzed. Parameters included age, sex, manifest refraction, visual acuity, and various corneal measurements and indices from Scheimpflug imaging. This study compared ChatGPT-4’s performance with a clinician’s categorizations using Cohen κ coefficient, a chi-square test, a confusion matrix, accuracy, precision, recall, F1-score, and receiver operating characteristic area under the curve. Results: A statistically significant, noncoincidental agreement was found between ChatGPT-4 and the clinician’s categorizations, with a Cohen κ coefficient of 0.399 for 6 categories (95% CI 0.256-0.537) and 0.610 for binary categorization (95% CI 0.372-0.792). The model showed temporal instability and response variability, however. The chi-square test on 6 categories indicated an association between the 2 raters’ distributions (χ²₅=94.7, P<.001). Here, the accuracy was 0.68, precision 0.75, recall 0.68, and F1-score 0.70. For 2 categories, the accuracy was 0.88, precision 0.88, recall 0.88, F1-score 0.88, and area under the curve 0.79. Conclusions: This study revealed that ChatGPT-4 exhibits potential as a precategorization tool in refractive surgery, showing promising agreement with clinician categorizations. However, its main limitations include reliance on a single human rater, a small sample size, the instability and variability of ChatGPT’s (OpenAI LP) output between iterations, and the nontransparency of the underlying models. The results encourage further exploration into the application of LLMs like ChatGPT-4 in health care, particularly in decision-making processes that require understanding vast clinical data. 
Future research should focus on defining the model’s accuracy with prompt and vignette standardization, detecting confounding factors, and comparing to other versions of ChatGPT-4 and other LLMs to pave the way for larger-scale validation and real-world implementation. %M 38153777 %R 10.2196/51798 %U https://formative.jmir.org/2023/1/e51798 %U https://doi.org/10.2196/51798 %U http://www.ncbi.nlm.nih.gov/pubmed/38153777 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e48544 %T Economic Evaluations and Equity in the Use of Artificial Intelligence in Imaging Exams for Medical Diagnosis in People With Skin, Neurological, and Pulmonary Diseases: Protocol for a Systematic Review %A Santana,Giulia Osório %A Couto,Rodrigo de Macedo %A Loureiro,Rafael Maffei %A Furriel,Brunna Carolinne Rocha Silva %A Rother,Edna Terezinha %A de Paiva,Joselisa Péres Queiroz %A Correia,Lucas Reis %+ PROADI-SUS, Hospital Israelita Albert Einstein, Madre Cabrini Street, 462, Tower A, 5th Floor, São Paulo, Brazil, 55 11 97444 8995, giulia.santana@einstein.br %K artificial intelligence %K economic evaluation %K equity %K medical diagnosis %K health care system %K technology %K systematic review %K cost-effectiveness %K imaging exam %K intervention %D 2023 %7 28.12.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Traditional health care systems face long-standing challenges, including patient diversity, geographical disparities, and financial constraints. The emergence of artificial intelligence (AI) in health care offers solutions to these challenges. AI, a multidisciplinary field, enhances clinical decision-making. However, imbalanced AI models may enhance health disparities. Objective: This systematic review aims to investigate the economic performance and equity impact of AI in diagnostic imaging for skin, neurological, and pulmonary diseases. 
The research question is “To what extent does the use of AI in imaging exams for diagnosing skin, neurological, and pulmonary diseases result in improved economic outcomes, and does it promote equity in health care systems?” Methods: The study is a systematic review of economic and equity evaluations following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) and CHEERS (Consolidated Health Economic Evaluation Reporting Standards) guidelines. Eligibility criteria include articles reporting on economic evaluations or equity considerations related to AI-based diagnostic imaging for specified diseases. Data will be collected from PubMed, Embase, Scopus, Web of Science, and reference lists. Data quality and transferability will be assessed according to CHEC (Consensus on Health Economic Criteria), EPHPP (Effective Public Health Practice Project), and Welte checklists. Results: This systematic review began in March 2023. The literature search identified 9,526 publications and, after full-text screening, 9 publications were included in the study. We plan to submit a manuscript to a peer-reviewed journal once it is finalized, with an expected completion date in January 2024. Conclusions: AI in diagnostic imaging offers potential benefits but also raises concerns about equity and economic impact. Bias in algorithms and disparities in access may hinder equitable outcomes. Evaluating the economic viability of AI applications is essential for resource allocation and affordability. Policy makers and health care stakeholders can benefit from this review’s insights to make informed decisions. Limitations, including study variability and publication bias, will be considered in the analysis. This systematic review will provide valuable insights into the economic and equity implications of AI in diagnostic imaging. It aims to inform evidence-based decision-making and contribute to more efficient and equitable health care systems. 
International Registered Report Identifier (IRRID): DERR1-10.2196/48544 %M 38153775 %R 10.2196/48544 %U https://www.researchprotocols.org/2023/1/e48544 %U https://doi.org/10.2196/48544 %U http://www.ncbi.nlm.nih.gov/pubmed/38153775 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 6 %N %P e48589 %T Crowdsourcing Skin Demarcations of Chronic Graft-Versus-Host Disease in Patient Photographs: Training Versus Performance Study %A McNeil,Andrew J %A Parks,Kelsey %A Liu,Xiaoqi %A Jiang,Bohan %A Coco,Joseph %A McCool,Kira %A Fabbri,Daniel %A Duhaime,Erik P %A Dawant,Benoit M %A Tkaczyk,Eric R %+ Dermatology Service and Research Service, Department of Veterans Affairs, Tennessee Valley Healthcare System, 1310 24th Avenue South, Nashville, TN, 37212, United States, 1 6159364633, eric.tkaczyk@vumc.org %K graft-versus-host disease %K cGVHD %K crowdsourcing %K dermatology %K labeling %K segmentation %K skin %K medical image %K imaging %K feasibility %K artificial intelligence %D 2023 %7 26.12.2023 %9 Original Paper %J JMIR Dermatol %G English %X Background: Chronic graft-versus-host disease (cGVHD) is a significant cause of long-term morbidity and mortality in patients after allogeneic hematopoietic cell transplantation. Skin is the most commonly affected organ, and visual assessment of cGVHD can have low reliability. Crowdsourcing data from nonexpert participants has been used for numerous medical applications, including image labeling and segmentation tasks. Objective: This study aimed to assess the ability of crowds of nonexpert raters—individuals without any prior training for identifying or marking cGVHD—to demarcate photos of cGVHD-affected skin. We also studied the effect of training and feedback on crowd performance. Methods: Using a Canfield Vectra H1 3D camera, 360 photographs of the skin of 36 patients with cGVHD were taken. Ground truth demarcations were provided in 3D by a trained expert and reviewed by a board-certified dermatologist. 
In total, 3000 2D images (projections from various angles) were created for crowd demarcation through the DiagnosUs mobile app. Raters were split into high and low feedback groups. The performances of 4 different crowds of nonexperts were analyzed, including 17 raters per image for the low and high feedback groups, 32-35 raters per image for the low feedback group, and the top 5 performers for each image from the low feedback group. Results: Across 8 demarcation competitions, 130 raters were recruited to the high feedback group and 161 to the low feedback group. This resulted in a total of 54,887 individual demarcations from the high feedback group and 78,967 from the low feedback group. The nonexpert crowds achieved good overall performance for segmenting cGVHD-affected skin with minimal training, achieving a median surface area error of less than 12% of skin pixels for all crowds in both the high and low feedback groups. The low feedback crowds performed slightly poorer than the high feedback crowd, even when a larger crowd was used. Tracking the 5 most reliable raters from the low feedback group for each image recovered a performance similar to that of the high feedback crowd. Higher variability between raters for a given image was not found to correlate with lower performance of the crowd consensus demarcation and cannot therefore be used as a measure of reliability. No significant learning was observed during the task as more photos and feedback were seen. Conclusions: Crowds of nonexpert raters can demarcate cGVHD images with good overall performance. Tracking the top 5 most reliable raters provided optimal results, obtaining the best performance with the lowest number of expert demarcations required for adequate training. However, the agreement amongst individual nonexperts does not help predict whether the crowd has provided an accurate result. 
Future work should explore the performance of crowdsourcing in standard clinical photos and further methods to estimate the reliability of consensus demarcations. %M 38147369 %R 10.2196/48589 %U https://derma.jmir.org/2023/1/e48589 %U https://doi.org/10.2196/48589 %U http://www.ncbi.nlm.nih.gov/pubmed/38147369 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e51229 %T Comparisons of Quality, Correctness, and Similarity Between ChatGPT-Generated and Human-Written Abstracts for Basic Research: Cross-Sectional Study %A Cheng,Shu-Li %A Tsai,Shih-Jen %A Bai,Ya-Mei %A Ko,Chih-Hung %A Hsu,Chih-Wei %A Yang,Fu-Chi %A Tsai,Chia-Kuang %A Tu,Yu-Kang %A Yang,Szu-Nian %A Tseng,Ping-Tao %A Hsu,Tien-Wei %A Liang,Chih-Sung %A Su,Kuan-Pin %+ Department of Psychiatry, E-Da Dachang Hospital, I-Shou University, No. 305, Dachang 1st Rd., Sanmin District, Kaohsiung, 807, Taiwan, 886 7 5599123, s9801101@gmail.com %K ChatGPT %K abstract %K AI-generated scientific content %K plagiarism %K artificial intelligence %K NLP %K natural language processing %K LLM %K language model %K language models %K text %K textual %K generation %K generative %K extract %K extraction %K scientific research %K academic research %K publication %K publications %K abstracts %D 2023 %7 25.12.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: ChatGPT may act as a research assistant to help organize the direction of thinking and summarize research findings. However, few studies have examined the quality, similarity (abstracts being similar to the original one), and accuracy of the abstracts generated by ChatGPT when researchers provide full-text basic research papers. Objective: We aimed to assess the applicability of an artificial intelligence (AI) model in generating abstracts for basic preclinical research. Methods: We selected 30 basic research papers from Nature, Genome Biology, and Biological Psychiatry. 
Excluding abstracts, we inputted the full text into ChatPDF, an application of a language model based on ChatGPT, and we prompted it to generate abstracts with the same style as used in the original papers. A total of 8 experts were invited to evaluate the quality of these abstracts (based on a Likert scale of 0-10) and identify which abstracts were generated by ChatPDF, using a blind approach. These abstracts were also evaluated for their similarity to the original abstracts and the accuracy of the AI content. Results: The quality of ChatGPT-generated abstracts was lower than that of the actual abstracts (10-point Likert scale: mean 4.72, SD 2.09 vs mean 8.09, SD 1.03; P<.001). The difference in quality was significant in the unstructured format (mean difference –4.33; 95% CI –4.79 to –3.86; P<.001) but minimal in the 4-subheading structured format (mean difference –2.33; 95% CI –2.79 to –1.86). Among the 30 ChatGPT-generated abstracts, 3 showed wrong conclusions, and 10 were identified as AI content. The mean percentage of similarity between the original and the generated abstracts was not high (2.10%-4.40%). The blinded reviewers achieved a 93% (224/240) accuracy rate in guessing which abstracts were written using ChatGPT. Conclusions: Using ChatGPT to generate a scientific abstract may not lead to issues of similarity when using real full texts written by humans. However, the quality of the ChatGPT-generated abstracts was suboptimal, and their accuracy was not 100%. 
%M 38145486 %R 10.2196/51229 %U https://www.jmir.org/2023/1/e51229 %U https://doi.org/10.2196/51229 %U http://www.ncbi.nlm.nih.gov/pubmed/38145486 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e50373 %T AI-Enabled Medical Education: Threads of Change, Promising Futures, and Risky Realities Across Four Potential Future Worlds %A Knopp,Michelle I %A Warm,Eric J %A Weber,Danielle %A Kelleher,Matthew %A Kinnear,Benjamin %A Schumacher,Daniel J %A Santen,Sally A %A Mendonça,Eneida %A Turner,Laurah %+ Department of Medical Education, College of Medicine, University of Cincinnati, Cincinnati, OH, United States, 1 5133303999, turnela@ucmail.uc.edu %K artificial intelligence %K medical education %K scenario planning %K future of healthcare %K ethics and AI %K future %K scenario %K ChatGPT %K generative %K GPT-4 %K ethic %K ethics %K ethical %K strategic planning %K Open-AI %K OpenAI %K privacy %K autonomy %K autonomous %D 2023 %7 25.12.2023 %9 Viewpoint %J JMIR Med Educ %G English %X Background: The rapid trajectory of artificial intelligence (AI) development and advancement is quickly outpacing society's ability to determine its future role. As AI continues to transform various aspects of our lives, one critical question arises for medical education: what will be the nature of education, teaching, and learning in a future world where the acquisition, retention, and application of knowledge in the traditional sense are fundamentally altered by AI? Objective: The purpose of this perspective is to plan for the intersection of health care and medical education in the future. Methods: We used GPT-4 and scenario-based strategic planning techniques to craft 4 hypothetical future worlds influenced by AI's integration into health care and medical education. 
This method, used by organizations such as Shell and the Accreditation Council for Graduate Medical Education, assesses readiness for alternative futures and effectively manages uncertainty, risk, and opportunity. The detailed scenarios provide insights into potential environments the medical profession may face and lay the foundation for hypothesis generation and idea-building regarding responsible AI implementation. Results: The following 4 worlds were created using OpenAI’s GPT model: AI Harmony, AI Conflict, The World of Ecological Balance, and Existential Risk. Risks include disinformation and misinformation, loss of privacy, widening inequity, erosion of human autonomy, and ethical dilemmas. Benefits involve improved efficiency, personalized interventions, enhanced collaboration, early detection, and accelerated research. Conclusions: To ensure responsible AI use, the authors suggest focusing on 3 key areas: developing a robust ethical framework, fostering interdisciplinary collaboration, and investing in education and training. A strong ethical framework emphasizes patient safety, privacy, and autonomy while promoting equity and inclusivity. Interdisciplinary collaboration encourages cooperation among various experts in developing and implementing AI technologies, ensuring that they address the complex needs and challenges in health care and medical education. Investing in education and training prepares professionals and trainees with the necessary skills and knowledge to effectively use and critically evaluate AI technologies. The integration of AI in health care and medical education presents a critical juncture between transformative advancements and significant risks. By working together to address both immediate and long-term risks and consequences, we can ensure that AI integration leads to a more equitable, sustainable, and prosperous future for both health care and medical education. 
As we engage with AI technologies, our collective actions will ultimately determine the state of the future of health care and medical education to harness AI's power while ensuring the safety and well-being of humanity. %M 38145471 %R 10.2196/50373 %U https://mededu.jmir.org/2023/1/e50373 %U https://doi.org/10.2196/50373 %U http://www.ncbi.nlm.nih.gov/pubmed/38145471 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e51921 %T Designing Human-Centered AI to Prevent Medication Dispensing Errors: Focus Group Study With Pharmacists %A Zheng,Yifan %A Rowell,Brigid %A Chen,Qiyuan %A Kim,Jin Yong %A Kontar,Raed Al %A Yang,X Jessie %A Lester,Corey A %+ Department of Clinical Pharmacy, College of Pharmacy, University of Michigan, 428 Church St, Ann Arbor, MI, 48109, United States, 1 734 647 8849, lesterca@umich.edu %K artificial intelligence %K communication %K design methods %K design %K development %K engineering %K focus groups %K human-computer interaction %K medication errors %K morbidity %K mortality %K patient safety %K safety %K SEIPS %K Systems Engineering Initiative for Patient Safety %K tool %K user-centered design methods %K user-centered %K visualization %D 2023 %7 25.12.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Medication errors, including dispensing errors, represent a substantial worldwide health risk with significant implications in terms of morbidity, mortality, and financial costs. Although pharmacists use methods like barcode scanning and double-checking for dispensing verification, these measures exhibit limitations. The application of artificial intelligence (AI) in pharmacy verification emerges as a potential solution, offering precision, rapid data analysis, and the ability to recognize medications through computer vision. For AI to be embraced, it must be designed with the end user in mind, fostering trust, clear communication, and seamless collaboration between AI and pharmacists. 
Objective: This study aimed to gather pharmacists’ feedback in a focus group setting to help inform the initial design of the user interface and iterative designs of the AI prototype. Methods: A multidisciplinary research team engaged pharmacists in a 3-stage process to develop a human-centered AI system for medication dispensing verification. To design the AI model, we used a Bayesian neural network that predicts the dispensed pills’ National Drug Code (NDC). Discussion scripts regarding how to design the system and feedback in focus groups were collected through audio recordings and professionally transcribed, followed by a content analysis guided by the Systems Engineering Initiative for Patient Safety and Human-Machine Teaming theoretical frameworks. Results: A total of 8 pharmacists participated in 3 rounds of focus groups to identify current challenges in medication dispensing verification, brainstorm solutions, and provide feedback on our AI prototype. Participants considered several teaming scenarios, generally favoring a hybrid teaming model where the AI assists in the verification process and a pharmacist intervenes based on medication risk level and the AI’s confidence level. Pharmacists highlighted the need for improving the interpretability of AI systems, such as adding stepwise checkmarks, probability scores, and details about drugs the AI model frequently confuses with the target drug. Pharmacists emphasized the need for simplicity and accessibility. They favored displaying only essential information to prevent overwhelming users with excessive data. Specific design features, such as juxtaposing pill images with their packaging for quick comparisons, were requested. Pharmacists preferred accept, reject, or unsure options. 
The final prototype interface included (1) checkmarks to compare pill characteristics between the AI-predicted NDC and the prescription’s expected NDC, (2) a histogram showing predicted probabilities for the AI-identified NDC, (3) an image of an AI-provided “confused” pill, and (4) an NDC match status (ie, match, unmatched, or unsure). Conclusions: In partnership with pharmacists, we developed a human-centered AI prototype designed to enhance AI interpretability and foster trust. This initiative emphasized human-machine collaboration and positioned AI as an augmentative tool rather than a replacement. This study highlights the process of designing a human-centered AI for dispensing verification, emphasizing its interpretability, confidence visualization, and collaborative human-machine teaming styles. %M 38145475 %R 10.2196/51921 %U https://formative.jmir.org/2023/1/e51921 %U https://doi.org/10.2196/51921 %U http://www.ncbi.nlm.nih.gov/pubmed/38145475 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e50865 %T Evaluation of GPT-4’s Chest X-Ray Impression Generation: A Reader Study on Performance and Perception %A Ziegelmayer,Sebastian %A Marka,Alexander W %A Lenhart,Nicolas %A Nehls,Nadja %A Reischl,Stefan %A Harder,Felix %A Sauter,Andreas %A Makowski,Marcus %A Graf,Markus %A Gawlitza,Joshua %+ Department of Diagnostic and Interventional Radiology, School of Medicine & Klinikum rechts der Isar, Technical University of Munich, Ismaninger Straße 22, Munich, 81675, Germany, 49 1759153694, ga89rog@mytum.de %K generative model %K GPT %K medical imaging %K artificial intelligence %K imaging %K radiology %K radiological %K radiography %K diagnostic %K chest %K x-ray %K x-rays %K generative %K multimodal %K impression %K impressions %K image %K images %K AI %D 2023 %7 22.12.2023 %9 Research Letter %J J Med Internet Res %G English %X Exploring the generative capabilities of the multimodal GPT-4, our study uncovered significant differences between radiological 
assessments and automatic evaluation metrics for chest x-ray impression generation and revealed radiological bias. %M 38133918 %R 10.2196/50865 %U https://www.jmir.org/2023/1/e50865 %U https://doi.org/10.2196/50865 %U http://www.ncbi.nlm.nih.gov/pubmed/38133918 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48244 %T Explainable Artificial Intelligence Warning Model Using an Ensemble Approach for In-Hospital Cardiac Arrest Prediction: Retrospective Cohort Study %A Kim,Yun Kwan %A Koo,Ja Hyung %A Lee,Sun Jung %A Song,Hee Seok %A Lee,Minji %+ Department of Biomedical Software Engineering, The Catholic University of Korea, 43, Jibong-ro, Bucheon, Gyeonggi, 14662, Republic of Korea, 82 2 2164 4364, minjilee@catholic.ac.kr %K cardiac arrest prediction %K ensemble learning %K temporal pattern changes %K cost-sensitive learning %K electronic medical records %D 2023 %7 22.12.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Cardiac arrest (CA) is the leading cause of death in critically ill patients. Clinical research has shown that early identification of CA reduces mortality. Algorithms capable of predicting CA with high sensitivity have been developed using multivariate time series data. However, these algorithms suffer from a high rate of false alarms, and their results are not clinically interpretable. Objective: We propose an ensemble approach using multiresolution statistical features and cosine similarity–based features for the timely prediction of CA. Furthermore, this approach provides clinically interpretable results that can be adopted by clinicians. Methods: Patients were retrospectively analyzed using data from the Medical Information Mart for Intensive Care-IV database and the eICU Collaborative Research Database. Based on the multivariate vital signs of a 24-hour time window for adults diagnosed with heart failure, we extracted multiresolution statistical and cosine similarity–based features. 
These features were used to construct and develop gradient boosting decision trees. Because CA events are rare and the data are therefore highly imbalanced, we adopted cost-sensitive learning as a solution. Then, 10-fold cross-validation was performed to check the consistency of the model performance, and the Shapley additive explanation algorithm was used to capture the overall interpretability of the proposed model. Next, external validation using the eICU Collaborative Research Database was performed to check the generalization ability. Results: The proposed method yielded an overall area under the receiver operating characteristic curve (AUROC) of 0.86 and area under the precision-recall curve (AUPRC) of 0.58. In terms of the timely prediction of CA, the proposed model achieved an AUROC above 0.80 for predicting CA events up to 6 hours in advance. The proposed method simultaneously improved precision and sensitivity to increase the AUPRC, which reduced the number of false alarms while maintaining high sensitivity. This result indicates that the predictive performance of the proposed model is superior to the performances of the models reported in previous studies. Next, we demonstrated the effect of feature importance on the clinical interpretability of the proposed method and inferred the effect between the non-CA and CA groups. Finally, external validation was performed using the eICU Collaborative Research Database, and an AUROC of 0.74 and AUPRC of 0.44 were obtained in a general intensive care unit population. Conclusions: The proposed framework can provide clinicians with more accurate CA prediction results and reduce false alarm rates through internal and external validation. In addition, clinically interpretable prediction results can facilitate clinician understanding. Furthermore, the similarity of vital sign changes can provide insights into temporal pattern changes in CA prediction in patients with heart failure–related diagnoses. Therefore, our system is sufficiently feasible for routine clinical use. 
In addition, the proposed CA prediction system could be developed into a clinically mature application and verified in the digital health field in the future. %M 38133922 %R 10.2196/48244 %U https://www.jmir.org/2023/1/e48244 %U https://doi.org/10.2196/48244 %U http://www.ncbi.nlm.nih.gov/pubmed/38133922 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e51302 %T Medical Student Experiences and Perceptions of ChatGPT and Artificial Intelligence: Cross-Sectional Study %A Alkhaaldi,Saif M I %A Kassab,Carl H %A Dimassi,Zakia %A Oyoun Alsoud,Leen %A Al Fahim,Maha %A Al Hageh,Cynthia %A Ibrahim,Halah %+ Department of Medical Science, Khalifa University College of Medicine and Health Sciences, PO Box 127788, Abu Dhabi, United Arab Emirates, 971 23125423, halah.ibrahim@ku.ac.ae %K medical education %K ChatGPT %K artificial intelligence %K large language models %K LLMs %K AI %K medical student %K medical students %K cross-sectional study %K training %K technology %K medicine %K health care professionals %K risk %K technology %K education %D 2023 %7 22.12.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: Artificial intelligence (AI) has the potential to revolutionize the way medicine is learned, taught, and practiced, and medical education must prepare learners for these inevitable changes. Academic medicine has, however, been slow to embrace recent AI advances. Since its launch in November 2022, ChatGPT has emerged as a fast and user-friendly large language model that can assist health care professionals, medical educators, students, trainees, and patients. While many studies focus on the technology’s capabilities, potential, and risks, there is a gap in studying the perspective of end users. Objective: The aim of this study was to gauge the experiences and perspectives of graduating medical students on ChatGPT and AI in their training and future careers. 
Methods: A cross-sectional web-based survey of recently graduated medical students was conducted in an international academic medical center between May 5, 2023, and June 13, 2023. Descriptive statistics were used to tabulate variable frequencies. Results: Of 325 applicants to the residency programs, 265 completed the survey (an 81.5% response rate). The vast majority of respondents denied using ChatGPT in medical school, with 20.4% (n=54) using it to help complete written assessments and only 9.4% using the technology in their clinical work (n=25). More students planned to use it during residency, primarily for exploring new medical topics and research (n=168, 63.4%) and exam preparation (n=151, 57%). Male students were significantly more likely to believe that AI will improve diagnostic accuracy (n=47, 51.7% vs n=69, 39.7%; P=.001), reduce medical error (n=53, 58.2% vs n=71, 40.8%; P=.002), and improve patient care (n=60, 65.9% vs n=95, 54.6%; P=.007). Previous experience with AI was significantly associated with positive AI perception in terms of improving patient care, decreasing medical errors and misdiagnoses, and increasing the accuracy of diagnoses (P=.001, P<.001, P=.008, respectively). Conclusions: The surveyed medical students had minimal formal and informal experience with AI tools and limited perceptions of the potential uses of AI in health care but had overall positive views of ChatGPT and AI and were optimistic about the future of AI in medical education and health care. Structured curricula and formal policies and guidelines are needed to adequately prepare medical learners for the forthcoming integration of AI in medicine. 
%M 38133911 %R 10.2196/51302 %U https://mededu.jmir.org/2023/1/e51302 %U https://doi.org/10.2196/51302 %U http://www.ncbi.nlm.nih.gov/pubmed/38133911 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e50658 %T Using ChatGPT for Clinical Practice and Medical Education: Cross-Sectional Survey of Medical Students’ and Physicians’ Perceptions %A Tangadulrat,Pasin %A Sono,Supinya %A Tangtrakulwanich,Boonsin %+ Department of Orthopedics, Faculty of Medicine, Prince of Songkla University, Floor 9 Rattanacheewarak Building, 15 Kanchanavanich Rd, Hatyai, 90110, Thailand, 66 74451601, boonsin.b@psu.ac.th %K ChatGPT %K AI %K artificial intelligence %K medical education %K medical students %K student %K students %K intern %K interns %K resident %K residents %K knee osteoarthritis %K survey %K surveys %K questionnaire %K questionnaires %K chatbot %K chatbots %K conversational agent %K conversational agents %K attitude %K attitudes %K opinion %K opinions %K perception %K perceptions %K perspective %K perspectives %K acceptance %D 2023 %7 22.12.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: ChatGPT is a well-known large language model–based chatbot. It could be used in the medical field in many aspects. However, some physicians are still unfamiliar with ChatGPT and are concerned about its benefits and risks. Objective: We aim to evaluate the perception of physicians and medical students toward using ChatGPT in the medical field. Methods: A web-based questionnaire was sent to medical students, interns, residents, and attending staff with questions regarding their perception toward using ChatGPT in clinical practice and medical education. Participants were also asked to rate their perception of ChatGPT’s generated response about knee osteoarthritis. Results: Participants included 124 medical students, 46 interns, 37 residents, and 32 attending staff. 
After reading ChatGPT’s response, 132 of the 239 (55.2%) participants had a positive rating about using ChatGPT for clinical practice. The proportion of positive answers was significantly lower in graduated physicians (48/115, 42%) compared with medical students (84/124, 68%; P<.001). Participants listed a lack of a patient-specific treatment plan, updated evidence, and a language barrier as ChatGPT’s pitfalls. Regarding using ChatGPT for medical education, the proportion of positive responses was also significantly lower in graduated physicians (71/115, 62%) compared with medical students (103/124, 83.1%; P<.001). Participants were concerned that ChatGPT’s response was too superficial, might lack scientific evidence, and might need expert verification. Conclusions: Medical students generally had a positive perception of using ChatGPT for guiding treatment and medical education, whereas graduated doctors were more cautious in this regard. Nonetheless, both medical students and graduated doctors positively perceived using ChatGPT for creating patient educational materials. 
%M 38133908 %R 10.2196/50658 %U https://mededu.jmir.org/2023/1/e50658 %U https://doi.org/10.2196/50658 %U http://www.ncbi.nlm.nih.gov/pubmed/38133908 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e50413 %T College Students’ Employability, Cognition, and Demands for ChatGPT in the AI Era Among Chinese Nursing Students: Web-Based Survey %A Luo,Yuanyuan %A Weng,Huiting %A Yang,Li %A Ding,Ziwei %A Wang,Qin %+ Clinical Nursing Teaching and Research Section, The Second Xiangya Hospital of Central South University, 139 Renming Middle Road of Furong District, Changsha, 410011, China, 86 187 7480 6226, wangqin3421@csu.edu.cn %K college students’ employability %K artificial intelligence quotient %K ChatGPT %K nursing students %K China %K college student %K AI %K artificial intelligence %D 2023 %7 22.12.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: With the rapid development of artificial intelligence (AI) and the widespread use of ChatGPT, nursing students’ artificial intelligence quotient (AIQ), employability, cognition, and demand for ChatGPT are worthy of attention. Objective: We aimed to investigate Chinese nursing students’ AIQ and employability status as well as their cognition and demand for the latest AI tool—ChatGPT. This study was conducted to guide future initiatives in nursing intelligence education and to improve the employability of nursing students. Methods: We used a cross-sectional survey to understand nursing college students’ AIQ, employability, cognition, and demand for ChatGPT. Using correlation analysis and multiple hierarchical regression analysis, we explored the relevant factors in the employability of nursing college students. Results: In this study, out of 1788 students, 1453 (81.30%) had not used ChatGPT, and 1170 (65.40%) had never heard of ChatGPT before this survey. 
College students’ employability scores were positively correlated with AIQ, self-regulation ability, and their home location and negatively correlated with school level. Additionally, men scored higher on college students’ employability compared to women. Furthermore, 76.5% of the variance was explained by the multiple hierarchical regression model for predicting college students’ employability scores. Conclusions: Chinese nursing students have limited familiarity and experience with ChatGPT, while their AIQ remains intermediate. Thus, educators should pay more attention to cultivating nursing students’ AIQ and self-regulation ability to enhance their employability. Employability, especially for female students, those from rural backgrounds, and students in key colleges, deserves more attention in future educational efforts. %M 38133923 %R 10.2196/50413 %U https://formative.jmir.org/2023/1/e50413 %U https://doi.org/10.2196/50413 %U http://www.ncbi.nlm.nih.gov/pubmed/38133923 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e48892 %T NephroCAGE—German-Canadian Consortium on AI for Improved Kidney Transplantation Outcome: Protocol for an Algorithm Development and Validation Study %A Schapranow,Matthieu-P %A Bayat,Mozhgan %A Rasheed,Aadil %A Naik,Marcel %A Graf,Verena %A Schmidt,Danilo %A Budde,Klemens %A Cardinal,Héloïse %A Sapir-Pichhadze,Ruth %A Fenninger,Franz %A Sherwood,Karen %A Keown,Paul %A Günther,Oliver P %A Pandl,Konstantin D %A Leiser,Florian %A Thiebes,Scott %A Sunyaev,Ali %A Niemann,Matthias %A Schimanski,Andreas %A Klein,Thomas %+ Hasso Plattner Institute for Digital Engineering, University of Potsdam, Prof.-Dr.-Helmert-Street 2-3, Potsdam, 14482, Germany, 49 3315509 ext 1331, schapranow@hpi.de %K posttransplant risks %K kidney transplantation %K federated learning infrastructure %K clinical prediction model %K donor-recipient matching %K multinational transplant data set %D 2023 %7 22.12.2023 %9 Protocol %J JMIR Res Protoc %G English %X 
Background: Recent advances in hardware and software enabled the use of artificial intelligence (AI) algorithms for analysis of complex data in a wide range of daily-life use cases. We aim to explore the benefits of applying AI to a specific use case in transplant nephrology: risk prediction for severe posttransplant events. For the first time, we combine multinational real-world transplant data, which require specific legal and technical protection measures. Objective: The German-Canadian NephroCAGE consortium aims to develop and evaluate specific processes, software tools, and methods to (1) combine transplant data of more than 8000 cases over the past decades from leading transplant centers in Germany and Canada, (2) implement specific measures to protect sensitive transplant data, and (3) use multinational data as a foundation for developing high-quality prognostic AI models. Methods: To protect sensitive transplant data addressing the first and second objectives, we aim to implement a decentralized NephroCAGE federated learning infrastructure upon a private blockchain. Our NephroCAGE federated learning infrastructure enables a switch of paradigms: instead of pooling sensitive data into a central database for analysis, it allows the transfer of clinical prediction models (CPMs) to clinical sites for local data analyses. Thus, sensitive transplant data reside protected in their original sites while the comparably small algorithms are exchanged instead. For our third objective, we will compare the performance of selected AI algorithms, for example, random forest and extreme gradient boosting, as a foundation for CPMs to predict severe short- and long-term posttransplant risks, for example, graft failure or mortality. The CPMs will be trained on donor and recipient data from retrospective cohorts of kidney transplant patients. Results: We have received initial funding for NephroCAGE in February 2021. 
All clinical partners have applied for and received ethics approval as of 2022. Exploration of the clinical transplant databases for variable extraction began at all centers in 2022. In total, 8120 patient records have been retrieved as of August 2023. The development and validation of CPMs are ongoing as of 2023. Conclusions: For the first time, we will (1) combine kidney transplant data from nephrology centers in Germany and Canada, (2) implement federated learning as a foundation to use such real-world transplant data as a basis for the training of CPMs in a privacy-preserving way, and (3) develop a learning software system to investigate population specifics, for example, to understand population heterogeneity, treatment specificities, and individual impact on selected posttransplant outcomes. International Registered Report Identifier (IRRID): DERR1-10.2196/48892 %M 38133915 %R 10.2196/48892 %U https://www.researchprotocols.org/2023/1/e48892 %U https://doi.org/10.2196/48892 %U http://www.ncbi.nlm.nih.gov/pubmed/38133915 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e53785 %T Introducing the “AI Language Models in Health Care” Section: Actionable Strategies for Targeted and Wide-Scale Deployment %A Castonguay,Alexandre %A Lovis,Christian %+ Faculté des sciences infirmières, Université de Montréal, 2375, chemin de la Côte-Sainte-Catherine, Montréal, QC, H3T1A8, Canada, alexandre.castonguay.2@umontreal.ca %K generative AI %K health care digitalization %K AI in health care %K digital health standards %K AI implementation %K artificial intelligence %D 2023 %7 21.12.2023 %9 Editorial %J JMIR Med Inform %G English %X The realm of health care is on the cusp of a significant technological leap, courtesy of the advancements in artificial intelligence (AI) language models, but ensuring the ethical design, deployment, and use of these technologies is imperative to truly realize their potential in improving health care delivery and 
promoting human well-being and safety. Indeed, these models have demonstrated remarkable prowess in generating humanlike text, evidenced by a growing body of research and real-world applications. This capability paves the way for enhanced patient engagement, clinical decision support, and a plethora of other applications that were once considered beyond reach. However, the journey from potential to real-world application is laden with challenges ranging from ensuring reliability and transparency to navigating a complex regulatory landscape. There is still a need for comprehensive evaluation and rigorous validation to ensure that these models are reliable, transparent, and ethically sound. This editorial introduces the new section, titled “AI Language Models in Health Care.” This section seeks to create a platform for academics, practitioners, and innovators to share their insights, research findings, and real-world applications of AI language models in health care. The aim is to foster a community that is not only excited about the possibilities but also critically engaged with the ethical, practical, and regulatory challenges that lie ahead. 
%M 38127431 %R 10.2196/53785 %U https://medinform.jmir.org/2023/1/e53785 %U https://doi.org/10.2196/53785 %U http://www.ncbi.nlm.nih.gov/pubmed/38127431 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e50158 %T Holistic Human-Serving Digitization of Health Care Needs Integrated Automated System-Level Assessment Tools %A Welzel,Cindy %A Cotte,Fabienne %A Wekenborg,Magdalena %A Vasey,Baptiste %A McCulloch,Peter %A Gilbert,Stephen %+ Else Kröner Fresenius Center for Digital Health, TUD Dresden University of Technology, Fetscherstraße 74, Dresden, 01307, Germany, 49 35145819630, stephen.gilbert@ukdd.de %K health technology assessment %K human factors %K postmarket surveillance %K software as a medical device %K digital health tools %K quality assessment %K quality improvement %K regulatory framework %K user experience %K health care %D 2023 %7 20.12.2023 %9 Viewpoint %J J Med Internet Res %G English %X Digital health tools, platforms, and artificial intelligence– or machine learning–based clinical decision support systems are increasingly part of health delivery approaches, with an ever-greater degree of system interaction. Critical to the successful deployment of these tools is their functional integration into existing clinical routines and workflows. This depends on system interoperability and on intuitive and safe user interface design. The importance of minimizing emergent workflow stress through human factors research and purposeful design for integration cannot be overstated. Usability of tools in practice is as important as algorithm quality. Regulatory and health technology assessment frameworks recognize the importance of these factors to a certain extent, but their focus remains mainly on the individual product rather than on emergent system and workflow effects. The measurement of performance and user experience has so far been performed in ad hoc, nonstandardized ways by individual actors using their own evaluation approaches. 
We propose that a standard framework for system-level and holistic evaluation could be built into interacting digital systems to enable systematic and standardized system-wide, multiproduct, postmarket surveillance and technology assessment. Such a system could be made available to developers through regulatory or assessment bodies as an application programming interface and could be a requirement for digital tool certification, just as interoperability is. This would enable health systems and tool developers to collect system-level data directly from real device use cases, enabling the controlled and safe delivery of systematic quality assessment or improvement studies suitable for the complexity and interconnectedness of clinical workflows using developing digital health technologies. %M 38117545 %R 10.2196/50158 %U https://www.jmir.org/2023/1/e50158 %U https://doi.org/10.2196/50158 %U http://www.ncbi.nlm.nih.gov/pubmed/38117545 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e50903 %T Reimagining Core Entrustable Professional Activities for Undergraduate Medical Education in the Era of Artificial Intelligence %A Jacobs,Sarah Marie %A Lundy,Neva Nicole %A Issenberg,Saul Barry %A Chandran,Latha %+ Department of Medical Education, University of Miami Miller School of Medicine, 1120 NW 14th Street, Miami, FL, 33136, United States, 1 3052436491, bissenbe@miami.edu %K artificial intelligence %K entrustable professional activities %K medical education %K competency-based education %K educational technology %K machine learning %D 2023 %7 19.12.2023 %9 Viewpoint %J JMIR Med Educ %G English %X The proliferation of generative artificial intelligence (AI) and its extensive potential for integration into many aspects of health care signal a transformational shift within the health care environment. In this context, medical education must evolve to ensure that medical trainees are adequately prepared to navigate the rapidly changing health care landscape. 
Medical education has moved toward a competency-based education paradigm, leading the Association of American Medical Colleges (AAMC) to define a set of Entrustable Professional Activities (EPAs) as its practical operational framework in undergraduate medical education. The AAMC’s 13 core EPAs for entering residencies have been implemented with varying levels of success across medical schools. In this paper, we critically assess the existing core EPAs in the context of rapid AI integration in medicine. We identify EPAs that require refinement, redefinition, or comprehensive change to align with the emerging trends in health care. Moreover, this perspective proposes a set of “emerging” EPAs, informed by the changing landscape and capabilities presented by generative AI technologies. We provide a practical evaluation of the EPAs, alongside actionable recommendations on how medical education, viewed through the lens of the AAMC EPAs, can adapt and remain relevant amid rapid technological advancements. By leveraging the transformative potential of AI, we can reshape medical education to align with an AI-integrated future of medicine. This approach will help equip future health care professionals with technological competence and adaptive skills to meet the dynamic and evolving demands in health care. 
%M 38052721 %R 10.2196/50903 %U https://mededu.jmir.org/2023/1/e50903 %U https://doi.org/10.2196/50903 %U http://www.ncbi.nlm.nih.gov/pubmed/38052721 %0 Journal Article %@ 2817-092X %I JMIR Publications %V 2 %N %P e50660 %T Application of a Low-Cost mHealth Solution for the Remote Monitoring of Patients With Epilepsy: Algorithm Development and Validation %A Sriraam,Natarajan %A Raghu,S %A Gommer,Erik D %A Hilkman,Danny M W %A Temel,Yasin %A Vasudeva Rao,Shyam %A Hegde,Alangar Satyaranjandas %A L Kubben,Pieter %+ Center for Medical Electronics and Computing, Ramaiah Institute of Technology, MSRIT Post, M S Ramaiah Nagar, Bengaluru, 560054, India, 91 9632294999, sriraam@msrit.edu %K Android %K epileptic seizures %K mobile health %K mHealth %K mobile phone–based epilepsy monitoring %K support vector machine %K seizure %K epileptic %K epilepsy %K monitoring %K smartphone %K smartphones %K mobile phone %K neurology %K neuroscience %K electroencephalography %K EEG %K brain %K classification %K detect %K detection %K neurological %K electroencephalogram %K diagnose %K diagnosis %K diagnostic %K imaging %D 2023 %7 19.12.2023 %9 Original Paper %J JMIR Neurotech %G English %X Background: Implementing automated seizure detection in long-term electroencephalography (EEG) analysis enables the remote monitoring of patients with epilepsy, thereby improving their quality of life. Objective: The objective of this study was to explore an mHealth (mobile health) solution by investigating the feasibility of smartphones for processing large EEG recordings for the remote monitoring of patients with epilepsy. Methods: We developed a mobile app to automatically analyze and classify epileptic seizures using EEG. 
We used the cross-database model developed in our previous study, incorporating successive decomposition index and matrix determinant as features, adaptive median feature baseline correction for overcoming interdatabase feature variation, and postprocessing-based support vector machine for classification using 5 different EEG databases. The Sezect (Seizure Detect) Android app was built using the Chaquopy software development kit, which uses the Python language in Android Studio. Various durations of EEG signals were tested on different smartphones to check the feasibility of the Sezect app. Results: We observed a sensitivity of 93.5%, a specificity of 97.5%, and a false detection rate of 1.5 per hour for EEG recordings using the Sezect app. The various mobile phones did not differ substantially in processing time, which indicates that a range of phone models can be used for implementation. The computational time required to process real-time EEG data via smartphones and the classification results suggest that our mHealth app could be a valuable asset for monitoring patients with epilepsy. Conclusions: Smartphones have multipurpose use in health care, offering tools that can improve the quality of patients’ lives. 
%R 10.2196/50660 %U https://neuro.jmir.org/2023/1/e50660 %U https://doi.org/10.2196/50660 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e45770 %T The Evolution of Artificial Intelligence in Biomedicine: Bibliometric Analysis %A Gu,Jiasheng %A Gao,Chongyang %A Wang,Lili %+ Department of Computer Science, Dartmouth College, 15 Thayer Dive, Hanover, NH, 03755, United States, 1 516 888 6691, lili.wang.gr@dartmouth.edu %K bibliometrics %K trend forecasting %K AI in medicine %K Word2Vec %K regression models %K agglomerative clustering %K usage %K artificial intelligence %K utilization %K biomedical %K effectiveness %K AI trends %K predictive model %K development %D 2023 %7 19.12.2023 %9 Original Paper %J JMIR AI %G English %X Background: The utilization of artificial intelligence (AI) technologies in the biomedical field has attracted increasing attention in recent decades. Studying how past AI technologies have found their way into medicine over time can help to predict which current (and future) AI technologies have the potential to be utilized in medicine in the coming years, thereby providing a helpful reference for future research directions. Objective: The aim of this study was to predict the future trend of AI technologies used in different biomedical domains based on past trends of related technologies and biomedical domains. Methods: We collected a large corpus of articles from the PubMed database pertaining to the intersection of AI and biomedicine. Initially, we attempted to use regression on the extracted keywords alone; however, we found that this approach did not provide sufficient information. Therefore, we propose a method called “background-enhanced prediction” to expand the knowledge utilized by the regression algorithm by incorporating both the keywords and their surrounding context. This method of data construction resulted in improved performance across the six regression models evaluated. 
Our findings were confirmed through experiments on recurrent prediction and forecasting. Results: In our analysis using background information for prediction, we found that a window size of 3 yielded the best results, outperforming the use of keywords alone. Furthermore, utilizing data only prior to 2017, our regression projections for the period of 2017-2021 exhibited a high coefficient of determination (R2), which reached up to 0.78, demonstrating the effectiveness of our method in predicting long-term trends. Based on the prediction, studies related to proteins and tumors will be pushed out of the top 20 and be replaced by early diagnostics, tomography, and other detection technologies. These are areas well suited to incorporating AI technology. Deep learning, machine learning, and neural networks continue to be the dominant AI technologies in biomedical applications. Generative adversarial networks represent an emerging technology with a strong growth trend. Conclusions: In this study, we explored AI trends in the biomedical field and developed a predictive model to forecast future trends. Our findings were confirmed through experiments on current trends. 
%M 38875563 %R 10.2196/45770 %U https://ai.jmir.org/2023/1/e45770 %U https://doi.org/10.2196/45770 %U http://www.ncbi.nlm.nih.gov/pubmed/38875563 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e50342 %T Existing Barriers Faced by and Future Design Recommendations for Direct-to-Consumer Health Care Artificial Intelligence Apps: Scoping Review %A He,Xin %A Zheng,Xi %A Ding,Huiyuan %+ School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Luoyu Road 1037, Hongshan District, Wuhan, 430074, China, 86 18707149470, xinh@hust.edu.cn %K artificial intelligence %K medical %K health care %K consumer %K consumers %K app %K apps %K application %K applications %K DTC %K direct to consumer %K barrier %K barriers %K implementation %K design %K scoping %K review methods %K review methodology %D 2023 %7 18.12.2023 %9 Review %J J Med Internet Res %G English %X Background: Direct-to-consumer (DTC) health care artificial intelligence (AI) apps hold the potential to bridge the spatial and temporal disparities in health care resources, but they also come with individual and societal risks due to AI errors. Furthermore, the manner in which consumers interact directly with health care AI is reshaping traditional physician-patient relationships. However, the academic community lacks a systematic comprehension of the research overview for such apps. Objective: This paper systematically delineated and analyzed the characteristics of included studies, identified existing barriers and design recommendations for DTC health care AI apps mentioned in the literature and also provided a reference for future design and development. Methods: This scoping review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews guidelines and was conducted according to Arksey and O’Malley’s 5-stage framework. 
Peer-reviewed papers on DTC health care AI apps published until March 27, 2023, in Web of Science, Scopus, the ACM Digital Library, IEEE Xplore, PubMed, and Google Scholar were included. The papers were analyzed using Braun and Clarke’s reflective thematic analysis approach. Results: Of the 2898 papers retrieved, 32 (1.1%) covering this emerging field were included. The included papers were recently published (2018-2023), and most (23/32, 72%) were from developed countries. The medical field was mostly general practice (8/32, 25%). In terms of users and functionalities, some apps were designed solely for single-consumer groups (24/32, 75%), offering disease diagnosis (14/32, 44%), health self-management (8/32, 25%), and health care information inquiry (4/32, 13%). Other apps connected to physicians (5/32, 16%), family members (1/32, 3%), nursing staff (1/32, 3%), and health care departments (2/32, 6%), generally to alert these groups to abnormal conditions of consumer users. In addition, 8 barriers and 6 design recommendations related to DTC health care AI apps were identified. Further discussion addressed the more subtle obstacles that are particularly worth noting in consumer-facing health care AI systems and the corresponding design recommendations: enhancing human-centered explainability, establishing calibrated trust and addressing overtrust, demonstrating empathy in AI, improving the specialization of consumer-grade products, and expanding the diversity of the test population. Conclusions: The booming DTC health care AI apps present both risks and opportunities, which highlights the need to explore their current status. This paper systematically summarized and sorted the characteristics of the included studies, identified the existing barriers faced by such apps, and made future design recommendations for them. To the best of our knowledge, this is the first study to systematically summarize and categorize academic research on these apps. 
Future studies conducting the design and development of such systems could refer to the results of this study, which is crucial to improve the health care services provided by DTC health care AI apps. %M 38109173 %R 10.2196/50342 %U https://www.jmir.org/2023/1/e50342 %U https://doi.org/10.2196/50342 %U http://www.ncbi.nlm.nih.gov/pubmed/38109173 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45515 %T Clinical Timing-Sequence Warning Models for Serious Bacterial Infections in Adults Based on Machine Learning: Retrospective Study %A Liu,Jian %A Chen,Jia %A Dong,Yongquan %A Lou,Yan %A Tian,Yu %A Sun,Huiyao %A Jin,Yuqing %A Li,Jingsong %A Qiu,Yunqing %+ State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, the First Affiliated Hospital, College of Medicine, Zhejiang University, NO.79 Qingchun Road, Hangzhou, 310003, China, 86 13588189339, qiuyq@zju.edu.cn %K clinical timing-sequence warning models %K machine learning %K serious bacterial infection %K nomogram %D 2023 %7 18.12.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Serious bacterial infections (SBIs) are linked to unplanned hospital admissions and a high mortality rate. The early identification of SBIs is crucial in clinical practice. Objective: This study aims to establish and validate clinically applicable models designed to identify SBIs in patients with infective fever. Methods: Clinical data from 945 patients with infective fever, encompassing demographic and laboratory indicators, were retrospectively collected from a 2200-bed teaching hospital between January 2013 and December 2020. The data were randomly divided into training and test sets at a ratio of 7:3. Various machine learning (ML) algorithms, including Boruta, Lasso (least absolute shrinkage and selection operator), and recursive feature elimination, were utilized for feature filtering. 
The selected features were subsequently used to construct models predicting SBIs using logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost) with 5-fold cross-validation. Performance metrics, including the receiver operating characteristic (ROC) curve and area under the ROC curve (AUC), accuracy, sensitivity, and other relevant parameters, were used to assess model performance. Considering both model performance and clinical needs, 2 clinical timing-sequence warning models were ultimately confirmed using LR analysis. The corresponding predictive nomograms were then plotted for clinical use. Moreover, a physician, blinded to the study, collected additional data from the same center involving 164 patients during 2021. The nomograms developed in the study were then applied in clinical practice to further validate their clinical utility. Results: In total, 69.9% (661/945) of the patients developed SBIs. Age, hemoglobin, neutrophil-to-lymphocyte ratio, fibrinogen, and C-reactive protein levels were identified as important features by at least two ML algorithms. Considering the collection sequence of these indicators and clinical demands, 2 timing-sequence models predicting the SBI risk were constructed accordingly: the early admission model (model 1) and the model within 24 hours of admission (model 2). LR demonstrated better stability than RF and XGBoost in both models and performed the best in model 2, with an AUC, accuracy, and sensitivity of 0.780 (95% CI 0.720-0.841), 0.754 (95% CI 0.698-0.804), and 0.776 (95% CI 0.711-0.832), respectively. XGBoost had an advantage over LR in AUC (0.708, 95% CI 0.641-0.775 vs 0.686, 95% CI 0.617-0.754), while RF achieved better accuracy (0.729, 95% CI 0.673-0.780) and sensitivity (0.790, 95% CI 0.728-0.844) than the other 2 approaches in model 1. 
Two SBI-risk prediction nomograms were developed for clinical use based on LR, and they exhibited good performance with an accuracy of 0.707 and 0.750 and a sensitivity of 0.729 and 0.927 in clinical application. Conclusions: The clinical timing-sequence warning models demonstrated efficacy in predicting SBIs in patients suspected of having infective fever and in clinical application, suggesting good potential in clinical decision-making. Nevertheless, additional prospective and multicenter studies are necessary to further confirm their clinical utility. %M 38109177 %R 10.2196/45515 %U https://www.jmir.org/2023/1/e45515 %U https://doi.org/10.2196/45515 %U http://www.ncbi.nlm.nih.gov/pubmed/38109177 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e49023 %T Practical Considerations and Applied Examples of Cross-Validation for Model Development and Evaluation in Health Care: Tutorial %A Wilimitis,Drew %A Walsh,Colin G %+ Vanderbilt University Medical Center, Vanderbilt University, 2525 W End Ave, Suite 1475, Nashville, TN, 37203, United States, 1 6159365684, colin.walsh@vumc.org %K predictive modeling %K cross-validation %K tutorial %K model development %K risk detection %K clinical decision-making %K electronic health care %K eHealth data %K health care data %K data validation %K artificial intelligence %K AI %D 2023 %7 18.12.2023 %9 Tutorial %J JMIR AI %G English %X Cross-validation remains a popular means of developing and validating artificial intelligence for health care. Numerous subtypes of cross-validation exist. Although tutorials on this validation strategy have been published and some with applied examples, we present here a practical tutorial comparing multiple forms of cross-validation using a widely accessible, real-world electronic health care data set: Medical Information Mart for Intensive Care-III (MIMIC-III). 
This tutorial explored methods such as K-fold cross-validation and nested cross-validation, highlighting their advantages and disadvantages across 2 common predictive modeling use cases: classification (mortality) and regression (length of stay). We aimed to provide readers with reproducible notebooks and best practices for modeling with electronic health care data. We also described sets of useful recommendations as we demonstrated that nested cross-validation reduces optimistic bias but comes with additional computational challenges. This tutorial might improve the community’s understanding of these important methods while catalyzing the modeling community to apply these guides directly in their work using the published code. %M 38875530 %R 10.2196/49023 %U https://ai.jmir.org/2023/1/e49023 %U https://doi.org/10.2196/49023 %U http://www.ncbi.nlm.nih.gov/pubmed/38875530 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44119 %T Performance Test of a Well-Trained Model for Meningioma Segmentation in Health Care Centers: Secondary Analysis Based on Four Retrospective Multicenter Data Sets %A Chen,Chaoyue %A Teng,Yuen %A Tan,Shuo %A Wang,Zizhou %A Zhang,Lei %A Xu,Jianguo %+ Neurosurgery Department, West China Hospital, Sichuan University, West China Hosptial, No 37, GuoXue Alley, Chengdu, 610041, China, 86 18980602049, drjianguoxu@gmail.com %K meningioma segmentation %K magnetic resonance imaging %K MRI %K convolutional neural network %K model test and verification %K CNN %K radiographic image interpretation %D 2023 %7 15.12.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Convolutional neural networks (CNNs) have produced state-of-the-art results in meningioma segmentation on magnetic resonance imaging (MRI). However, images obtained from different institutions, protocols, or scanners may show significant domain shift, leading to performance degradation and challenging model deployment in real clinical scenarios. 
Objective: This research aims to investigate the realistic performance of a well-trained meningioma segmentation model when deployed across different health care centers and to verify methods to enhance its generalization. Methods: This study was performed in four centers. A total of 606 patients with 606 MRIs were enrolled between January 2015 and December 2021. Manual segmentations, determined through consensus readings by neuroradiologists, were used as the ground truth mask. The model was previously trained using a standard supervised CNN called Deeplab V3+ and was deployed and tested separately in four health care centers. To determine the appropriate approach to mitigating the observed performance degradation, two methods were used: unsupervised domain adaptation and supervised retraining. Results: The trained model showed a state-of-the-art performance in tumor segmentation in two health care institutions, with a Dice ratio of 0.887 (SD 0.108, 95% CI 0.903-0.925) in center A and a Dice ratio of 0.874 (SD 0.800, 95% CI 0.854-0.894) in center B. In the other health care institutions, however, the performance declined, with Dice ratios of 0.631 (SD 0.157, 95% CI 0.556-0.707) in center C and 0.649 (SD 0.187, 95% CI 0.566-0.732) in center D, as these centers obtained the MRIs using different scanning protocols. The unsupervised domain adaptation showed a significant improvement in performance scores, with Dice ratios of 0.842 (SD 0.073, 95% CI 0.820-0.864) in center C and 0.855 (SD 0.097, 95% CI 0.826-0.886) in center D. Nonetheless, it did not outperform the supervised retraining, which achieved Dice ratios of 0.899 (SD 0.026, 95% CI 0.889-0.906) in center C and 0.886 (SD 0.046, 95% CI 0.870-0.903) in center D. Conclusions: Deploying the trained CNN model in different health care institutions may show significant performance degradation due to the domain shift of MRIs. 
Under this circumstance, the use of unsupervised domain adaptation or supervised retraining should be considered, taking into account the balance between clinical requirements, model performance, and the size of the available data. %M 38100181 %R 10.2196/44119 %U https://www.jmir.org/2023/1/e44119 %U https://doi.org/10.2196/44119 %U http://www.ncbi.nlm.nih.gov/pubmed/38100181 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46929 %T Identifying Existing Evidence to Potentially Develop a Machine Learning Diagnostic Algorithm for Cough in Primary Care Settings: Scoping Review %A Cummerow,Julia %A Wienecke,Christin %A Engler,Nicola %A Marahrens,Philip %A Gruening,Philipp %A Steinhäuser,Jost %+ Institute of Family Medicine, University Medical Centre Schleswig-Holstein, Campus Lübeck, Ratzeburger Allee 160, Lübeck, 23538, Germany, 49 451 3101 8016, julia.cummerow@uni-luebeck.de %K cough %K predictor %K differential diagnosis %K primary health care %K artificial intelligence %D 2023 %7 14.12.2023 %9 Review %J J Med Internet Res %G English %X Background: Primary care is known to be one of the most complex health care settings because of the high number of theoretically possible diagnoses. Therefore, the process of clinical decision-making in primary care includes complex analytical and nonanalytical factors such as gut feelings and dealing with uncertainties. Artificial intelligence is also expected to offer support in finding valid diagnoses. Nevertheless, to translate some aspects of what occurs during a consultation into a machine-based diagnostic algorithm, the probabilities for the underlying diagnoses (odds ratios) need to be determined. Objective: Cough is one of the most common reasons for a consultation in general practice, the core discipline in primary care. The aim of this scoping review was to identify the available data on cough as a predictor of various diagnoses encountered in general practice. 
In the context of an ongoing project, we reflect on this database as a possible basis for a machine-based diagnostic algorithm. Furthermore, we discuss the applicability of such an algorithm against the background of the specifics of general practice. Methods: The PubMed, Scopus, Web of Science, and Cochrane Library databases were searched with defined search terms, supplemented by the search for gray literature via the German Journal of Family Medicine until April 20, 2023. The inclusion criterion was the explicit analysis of cough as a predictor of any conceivable disease. Exclusion criteria were articles that did not provide original study results, articles in languages other than English or German, and articles that did not mention cough as a diagnostic predictor. Results: In total, 1458 records were identified for screening, of which 35 articles met our inclusion criteria. Most of the results (11/35, 31%) were found for chronic obstructive pulmonary disease. The others were distributed among the diagnoses of asthma or unspecified obstructive airway disease, various infectious diseases, bronchogenic carcinoma, dyspepsia or gastroesophageal reflux disease, and adverse effects of angiotensin-converting enzyme inhibitors. Positive odds ratios were found for cough as a predictor of chronic obstructive pulmonary disease, influenza, COVID-19 infections, and bronchial carcinoma, whereas the results for cough as a predictor of asthma and other nonspecified obstructive airway diseases were inconsistent. Conclusions: Reliable data on cough as a predictor of various diagnoses encountered in general practice are scarce. The example of cough does not provide a sufficient database to contribute odds to a machine learning–based diagnostic algorithm in a meaningful way. 
%M 38096024 %R 10.2196/46929 %U https://www.jmir.org/2023/1/e46929 %U https://doi.org/10.2196/46929 %U http://www.ncbi.nlm.nih.gov/pubmed/38096024 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 6 %N %P e49889 %T The Accuracy and Appropriateness of ChatGPT Responses on Nonmelanoma Skin Cancer Information Using Zero-Shot Chain of Thought Prompting %A O'Hagan,Ross %A Poplausky,Dina %A Young,Jade N %A Gulati,Nicholas %A Levoska,Melissa %A Ungar,Benjamin %A Ungar,Jonathan %+ Department of Dermatology, Icahn School of Medicine at Mount Sinai, 5th Floor, 5 East 98th Street, New York, NY, 10029, United States, 1 212 241 3288, jonathan.ungar@mountsinai.org %K ChatGPT %K artificial intelligence %K large language models %K nonmelanoma skin %K skin cancer %K cell carcinoma %K chatbot %K dermatology %K dermatologist %K epidermis %K dermis %K oncology %K cancer %D 2023 %7 14.12.2023 %9 Research Letter %J JMIR Dermatol %G English %X %M 38096013 %R 10.2196/49889 %U https://derma.jmir.org/2023/1/e49889 %U https://doi.org/10.2196/49889 %U http://www.ncbi.nlm.nih.gov/pubmed/38096013 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e48351 %T Development of Risk Prediction Models for Severe Periodontitis in a Thai Population: Statistical and Machine Learning Approaches %A Teza,Htun %A Pattanateepapon,Anuchate %A Lertpimonchai,Attawood %A Vathesatogkit,Prin %A J McKay,Gareth %A Attia,John %A Thakkinstian,Ammarin %+ Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, 270 RAMA VI Road, Phayathai, Bangkok, 10400, Thailand, 66 2 201 1269, anuchate.gab@mahidol.ac.th %K periodontitis %K prediction %K machine learning %K repeated measures %K panel data %D 2023 %7 14.12.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Severe periodontitis affects 26% of Thai adults and 11.2% of adults globally and is characterized by the loss of alveolar bone height. 
Full-mouth examination by periodontal probing is the gold standard for diagnosis but is time- and resource-intensive. A screening model to identify those at high risk of severe periodontitis would offer a targeted approach and aid in reducing the workload for dentists. While statistical modeling by logistic regression is commonly applied, optimal performance depends on feature selection and engineering. Machine learning has recently been gaining favor given its potential discriminatory power and ability to deal with multiway interactions without requiring linearity assumptions. Objective: We aim to compare the performance of screening models developed using statistical and machine learning approaches for the risk prediction of severe periodontitis. Methods: This study used data from the prospective Electricity Generating Authority of Thailand cohort. Dental examinations were performed for the 2008 and 2013 surveys. Oral examinations (ie, number of teeth and oral hygiene index and plaque scores), periodontal pocket depth, and gingival recession were performed by dentists. The outcome of interest was severe periodontitis diagnosed by the Centre for Disease Control–American Academy of Periodontology, defined as 2 or more interproximal sites with a clinical attachment level ≥6 mm (on different teeth) and 1 or more interproximal sites with a periodontal pocket depth ≥5 mm. Risk prediction models were developed using mixed-effects logistic regression (MELR), recurrent neural network, mixed-effects support vector machine, and mixed-effects decision tree models. A total of 21 features were considered as predictive features, including 4 demographic characteristics, 2 physical examinations, 4 underlying diseases, 1 medication, 2 risk behaviors, 2 oral features, and 6 laboratory features. 
Results: A total of 3883 observations from 2086 participants were split into development (n=3112, 80.1%) and validation (n=771, 19.9%) sets with prevalences of periodontitis of 34.4% (n=1070) and 34.1% (n=263), respectively. The final MELR model contained 6 features (gender, education, smoking, diabetes mellitus, number of teeth, and plaque score) with an area under the curve (AUC) of 0.983 (95% CI 0.977-0.989) and positive likelihood ratio (LR+) of 11.9 (95% CI 8.8-16.3). Machine learning yielded lower performance than the MELR model, with AUC (95% CI) and LR+ (95% CI) values of 0.712 (0.669-0.754) and 2.1 (1.8-2.6), respectively, for the recurrent neural network model; 0.698 (0.681-0.734) and 2.1 (1.7-2.6), respectively, for the mixed-effects support vector machine model; and 0.662 (0.621-0.702) and 2.4 (1.9-3.0), respectively, for the mixed-effects decision tree model. Conclusions: The MELR model might be more useful than machine learning for large-scale screening to identify those at high risk of severe periodontitis for periodontal evaluation. External validation using data from other centers is required to evaluate the generalizability of the model. 
%M 38096008 %R 10.2196/48351 %U https://formative.jmir.org/2023/1/e48351 %U https://doi.org/10.2196/48351 %U http://www.ncbi.nlm.nih.gov/pubmed/38096008 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e45979 %T A Machine Learning Algorithm Predicting Acute Kidney Injury in Intensive Care Unit Patients (NAVOY Acute Kidney Injury): Proof-of-Concept Study %A Persson,Inger %A Grünwald,Adam %A Morvan,Ludivine %A Becedas,David %A Arlbrandt,Martin %+ Department of Statistics, Uppsala University, Box 513, Uppsala, SE 751 20, Sweden, 46 738275861, inger.persson@statistik.uu.se %K acute kidney injury %K AKI %K algorithm %K early detection %K electronic health records %K ICU %K intensive care unit %K machine learning %K nephrology %K prediction %K software as a medical device %D 2023 %7 14.12.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Acute kidney injury (AKI) represents a significant global health challenge, leading to increased patient distress and financial health care burdens. The development of AKI in intensive care unit (ICU) settings is linked to prolonged ICU stays, a heightened risk of long-term renal dysfunction, and elevated short- and long-term mortality rates. The current diagnostic approach for AKI is based on late indicators, such as elevated serum creatinine and decreased urine output, which can only detect AKI after renal injury has transpired. There are no treatments to reverse or restore renal function once AKI has developed, other than supportive care. Early prediction of AKI enables proactive management and may improve patient outcomes. Objective: The primary aim was to develop a machine learning algorithm, NAVOY Acute Kidney Injury, capable of predicting the onset of AKI in ICU patients using data routinely collected in ICU electronic health records. The ultimate goal was to create a clinical decision support tool that empowers ICU clinicians to proactively manage AKI and, consequently, enhance patient outcomes. 
Methods: We developed the NAVOY Acute Kidney Injury algorithm using a hybrid ensemble model, which combines the strengths of both a Random Forest (Leo Breiman and Adele Cutler) and an XGBoost model (Tianqi Chen). To ensure the accuracy of predictions, the algorithm used 22 clinical variables for hourly predictions of AKI as defined by the Kidney Disease: Improving Global Outcomes guidelines. Data for algorithm development were sourced from the Massachusetts Institute of Technology Lab for Computational Physiology Medical Information Mart for Intensive Care IV clinical database, focusing on ICU patients aged 18 years or older. Results: The developed algorithm, NAVOY Acute Kidney Injury, uses 4 hours of input and can, with high accuracy, predict patients with a high risk of developing AKI 12 hours before onset. The prediction performance compares well with previously published prediction algorithms designed to predict AKI onset in accordance with Kidney Disease: Improving Global Outcomes diagnosis criteria, with an area under the receiver operating characteristic curve (AUROC) of 0.91 and an area under the precision-recall curve (AUPRC) of 0.75. The algorithm’s predictive performance was externally validated on an independent hold-out test data set, confirming its ability to predict AKI with exceptional accuracy. Conclusions: NAVOY Acute Kidney Injury is an important development in the field of critical care medicine. It offers the ability to predict the onset of AKI with high accuracy using only 4 hours of data routinely collected in ICU electronic health records. This early detection capability has the potential to strengthen patient monitoring and management, ultimately leading to improved patient outcomes. Furthermore, NAVOY Acute Kidney Injury has been granted Conformité Européenne (CE) marking, a significant milestone as the first CE-marked AKI prediction algorithm for commercial use in European ICUs. 
%M 38096015 %R 10.2196/45979 %U https://formative.jmir.org/2023/1/e45979 %U https://doi.org/10.2196/45979 %U http://www.ncbi.nlm.nih.gov/pubmed/38096015 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e44358 %T Using Conversational AI to Facilitate Mental Health Assessments and Improve Clinical Efficiency Within Psychotherapy Services: Real-World Observational Study %A Rollwage,Max %A Habicht,Johanna %A Juechems,Keno %A Carrington,Ben %A Viswanathan,Sruthi %A Stylianou,Mona %A Hauser,Tobias U %A Harper,Ross %+ Limbic Limited, Kemp House, 160 City Road, London, EC1V 2NX, United Kingdom, 44 07491263783, max@limbic.ai %K artificial intelligence %K National Health Service %K NHS %K Improving Access to Psychological Therapies %K IAPT %K mental health %K mental health assessment %K triage %K decision-support %K referral %K chatbot %K psychotherapy %K conversational agent %K assessment %K Talking Therapies %D 2023 %7 13.12.2023 %9 Original Paper %J JMIR AI %G English %X Background: Most mental health care providers face the challenge of increased demand for psychotherapy in the absence of increased funding or staffing. To overcome this supply-demand imbalance, care providers must increase the efficiency of service delivery. Objective: In this study, we examined whether artificial intelligence (AI)–enabled digital solutions can help mental health care practitioners to use their time more efficiently, and thus reduce strain on services and improve patient outcomes. Methods: In this study, we focused on the use of an AI solution (Limbic Access) to support initial patient referral and clinical assessment within the UK’s National Health Service. Data were collected from 9 Talking Therapies services across England, comprising 64,862 patients. Results: We showed that the use of this AI solution improves clinical efficiency by reducing the time clinicians spend on mental health assessments. 
Furthermore, we found improved outcomes for patients using the AI solution in several key metrics, such as reduced wait times, reduced dropout rates, improved allocation to appropriate treatment pathways, and, most importantly, improved recovery rates. When investigating the mechanism by which the AI solution achieved these improvements, we found that the provision of clinically relevant information ahead of clinical assessment was critical for these observed effects. Conclusions: Our results emphasize the utility of using AI solutions to support the mental health workforce, further highlighting the potential of AI solutions to increase the efficiency of care delivery and improve clinical outcomes for patients. %M 38875569 %R 10.2196/44358 %U https://ai.jmir.org/2023/1/e44358 %U https://doi.org/10.2196/44358 %U http://www.ncbi.nlm.nih.gov/pubmed/38875569 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e50797 %T Transformation and Articulation of Clinical Data to Understand Students’ and Health Professionals’ Clinical Reasoning: Protocol for a Scoping Review %A Deschênes,Marie-France %A Fernandez,Nicolas %A Lechasseur,Kathleen %A Caty,Marie-Ève %A Azimzadeh,Dina %A Mai,Tue-Chieu %A Lavoie,Patrick %+ Faculté des sciences infirmières, Université de Montréal, C. P. 6128, succ. Centre-Ville, Montréal, QC, H3C 3J7, Canada, 1 514 343 6111 ext 6879, marie-france.deschenes@umontreal.ca %K clinical reasoning %K semantic qualifiers %K discourse %K linguistics %K education %K natural language processing %K scoping review %K clinical data %K educational strategy %K student %K health care professional %K semantic transformation %D 2023 %7 13.12.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: There are still unanswered questions regarding effective educational strategies to promote the transformation and articulation of clinical data while teaching and learning clinical reasoning. 
Additionally, understanding how this process can be analyzed and assessed is crucial, particularly considering the rapid growth of natural language processing in artificial intelligence. Objective: The aim of this study is to map educational strategies to promote the transformation and articulation of clinical data among students and health care professionals and to explore the methods used to assess these individuals’ transformation and articulation of clinical data. Methods: This scoping review follows the Joanna Briggs Institute framework for scoping reviews and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) checklist for the analysis. A literature search was performed in November 2022 using 5 databases: CINAHL (EBSCOhost), MEDLINE (Ovid), Embase (Ovid), PsycINFO (Ovid), and Web of Science (Clarivate). The protocol was registered on the Open Science Framework in November 2023. The scoping review will follow the 9-step framework proposed by Peters and colleagues of the Joanna Briggs Institute. A data extraction form has been developed using key themes from the research questions. Results: After removing duplicates, the initial search yielded 6656 results, and study selection is underway. The extracted data will be qualitatively analyzed and presented in a diagrammatic or tabular form alongside a narrative summary. The review will be completed by February 2024. Conclusions: By synthesizing the evidence on semantic transformation and articulation of clinical data during clinical reasoning education, this review aims to contribute to the refinement of educational strategies and assessment methods used in academic and continuing education programs. The insights gained from this review will help educators develop more effective semantic approaches for teaching or learning clinical reasoning, as opposed to fragmented, purely symptom-based or probabilistic approaches. 
Besides, the results may suggest some ways to address challenges related to the assessment of clinical reasoning and ensure that the assessment tasks accurately reflect learners’ developing competencies and educational progress. International Registered Report Identifier (IRRID): DERR1-10.2196/50797 %M 38090795 %R 10.2196/50797 %U https://www.researchprotocols.org/2023/1/e50797 %U https://doi.org/10.2196/50797 %U http://www.ncbi.nlm.nih.gov/pubmed/38090795 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45815 %T Mapping the Bibliometrics Landscape of AI in Medicine: Methodological Study %A Shi,Jin %A Bendig,David %A Vollmar,Horst Christian %A Rasche,Peter %+ Institute for Entrepreneurship, University of Münster, Geiststraße 24 - 26, Münster, 48149, Germany, 49 2518323176, jshi1@uni-muenster.de %K artificial intelligence %K AI %K AI in medicine %K medical AI taxonomy %K Python %K latent Dirichlet allocation %K LDA %K topic modeling %K unsupervised machine learning %D 2023 %7 8.12.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI), conceived in the 1950s, has permeated numerous industries, intensifying in tandem with advancements in computing power. Despite the widespread adoption of AI, its integration into medicine trails other sectors. However, medical AI research has experienced substantial growth, attracting considerable attention from researchers and practitioners. Objective: In the absence of an existing framework, this study aims to outline the current landscape of medical AI research and provide insights into its future developments by examining all AI-related studies within PubMed over the past 2 decades. We also propose potential data acquisition and analysis methods, developed using Python (version 3.11) and to be executed in Spyder IDE (version 5.4.3), for future analogous research. 
Methods: Our dual-pronged approach involved (1) retrieving publication metadata related to AI from PubMed (spanning 2000-2022) via Python, including titles, abstracts, authors, journals, country, and publishing years, followed by keyword frequency analysis and (2) classifying relevant topics using latent Dirichlet allocation, an unsupervised machine learning approach, and defining the research scope of AI in medicine. In the absence of a universal medical AI taxonomy, we used an AI dictionary based on the European Commission Joint Research Centre AI Watch report, which emphasizes 8 domains: reasoning, planning, learning, perception, communication, integration and interaction, service, and AI ethics and philosophy. Results: From 2000 to 2022, a comprehensive analysis of 307,701 AI-related publications from PubMed highlighted a 36-fold increase. The United States emerged as a clear frontrunner, producing 68,502 of these articles. Despite its substantial contribution in terms of volume, China lagged in terms of citation impact. Diving into specific AI domains, as the Joint Research Centre AI Watch report categorized, the learning domain emerged dominant. Our classification analysis meticulously traced the nuanced research trajectories across each domain, revealing the multifaceted and evolving nature of AI’s application in the realm of medicine. Conclusions: The research topics have evolved as the volume of AI studies increases annually. Machine learning remains central to medical AI research, with deep learning expected to maintain its fundamental role. Empowered by predictive algorithms, pattern recognition, and imaging analysis capabilities, the future of AI research in medicine is anticipated to concentrate on medical diagnosis, robotic intervention, and disease management. Our topic modeling outcomes provide a clear insight into the focus of AI research in medicine over the past decades and lay the groundwork for predicting future directions. 
The domains that have attracted considerable research attention, primarily the learning domain, will continue to shape the trajectory of AI in medicine. Given the observed growing interest, the domain of AI ethics and philosophy also stands out as a prospective area of increased focus. %M 38064255 %R 10.2196/45815 %U https://www.jmir.org/2023/1/e45815 %U https://doi.org/10.2196/45815 %U http://www.ncbi.nlm.nih.gov/pubmed/38064255 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e47847 %T The Adoption of AI in Mental Health Care–Perspectives From Mental Health Professionals: Qualitative Descriptive Study %A Zhang,Melody %A Scandiffio,Jillian %A Younus,Sarah %A Jeyakumar,Tharshini %A Karsan,Inaara %A Charow,Rebecca %A Salhia,Mohammad %A Wiljer,David %+ University Health Network, 190 Elizabeth Street, R Fraser Elliot Building RFE 3S-441, Toronto, ON, M5G 2C4, Canada, 1 416 340 4800 ext 6322, David.wiljer@uhn.ca %K artificial intelligence %K education %K mental health %K behavioral health %K educators %K curriculum %D 2023 %7 7.12.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Artificial intelligence (AI) is transforming the mental health care environment. AI tools are increasingly accessed by clients and service users. Mental health professionals must be prepared not only to use AI but also to have conversations about it when delivering care. Despite the potential for AI to enable more efficient and reliable and higher-quality care delivery, there is a persistent gap among mental health professionals in the adoption of AI. Objective: A needs assessment was conducted among mental health professionals to (1) understand the learning needs of the workforce and their attitudes toward AI and (2) inform the development of AI education curricula and knowledge translation products. 
Methods: A qualitative descriptive approach was taken to explore the needs of mental health professionals regarding their adoption of AI through semistructured interviews. To achieve maximum variation in sampling, mental health professionals (eg, psychiatrists, mental health nurses, educators, scientists, and social workers) in various settings across Ontario (eg, urban and rural, public and private sector, and clinical and research) were recruited. Results: A total of 20 individuals were recruited. Participants included practitioners (9/20, 45% social workers and 1/20, 5% mental health nurses), educator scientists (5/20, 25% with dual roles as professors/lecturers and researchers), and practitioner scientists (3/20, 15% with dual roles as researchers and psychiatrists and 2/20, 10% with dual roles as researchers and mental health nurses). Four major themes emerged: (1) fostering practice change and building self-efficacy to integrate AI into patient care; (2) promoting system-level change to accelerate the adoption of AI in mental health; (3) addressing the importance of organizational readiness as a catalyst for AI adoption; and (4) ensuring that mental health professionals have the education, knowledge, and skills to harness AI in optimizing patient care. Conclusions: AI technologies are starting to emerge in mental health care. Although many digital tools, web-based services, and mobile apps are designed using AI algorithms, mental health professionals have generally been slower in the adoption of AI. As indicated by this study’s findings, the implications are 3-fold. At the individual level, digital professionals must see the value in digitally compassionate tools that retain a humanistic approach to care. For mental health professionals, resistance toward AI adoption must be acknowledged through educational initiatives to raise awareness about the relevance, practicality, and benefits of AI. 
At the organizational level, digital professionals and leaders must collaborate on governance and funding structures to promote employee buy-in. At the societal level, digital and mental health professionals should collaborate in the creation of formal AI training programs specific to mental health to address knowledge gaps. This study promotes the design of relevant and sustainable education programs to support the adoption of AI within the mental health care sphere. %M 38060307 %R 10.2196/47847 %U https://formative.jmir.org/2023/1/e47847 %U https://doi.org/10.2196/47847 %U http://www.ncbi.nlm.nih.gov/pubmed/38060307 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e52091 %T The Impact of Generative Conversational Artificial Intelligence on the Lesbian, Gay, Bisexual, Transgender, and Queer Community: Scoping Review %A Bragazzi,Nicola Luigi %A Crapanzano,Andrea %A Converti,Manlio %A Zerbetto,Riccardo %A Khamisy-Farah,Rola %+ Laboratory for Industrial and Applied Mathematics, Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, ON, M3J 1P3, Canada, 1 416 736 2100, robertobragazzi@gmail.com %K generative conversational artificial intelligence %K chatbot %K lesbian, gay, bisexual, transgender, and queer community %K LGBTQ %K scoping review %K mobile phone %D 2023 %7 6.12.2023 %9 Review %J J Med Internet Res %G English %X Background: Despite recent significant strides toward acceptance, inclusion, and equality, members of the lesbian, gay, bisexual, transgender, and queer (LGBTQ) community still face alarming mental health disparities, being almost 3 times more likely to experience depression, anxiety, and suicidal thoughts than their heterosexual counterparts. These unique psychological challenges are due to discrimination, stigmatization, and identity-related struggles and can potentially benefit from generative conversational artificial intelligence (AI). 
As the latest advancement in AI, conversational agents and chatbots can imitate human conversation and support mental health, fostering diversity and inclusivity, combating stigma, and countering discrimination. In contrast, if not properly designed, they can perpetuate exclusion and inequities. Objective: This study aims to examine the impact of generative conversational AI on the LGBTQ community. Methods: This study was designed as a scoping review. Four electronic scholarly databases (Scopus, Embase, Web of Science, and MEDLINE via PubMed) and gray literature (Google Scholar) were consulted from inception without any language restrictions. Original studies focusing on the LGBTQ community or counselors working with this community exposed to chatbots and AI-enhanced internet-based platforms and exploring the feasibility, acceptance, or effectiveness of AI-enhanced tools were deemed eligible. The findings were reported in accordance with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). Results: Seven applications (HIVST-Chatbot, TelePrEP Navigator, Amanda Selfie, Crisis Contact Simulator, REALbot, Tough Talks, and Queer AI) were included and reviewed. The chatbots and internet-based assistants identified served various purposes: (1) to identify LGBTQ individuals at risk of suicide or contracting HIV or other sexually transmitted infections, (2) to provide resources to LGBTQ youth from underserved areas, (3) facilitate HIV status disclosure to sex partners, and (4) develop training role-play personas encompassing the diverse experiences and intersecting identities of LGBTQ youth to educate counselors. The use of generative conversational AI for the LGBTQ community is still in its early stages. Initial studies have found that deploying chatbots is feasible and well received, with high ratings for usability and user satisfaction. 
However, there is room for improvement in terms of the content provided and making conversations more engaging and interactive. Many of these studies used small sample sizes and short-term interventions measuring limited outcomes. Conclusions: Generative conversational AI holds promise, but further development and formal evaluation are needed, including studies with larger samples, longer interventions, and randomized trials to compare different content, delivery methods, and dissemination platforms. In addition, a focus on engagement with behavioral objectives is essential to advance this field. The findings have broad practical implications, highlighting that AI’s impact spans various aspects of people’s lives. Assessing AI’s impact on diverse communities and adopting diversity-aware and intersectional approaches can help shape AI’s positive impact on society as a whole. %M 37864350 %R 10.2196/52091 %U https://www.jmir.org/2023/1/e52091 %U https://doi.org/10.2196/52091 %U http://www.ncbi.nlm.nih.gov/pubmed/37864350 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48145 %T OpenDeID Pipeline for Unstructured Electronic Health Record Text Notes Based on Rules and Transformers: Deidentification Algorithm Development and Validation Study %A Liu,Jiaxing %A Gupta,Shalini %A Chen,Aipeng %A Wang,Chen-Kai %A Mishra,Pratik %A Dai,Hong-Jie %A Wong,Zoie Shui-Yee %A Jonnagaddala,Jitendra %+ School of Population Health, UNSW Sydney, F25 Samuels Building, Samuel Terry Ave, Kensington, NSW, 2033, Australia, 61 (02) 9385 2517, z3339253@unsw.edu.au %K deidentification %K scrubbing %K anonymization %K surrogate generation %K unstructured EHRs %K electronic health records %K BERT %K Bidirectional Encoder Representations from Transformers %D 2023 %7 6.12.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Electronic health records (EHRs) in unstructured formats are valuable sources of information for research in both the clinical and biomedical 
domains. However, before such records can be used for research purposes, sensitive health information (SHI) must be removed in several cases to protect patient privacy. Rule-based and machine learning–based methods have been shown to be effective in deidentification. However, very few studies investigated the combination of transformer-based language models and rules. Objective: The objective of this study is to develop a hybrid deidentification pipeline for Australian EHR text notes using rules and transformers. The study also aims to investigate the impact of pretrained word embedding and transformer-based language models. Methods: In this study, we present a hybrid deidentification pipeline called OpenDeID, which is developed using an Australian multicenter EHR-based corpus called OpenDeID Corpus. The OpenDeID corpus consists of 2100 pathology reports with 38,414 SHI entities from 1833 patients. The OpenDeID pipeline incorporates a hybrid approach of associative rules, supervised deep learning, and pretrained language models. Results: The OpenDeID achieved a best F1-score of 0.9659 by fine-tuning the Discharge Summary BioBERT model and incorporating various preprocessing and postprocessing rules. The OpenDeID pipeline has been deployed at a large tertiary teaching hospital and has processed over 8000 unstructured EHR text notes in real time. Conclusions: The OpenDeID pipeline is a hybrid deidentification pipeline to deidentify SHI entities in unstructured EHR text notes. The pipeline has been evaluated on a large multicenter corpus. External validation will be undertaken as part of our future work to evaluate the effectiveness of the OpenDeID pipeline. 
%M 38055317 %R 10.2196/48145 %U https://www.jmir.org/2023/1/e48145 %U https://doi.org/10.2196/48145 %U http://www.ncbi.nlm.nih.gov/pubmed/38055317 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e53058 %T Risk Prediction of Emergency Department Visits in Patients With Lung Cancer Using Machine Learning: Retrospective Observational Study %A Lee,Ah Ra %A Park,Hojoon %A Yoo,Aram %A Kim,Seok %A Sunwoo,Leonard %A Yoo,Sooyoung %+ Office of eHealth Research and Business, Seoul National University Bundang Hospital, 172, Dolma-ro, Bundang-gu, Seongnam-si, 13605, Republic of Korea, 82 31 787 8980, yoosoo0@snubh.org %K emergency department %K lung cancer %K risk prediction %K machine learning %K common data model %K emergency %K hospitalization %K hospitalizations %K lung %K cancer %K oncology %K lungs %K pulmonary %K respiratory %K predict %K prediction %K predictions %K predictive %K algorithm %K algorithms %K risk %K risks %K model %K models %D 2023 %7 6.12.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: Patients with lung cancer are among the most frequent visitors to emergency departments due to cancer-related problems, and the prognosis for those who seek emergency care is dismal. Given that patients with lung cancer frequently visit health care facilities for treatment or follow-up, the ability to predict emergency department visits based on clinical information gleaned from their routine visits would enhance hospital resource utilization and patient outcomes. Objective: This study proposed a machine learning–based prediction model to identify risk factors for emergency department visits by patients with lung cancer. Methods: This was a retrospective observational study of patients with lung cancer diagnosed at Seoul National University Bundang Hospital, a tertiary general hospital in South Korea, between January 2010 and December 2017. The primary outcome was an emergency department visit within 30 days of an outpatient visit. 
This study developed a machine learning–based prediction model using a common data model. In addition, the importance of features that influenced the decision-making of the model output was analyzed to identify significant clinical factors. Results: The model with the best performance demonstrated an area under the receiver operating characteristic curve of 0.73 in its ability to predict the attendance of patients with lung cancer in emergency departments. The frequency of recent visits to the emergency department and several laboratory test results that are typically collected during cancer treatment follow-up visits were revealed as influencing factors for the model output. Conclusions: This study developed a machine learning–based risk prediction model using a common data model and identified influencing factors for emergency department visits by patients with lung cancer. The predictive model contributes to the efficiency of resource utilization and health care service quality by facilitating the identification and early intervention of high-risk patients. This study demonstrated the possibility of collaborative research among different institutions using the common data model for precision medicine in lung cancer. 
%M 38055320 %R 10.2196/53058 %U https://medinform.jmir.org/2023/1/e53058 %U https://doi.org/10.2196/53058 %U http://www.ncbi.nlm.nih.gov/pubmed/38055320 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e52888 %T Developing Ethics and Equity Principles, Terms, and Engagement Tools to Advance Health Equity and Researcher Diversity in AI and Machine Learning: Modified Delphi Approach %A Hendricks-Sturrup,Rachele %A Simmons,Malaika %A Anders,Shilo %A Aneni,Kammarauche %A Wright Clayton,Ellen %A Coco,Joseph %A Collins,Benjamin %A Heitman,Elizabeth %A Hussain,Sajid %A Joshi,Karuna %A Lemieux,Josh %A Lovett Novak,Laurie %A Rubin,Daniel J %A Shanker,Anil %A Washington,Talitha %A Waters,Gabriella %A Webb Harris,Joyce %A Yin,Rui %A Wagner,Teresa %A Yin,Zhijun %A Malin,Bradley %+ National Alliance Against Disparities in Patient Health, 2700 Neabsco Common Place, Suite 101, Woodbridge, VA, 22191, United States, 1 (571) 316 5116, hendricks-sturrup@nadph.org %K artificial intelligence %K AI %K Delphi %K disparities %K disparity %K engagement %K equitable %K equities %K equity %K ethic %K ethical %K ethics %K fair %K fairness %K health disparities %K health equity %K humanitarian %K machine learning %K ML %D 2023 %7 6.12.2023 %9 Original Paper %J JMIR AI %G English %X Background: Artificial intelligence (AI) and machine learning (ML) technology design and development continues to be rapid, despite major limitations in its current form as a practice and discipline to address all sociohumanitarian issues and complexities. From these limitations emerges an imperative to strengthen AI and ML literacy in underserved communities and build a more diverse AI and ML design and development workforce engaged in health research. Objective: AI and ML has the potential to account for and assess a variety of factors that contribute to health and disease and to improve prevention, diagnosis, and therapy. 
Here, we describe recent activities within the Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity (AIM-AHEAD) Ethics and Equity Workgroup (EEWG) that led to the development of deliverables that will help put ethics and fairness at the forefront of AI and ML applications to build equity in biomedical research, education, and health care. Methods: The AIM-AHEAD EEWG was created in 2021 with 3 cochairs and 51 members in year 1 and 2 cochairs and ~40 members in year 2. Members in both years included AIM-AHEAD principal investigators, coinvestigators, leadership fellows, and research fellows. The EEWG used a modified Delphi approach using polling, ranking, and other exercises to facilitate discussions around tangible steps, key terms, and definitions needed to ensure that ethics and fairness are at the forefront of AI and ML applications to build equity in biomedical research, education, and health care. Results: The EEWG developed a set of ethics and equity principles, a glossary, and an interview guide. The ethics and equity principles comprise 5 core principles, each with subparts, which articulate best practices for working with stakeholders from historically and presently underrepresented communities. The glossary contains 12 terms and definitions, with particular emphasis on optimal development, refinement, and implementation of AI and ML in health equity research. To accompany the glossary, the EEWG developed a concept relationship diagram that describes the logical flow of and relationship between the definitional concepts. Lastly, the interview guide provides questions that can be used or adapted to garner stakeholder and community perspectives on the principles and glossary. Conclusions: Ongoing engagement is needed around our principles and glossary to identify and predict potential limitations in their uses in AI and ML research settings, especially for institutions with limited resources. 
This requires time, careful consideration, and honest discussions around what classifies an engagement incentive as meaningful to support and sustain their full engagement. By slowing down to meet historically and presently underresourced institutions and communities where they are and where they are capable of engaging and competing, there is higher potential to achieve needed diversity, ethics, and equity in AI and ML implementation in health research. %M 38875540 %R 10.2196/52888 %U https://ai.jmir.org/2023/1/e52888 %U https://doi.org/10.2196/52888 %U http://www.ncbi.nlm.nih.gov/pubmed/38875540 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e51603 %T How Can the Clinical Aptitude of AI Assistants Be Assayed? %A Thirunavukarasu,Arun James %+ Oxford University Clinical Academic Graduate School, University of Oxford, John Radcliffe Hospital, Level 3, Oxford, OX3 9DU, United Kingdom, 44 1865 289 467, ajt205@cantab.ac.uk %K artificial intelligence %K AI %K validation %K clinical decision aid %K artificial general intelligence %K foundation models %K large language models %K LLM %K language model %K ChatGPT %K chatbot %K chatbots %K conversational agent %K conversational agents %K pitfall %K pitfalls %K pain point %K pain points %K implementation %K barrier %K barriers %K challenge %K challenges %D 2023 %7 5.12.2023 %9 Viewpoint %J J Med Internet Res %G English %X Large language models (LLMs) are exhibiting remarkable performance in clinical contexts, with exemplar results ranging from expert-level attainment in medical examination questions to superior accuracy and relevance when responding to patient queries compared to real doctors replying to queries on social media. The deployment of LLMs in conventional health care settings is yet to be reported, and there remains an open question as to what evidence should be required before such deployment is warranted. 
Early validation studies use unvalidated surrogate variables to represent clinical aptitude, and it may be necessary to conduct prospective randomized controlled trials to justify the use of an LLM for clinical advice or assistance, as potential pitfalls and pain points cannot be exhaustively predicted. This viewpoint states that as LLMs continue to revolutionize the field, there is an opportunity to improve the rigor of artificial intelligence (AI) research to reward innovation, conferring real benefits to real patients. %M 38051572 %R 10.2196/51603 %U https://www.jmir.org/2023/1/e51603 %U https://doi.org/10.2196/51603 %U http://www.ncbi.nlm.nih.gov/pubmed/38051572 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 10 %N %P e49894 %T Predicting Patients' Satisfaction With Mental Health Drug Treatment Using Their Reviews: Unified Interchangeable Model Fusion Approach %A Wang,Yi %A Yu,Yide %A Liu,Yue %A Ma,Yan %A Pang,Patrick Cheong-Iao %+ Faculty of Applied Sciences, Macao Polytechnic University, Rua de Luís Gonzaga Gomes, Macao, 999078, Macao, 853 85996433, yue.liu@mpu.edu.mo %K artificial intelligence %K AI %K mental disorder %K psychotherapy effectiveness %K deep learning %K machine learning %K natural language processing %K NLP %K data imbalance %K model fusion %D 2023 %7 5.12.2023 %9 Original Paper %J JMIR Ment Health %G English %X Background: After the COVID-19 pandemic, the conflict between limited mental health care resources and the rapidly growing number of patients has become more pronounced. It is necessary for psychologists to borrow artificial intelligence (AI)–based methods to analyze patients’ satisfaction with drug treatment for those undergoing mental illness treatment. Objective: Our goal was to construct highly accurate and transferable models for predicting the satisfaction of patients with mental illness with medication by analyzing their own experiences and comments related to medication intake. 
Methods: We extracted 41,851 reviews in 20 categories of disorders related to mental illnesses from a large public data set of 161,297 reviews in 16,950 illness categories. To discover a more optimal structure of the natural language processing models, we proposed the Unified Interchangeable Model Fusion to decompose the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT), support vector machine, and random forest (RF) models into 2 modules, the encoder and the classifier, and then reconstruct fused “encoder+classifier” models to accurately evaluate patients’ satisfaction. The fused models were divided into 2 categories in terms of model structures, traditional machine learning–based models and neural network–based models. A new loss function was proposed for those neural network–based models to overcome overfitting and data imbalance. Finally, we fine-tuned the fused models and evaluated their performance comprehensively in terms of F1-score, accuracy, κ coefficient, and training time using 10-fold cross-validation. Results: Through extensive experiments, the transformer bidirectional encoder+RF model outperformed the state-of-the-art BERT, MentalBERT, and other fused models. It became the optimal model for predicting the patients’ satisfaction with drug treatment. It achieved an average graded F1-score of 0.872, an accuracy of 0.873, and a κ coefficient of 0.806. This model is suitable for high-standard users with sufficient computing resources. Alternatively, it turned out that the word-embedding encoder+RF model showed relatively good performance with an average graded F1-score of 0.801, an accuracy of 0.812, and a κ coefficient of 0.695 but with much less training time. It can be deployed in environments with limited computing resources. Conclusions: We analyzed the performance of support vector machine, RF, BERT, MentalBERT, and all fused models and identified the optimal models for different clinical scenarios. 
The findings can serve as evidence to support that the natural language processing methods can effectively assist psychologists in evaluating the satisfaction of patients with drug treatment programs and provide precise and standardized solutions. The Unified Interchangeable Model Fusion provides a different perspective on building AI models in mental health and has the potential to fuse the strengths of different components of the models into a single model, which may contribute to the development of AI in mental health. %M 38051580 %R 10.2196/49894 %U https://mental.jmir.org/2023/1/e49894 %U https://doi.org/10.2196/49894 %U http://www.ncbi.nlm.nih.gov/pubmed/38051580 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e49183 %T ChatGPT Versus Consultants: Blinded Evaluation on Answering Otorhinolaryngology Case–Based Questions %A Buhr,Christoph Raphael %A Smith,Harry %A Huppertz,Tilman %A Bahr-Hamm,Katharina %A Matthias,Christoph %A Blaikie,Andrew %A Kelsey,Tom %A Kuhn,Sebastian %A Eckrich,Jonas %+ Department of Otorhinolaryngology, University Medical Center of the Johannes Gutenberg-University Mainz, Langenbeckstraße 1, Mainz, 55131, Germany, 49 6131 17 7361, buhrchri@uni-mainz.de %K large language models %K LLMs %K LLM %K artificial intelligence %K AI %K ChatGPT %K otorhinolaryngology %K ORL %K digital health %K chatbots %K global health %K low- and middle-income countries %K telemedicine %K telehealth %K language model %K chatbot %D 2023 %7 5.12.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: Large language models (LLMs), such as ChatGPT (Open AI), are increasingly used in medicine and supplement standard search engines as information sources. This leads to more “consultations” of LLMs about personal medical symptoms. Objective: This study aims to evaluate ChatGPT’s performance in answering clinical case–based questions in otorhinolaryngology (ORL) in comparison to ORL consultants’ answers. 
Methods: We used 41 case-based questions from established ORL study books and past German state examinations for doctors. The questions were answered by both ORL consultants and ChatGPT 3. ORL consultants rated all responses, except their own, on medical adequacy, conciseness, coherence, and comprehensibility using a 6-point Likert scale. They also identified (in a blinded setting) if the answer was created by an ORL consultant or ChatGPT. Additionally, the character count was compared. Due to the rapidly evolving pace of technology, a comparison between responses generated by ChatGPT 3 and ChatGPT 4 was included to give an insight into the evolving potential of LLMs. Results: Ratings in all categories were significantly higher for ORL consultants (P<.001). Although inferior to the scores of the ORL consultants, ChatGPT’s scores were relatively higher in semantic categories (conciseness, coherence, and comprehensibility) compared to medical adequacy. ORL consultants identified ChatGPT as the source correctly in 98.4% (121/123) of cases. ChatGPT’s answers had a significantly higher character count compared to ORL consultants (P<.001). Comparison between responses generated by ChatGPT 3 and ChatGPT 4 showed a slight improvement in medical accuracy as well as a better coherence of the answers provided. Contrarily, neither the conciseness (P=.06) nor the comprehensibility (P=.08) improved significantly despite the significant increase in the mean amount of characters by 52.5% (n= (1470-964)/964; P<.001). Conclusions: While ChatGPT provided longer answers to medical problems, medical adequacy and conciseness were significantly lower compared to ORL consultants’ answers. LLMs have potential as augmentative tools for medical care, but their “consultation” for medical problems carries a high risk of misinformation as their high semantic quality may mask contextual deficits. 
%M 38051578 %R 10.2196/49183 %U https://mededu.jmir.org/2023/1/e49183 %U https://doi.org/10.2196/49183 %U http://www.ncbi.nlm.nih.gov/pubmed/38051578 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e49374 %T Investigating the Impact of Automation on the Health Care Workforce Through Autonomous Telemedicine in the Cataract Pathway: Protocol for a Multicenter Study %A Khavandi,Sarah %A Zaghloul,Fatema %A Higham,Aisling %A Lim,Ernest %A de Pennington,Nick %A Celi,Leo Anthony %+ Ufonia, 104 Gloucester Green, Oxford, OX1 2BU, United Kingdom, 44 07931531022, sk@ufonia.co %K artificial intelligence %K autonomous telemedicine %K clinician burnout %K clinician wellbeing %K conversational agent %K digital health %K health communication %K health information technology %K health services %K healthcare %K medical informatics %K socio-technical system approach %K systems approach %K technology acceptability %D 2023 %7 5.12.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: While digital health innovations are increasingly being adopted by health care organizations, implementation is often carried out without considering the impacts on frontline staff who will be using the technology and who will be affected by its introduction. The enthusiasm surrounding the use of artificial intelligence (AI)–enabled digital solutions in health care is tempered by uncertainty around how it will change the working lives and practices of health care professionals. Digital enablement can be viewed as facilitating enhanced effectiveness and efficiency by improving services and automating cognitive labor, yet the implementation of such AI technology comes with challenges related to changes in work practices brought by automation. This research explores staff experiences before and after care pathway automation with an autonomous clinical conversational assistant, Dora (Ufonia Ltd), that is able to automate routine clinical conversations. 
Objective: The primary objective is to examine the impact of AI-enabled automation on clinicians, allied health professionals, and administrators who provide or facilitate health care to patients in high-volume, low-complexity care pathways. In the process of transforming care pathways through automation of routine tasks, staff will increasingly “work at the top of their license.” The impact of this fundamental change on the professional identity, well-being, and work practices of the individual is poorly understood at present. Methods: We will adopt a multiple case study approach, combining qualitative and quantitative data collection methods, over 2 distinct phases, namely phase A (preimplementation) and phase B (postimplementation). Results: The analysis is expected to reveal the interrelationship between Dora and those affected by its introduction. This will reveal how tasks and responsibilities have changed or shifted, current tensions and contradictions, ways of working, and challenges, benefits, and opportunities as perceived by those on the frontlines of the health care system. The findings will enable a better understanding of the resistance or susceptibility of different stakeholders within the health care workforce and encourage managerial awareness of differing needs, demands, and uncertainties. Conclusions: The implementation of AI in the health care sector, as well as the body of research on this topic, remain in their infancy. The project’s key contribution will be to understand the impact of AI-enabled automation on the health care workforce and their work practices. 
International Registered Report Identifier (IRRID): PRR1-10.2196/49374 %M 38051569 %R 10.2196/49374 %U https://www.researchprotocols.org/2023/1/e49374 %U https://doi.org/10.2196/49374 %U http://www.ncbi.nlm.nih.gov/pubmed/38051569 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e49147 %T A Stable and Scalable Digital Composite Neurocognitive Test for Early Dementia Screening Based on Machine Learning: Model Development and Validation Study %A Gu,Dongmei %A Lv,Xiaozhen %A Shi,Chuan %A Zhang,Tianhong %A Liu,Sha %A Fan,Zili %A Tu,Lihui %A Zhang,Ming %A Zhang,Nan %A Chen,Liming %A Wang,Zhijiang %A Wang,Jing %A Zhang,Ying %A Li,Huizi %A Wang,Luchun %A Zhu,Jiahui %A Zheng,Yaonan %A Wang,Huali %A Yu,Xin %A , %+ Clinical Research Division, Dementia Care and Research Center, Peking University Institute of Mental Health (Sixth Hospital), Huayuanbei Road 51, Haidian District, Beijing, 100191, China, 86 10 82801999, yuxin@bjmu.edu.cn %K mild cognitive impairment %K digital cognitive assessment %K machine learning %K neurocognitive test %K cognitive screening %K dementia %D 2023 %7 1.12.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Dementia has become a major public health concern due to its heavy disease burden. Mild cognitive impairment (MCI) is a transitional stage between healthy aging and dementia. Early identification of MCI is an essential step in dementia prevention. Objective: Based on machine learning (ML) methods, this study aimed to develop and validate a stable and scalable panel of cognitive tests for the early detection of MCI and dementia based on the Chinese Neuropsychological Consensus Battery (CNCB) in the Chinese Neuropsychological Normative Project (CN-NORM) cohort. Methods: CN-NORM was a nationwide, multicenter study conducted in China with 871 participants, including an MCI group (n=327, 37.5%), a dementia group (n=186, 21.4%), and a cognitively normal (CN) group (n=358, 41.1%). 
We used the following 4 algorithms to select candidate variables: the F-score according to the SelectKBest method, the area under the curve (AUC) from logistic regression (LR), P values from the logit method, and backward stepwise elimination. Different models were constructed after considering the administration duration and complexity of combinations of various tests. Receiver operating characteristic curve and AUC metrics were used to evaluate the discriminative ability of the models via stratified sampling cross-validation and LR and support vector classification (SVC) algorithms. This model was further validated in the Alzheimer’s Disease Neuroimaging Initiative phase 3 (ADNI-3) cohort (N=743), which included 416 (56%) CN subjects, 237 (31.9%) patients with MCI, and 90 (12.1%) patients with dementia. Results: Except for social cognition, all other domains in the CNCB differed between the MCI and CN groups (P<.008). In feature selection results regarding discrimination between the MCI and CN groups, the Hopkins Verbal Learning Test-5 minutes Recall had the best performance, with the highest mean AUC of up to 0.80 (SD 0.02) and an F-score of up to 258.70. The scalability of model 5 (Hopkins Verbal Learning Test-5 minutes Recall and Trail Making Test-B) was the lowest. Model 5 achieved a higher level of discrimination than the Hong Kong Brief Cognitive test score in distinguishing between the MCI and CN groups (P<.05). Model 5 also provided the highest sensitivity of up to 0.82 (range 0.72-0.92) and 0.83 (range 0.75-0.91) according to LR and SVC, respectively. This model yielded a similar robust discriminative performance in the ADNI-3 cohort regarding differentiation between the MCI and CN groups, with a mean AUC of up to 0.81 (SD 0) according to both LR and SVC algorithms. 
Conclusions: We developed a stable and scalable composite neurocognitive test based on ML that could differentiate not only between patients with MCI and controls but also between patients with different stages of cognitive impairment. This composite neurocognitive test is a feasible and practical digital biomarker that can potentially be used in large-scale cognitive screening and intervention studies. %M 38039074 %R 10.2196/49147 %U https://www.jmir.org/2023/1/e49147 %U https://doi.org/10.2196/49147 %U http://www.ncbi.nlm.nih.gov/pubmed/38039074 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e53466 %T Developing Medical Education Curriculum Reform Strategies to Address the Impact of Generative AI: Qualitative Study %A Shimizu,Ikuo %A Kasai,Hajime %A Shikino,Kiyoshi %A Araki,Nobuyuki %A Takahashi,Zaiya %A Onodera,Misaki %A Kimura,Yasuhiko %A Tsukamoto,Tomoko %A Yamauchi,Kazuyo %A Asahina,Mayumi %A Ito,Shoichi %A Kawakami,Eiryo %+ Department of Medical Education, Graduate School of Medicine, Chiba University, 1-8-1 Inohana, Chiba, 2608672, Japan, 81 432262816, qingshuiyufu@gmail.com %K artificial intelligence %K curriculum reform %K generative artificial intelligence %K large language models %K medical education %K qualitative analysis %K strengths-weaknesses-opportunities-threats (SWOT) framework %D 2023 %7 30.11.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: Generative artificial intelligence (GAI), represented by large language models, has the potential to transform health care and medical education. In particular, GAI’s impact on higher education has the potential to change students’ learning experience as well as faculty’s teaching. However, concerns have been raised about ethical considerations and decreased reliability of the existing examinations. 
Furthermore, in medical education, curriculum reform is required to adapt to the revolutionary changes brought about by the integration of GAI into medical practice and research. Objective: This study analyzes the impact of GAI on medical education curricula and explores strategies for adaptation. Methods: The study was conducted in the context of faculty development at a medical school in Japan. A workshop involving faculty and students was organized, and participants were divided into groups to address two research questions: (1) How does GAI affect undergraduate medical education curricula? and (2) How should medical school curricula be reformed to address the impact of GAI? The strength, weakness, opportunity, and threat (SWOT) framework was used, and cross-SWOT matrix analysis was used to devise strategies. Further, 4 researchers conducted content analysis on the data generated during the workshop discussions. Results: The data were collected from 8 groups comprising 55 participants. Further, 5 themes about the impact of GAI on medical education curricula emerged: improvement of teaching and learning, improved access to information, inhibition of existing learning processes, problems in GAI, and changes in physicians’ professionality. Positive impacts included enhanced teaching and learning efficiency and improved access to information, whereas negative impacts included concerns about reduced independent thinking and the adaptability of existing assessment methods. Further, GAI was perceived to change the nature of physicians’ expertise. Three themes emerged from the cross-SWOT analysis for curriculum reform: (1) learning about GAI, (2) learning with GAI, and (3) learning aside from GAI. Participants recommended incorporating GAI literacy, ethical considerations, and compliance into the curriculum. Learning with GAI involved improving learning efficiency, supporting information gathering and dissemination, and facilitating patient involvement. 
Learning aside from GAI emphasized maintaining GAI-free learning processes, fostering higher cognitive domains of learning, and introducing more communication exercises. Conclusions: This study highlights the profound impact of GAI on medical education curricula and provides insights into curriculum reform strategies. Participants recognized the need for GAI literacy, ethical education, and adaptive learning. Further, GAI was recognized as a tool that can enhance efficiency and involve patients in education. The study also suggests that medical education should focus on competencies that GAI hardly replaces, such as clinical experience and communication. Notably, involving both faculty and students in curriculum reform discussions fosters a sense of ownership and ensures broader perspectives are encompassed. %M 38032695 %R 10.2196/53466 %U https://mededu.jmir.org/2023/1/e53466 %U https://doi.org/10.2196/53466 %U http://www.ncbi.nlm.nih.gov/pubmed/38032695 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e51243 %T Can we use ChatGPT for Mental Health and Substance Use Education? 
Examining Its Quality and Potential Harms %A Spallek,Sophia %A Birrell,Louise %A Kershaw,Stephanie %A Devine,Emma Krogh %A Thornton,Louise %+ The Matilda Centre for Research in Mental Health and Substance Use, The University of Sydney, Level 6, Jane Foss Russell Building (G02), Sydney, 2006, Australia, 61 02 8627 9048, sophia.spallek@sydney.edu.au %K artificial intelligence %K generative artificial intelligence %K large language models %K ChatGPT %K medical education %K health education %K patient education handout %K preventive health services %K educational intervention %K mental health %K substance use %D 2023 %7 30.11.2023 %9 Viewpoint %J JMIR Med Educ %G English %X Background: The use of generative artificial intelligence, more specifically large language models (LLMs), is proliferating, and as such, it is vital to consider both the value and potential harms of its use in medical education. Their efficiency in a variety of writing styles makes LLMs, such as ChatGPT, attractive for tailoring educational materials. However, this technology can feature biases and misinformation, which can be particularly harmful in medical education settings, such as mental health and substance use education. This viewpoint investigates if ChatGPT is sufficient for 2 common health education functions in the field of mental health and substance use: (1) answering users’ direct queries and (2) aiding in the development of quality consumer educational health materials. Objective: This viewpoint includes a case study to provide insight into the accessibility, biases, and quality of ChatGPT’s query responses and educational health materials. We aim to provide guidance for the general public and health educators wishing to utilize LLMs. Methods: We collected real world queries from 2 large-scale mental health and substance use portals and engineered a variety of prompts to use on GPT-4 Pro with the Bing BETA internet browsing plug-in. 
The outputs were evaluated with tools from the Sydney Health Literacy Lab to determine the accessibility, the adherence to Mindframe communication guidelines to identify biases, and author assessments on quality, including tailoring to audiences, duty of care disclaimers, and evidence-based internet references. Results: GPT-4’s outputs had good face validity, but upon detailed analysis were substandard in comparison to expert-developed materials. Without engineered prompting, the reading level, adherence to communication guidelines, and use of evidence-based websites were poor. Therefore, all outputs still required cautious human editing and oversight. Conclusions: GPT-4 is currently not reliable enough for direct-consumer queries, but educators and researchers can use it for creating educational materials with caution. Materials created with LLMs should disclose the use of generative artificial intelligence and be evaluated on their efficacy with the target audience. %M 38032714 %R 10.2196/51243 %U https://mededu.jmir.org/2023/1/e51243 %U https://doi.org/10.2196/51243 %U http://www.ncbi.nlm.nih.gov/pubmed/38032714 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48142 %T Developing and Evaluating an AI-Based Computer-Aided Diagnosis System for Retinal Disease: Diagnostic Study for Central Serous Chorioretinopathy %A Yoon,Jeewoo %A Han,Jinyoung %A Ko,Junseo %A Choi,Seong %A Park,Ji In %A Hwang,Joon Seo %A Han,Jeong Mo %A Hwang,Daniel Duck-Jin %+ Department of Ophthalmology, Hangil Eye Hospital, 35 Bupyeong-daero, Bupyeong-gu, Incheon, Incheon, 21388, Republic of Korea, 82 327175808, daniel.dj.hwang@gmail.com %K computer aided diagnosis %K ophthalmology %K deep learning %K artificial intelligence %K computer vision %K imaging informatics %K retinal disease %K central serous chorioretinopathy %K diagnostic study %D 2023 %7 29.11.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Although previous research has made substantial 
progress in developing high-performance artificial intelligence (AI)–based computer-aided diagnosis (AI-CAD) systems in various medical domains, little attention has been paid to developing and evaluating AI-CAD systems in ophthalmology, particularly for diagnosing retinal diseases using optical coherence tomography (OCT) images. Objective: This diagnostic study aimed to determine the usefulness of a proposed AI-CAD system in assisting ophthalmologists with the diagnosis of central serous chorioretinopathy (CSC), which is known to be difficult to diagnose, using OCT images. Methods: For the training and evaluation of the proposed deep learning model, 1693 OCT images were collected and annotated. The data set included 929 and 764 cases of acute and chronic CSC, respectively. In total, 66 ophthalmologists (2 groups: 36 retina and 30 nonretina specialists) participated in the observer performance test. To evaluate the deep learning algorithm used in the proposed AI-CAD system, the training, validation, and test sets were split in an 8:1:1 ratio. Further, 100 randomly sampled OCT images from the test set were used for the observer performance test, and the participants were instructed to select a CSC subtype for each of these images. Each image was provided under different conditions: (1) without AI assistance, (2) with AI assistance with a probability score, and (3) with AI assistance with a probability score and visual evidence heatmap. The sensitivity, specificity, and area under the receiver operating characteristic curve were used to measure the diagnostic performance of the model and ophthalmologists. Results: The proposed system achieved a high detection performance (99% of the area under the curve) for CSC, outperforming the 66 ophthalmologists who participated in the observer performance test. 
In both groups, ophthalmologists with the support of AI assistance with a probability score and visual evidence heatmap achieved the highest mean diagnostic performance compared with that of those subjected to other conditions (without AI assistance or with AI assistance with a probability score). Nonretina specialists achieved expert-level diagnostic performance with the support of the proposed AI-CAD system. Conclusions: Our proposed AI-CAD system improved the diagnosis of CSC by ophthalmologists, which may support decision-making regarding retinal disease detection and alleviate the workload of ophthalmologists. %M 38019564 %R 10.2196/48142 %U https://www.jmir.org/2023/1/e48142 %U https://doi.org/10.2196/48142 %U http://www.ncbi.nlm.nih.gov/pubmed/38019564 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e50886 %T Automated Machine Learning Analysis of Patients With Chronic Skin Disease Using a Medical Smartphone App: Retrospective Study %A Bibi,Igor %A Schaffert,Daniel %A Blauth,Mara %A Lull,Christian %A von Ahnen,Jan Alwin %A Gross,Georg %A Weigandt,Wanja Alexander %A Knitza,Johannes %A Kuhn,Sebastian %A Benecke,Johannes %A Leipe,Jan %A Schmieder,Astrid %A Olsavszky,Victor %+ Department of Dermatology, Venereology and Allergology, University Medical Center and Medical Faculty Mannheim, Center of Excellence in Dermatology, Heidelberg University, Theodor-Kutzer-Ufer 1-3, Mannheim, 68167, Germany, 49 6213832280, victor.olsavszky@medma.uni-heidelberg.de %K automated machine learning %K psoriasis %K hand and foot eczema %K medical smartphone app %K application %K smartphone %K machine learning %K digitalization %K skin %K skin disease %K use %K hand %K foot %K mobile phone %D 2023 %7 28.11.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Rapid digitalization in health care has led to the adoption of digital technologies; however, limited trust in internet-based health decisions and the need for technical personnel hinder the 
use of smartphones and machine learning applications. To address this, automated machine learning (AutoML) is a promising tool that can empower health care professionals to enhance the effectiveness of mobile health apps. Objective: We used AutoML to analyze data from clinical studies involving patients with chronic hand and/or foot eczema or psoriasis vulgaris who used a smartphone monitoring app. The analysis focused on itching, pain, Dermatology Life Quality Index (DLQI) development, and app use. Methods: After extensive data set preparation, which consisted of combining 3 primary data sets by extracting common features and by computing new features, a new pseudonymized secondary data set with a total of 368 patients was created. Next, multiple machine learning classification models were built during AutoML processing, with the most accurate models ultimately selected for further data set analysis. Results: Itching development for 6 months was accurately modeled using the light gradient boosted trees classifier model (log loss: 0.9302 for validation, 1.0193 for cross-validation, and 0.9167 for holdout). Pain development for 6 months was assessed using the random forest classifier model (log loss: 1.1799 for validation, 1.1561 for cross-validation, and 1.0976 for holdout). Then, the random forest classifier model (log loss: 1.3670 for validation, 1.4354 for cross-validation, and 1.3974 for holdout) was used again to estimate the DLQI development for 6 months. Finally, app use was analyzed using an elastic net blender model (area under the curve: 0.6567 for validation, 0.6207 for cross-validation, and 0.7232 for holdout). Influential feature correlations were identified, including BMI, age, disease activity, DLQI, and Hospital Anxiety and Depression Scale-Anxiety scores at follow-up. App use increased with BMI >35, was less common in patients aged >47 years and those aged 23 to 31 years, and was more common in those with higher disease activity. 
A Hospital Anxiety and Depression Scale-Anxiety score >8 had a slightly positive effect on app use. Conclusions: This study provides valuable insights into the relationship between data characteristics and targeted outcomes in patients with chronic eczema or psoriasis, highlighting the potential of smartphone and AutoML techniques in improving chronic disease management and patient care. %M 38015608 %R 10.2196/50886 %U https://www.jmir.org/2023/1/e50886 %U https://doi.org/10.2196/50886 %U http://www.ncbi.nlm.nih.gov/pubmed/38015608 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47551 %T Security Implications of AI Chatbots in Health Care %A Li,Jingquan %+ Hofstra University, Department of Information Systems and Business Analytics, Frank G. Zarb School of Business, Hempstead, NY, 11549, United States, 1 516 463 8823, Jingquan.Li@hofstra.edu %K security %K privacy %K chatbot %K AI %K artificial intelligence %K health information %K HIPAA %K ChatGPT %K computer program %K natural language processing %K tool %K improvement %K patient care %K care %K data security %K guidelines %K risk %K policy %D 2023 %7 28.11.2023 %9 Viewpoint %J J Med Internet Res %G English %X Artificial intelligence (AI) chatbots like ChatGPT and Google Bard are computer programs that use AI and natural language processing to understand customer questions and generate natural, fluid, dialogue-like responses to their inputs. ChatGPT, an AI chatbot created by OpenAI, has rapidly become a widely used tool on the internet. AI chatbots have the potential to improve patient care and public health. However, they are trained on massive amounts of people’s data, which may include sensitive patient data and business information. The increased use of chatbots introduces data security issues, which should be handled yet remain understudied. This paper aims to identify the most important security problems of AI chatbots and propose guidelines for protecting sensitive health information. 
It explores the impact of using ChatGPT in health care. It also identifies the principal security risks of ChatGPT and suggests key considerations for security risk mitigation. It concludes by discussing the policy implications of using AI chatbots in health care. %M 38015597 %R 10.2196/47551 %U https://www.jmir.org/2023/1/e47551 %U https://doi.org/10.2196/47551 %U http://www.ncbi.nlm.nih.gov/pubmed/38015597 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e49886 %T A Large Language Model Screening Tool to Target Patients for Best Practice Alerts: Development and Validation %A Savage,Thomas %A Wang,John %A Shieh,Lisa %+ Division of Hospital Medicine, Department of Medicine, Stanford University, 300 Pasteur Drive, Palo Alto, CA, 94304, United States, 1 6507234000, tsavage@stanford.edu %K large language models %K language models %K language model %K EHR %K health record %K health records %K quality improvement %K Artificial Intelligence %K Natural Language Processing %D 2023 %7 27.11.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: Best Practice Alerts (BPAs) are alert messages to physicians in the electronic health record that are used to encourage appropriate use of health care resources. While these alerts are helpful in both improving care and reducing costs, BPAs are often broadly applied nonselectively across entire patient populations. The development of large language models (LLMs) provides an opportunity to selectively identify patients for BPAs. Objective: In this paper, we present an example case where an LLM screening tool is used to select patients appropriate for a BPA encouraging the prescription of deep vein thrombosis (DVT) anticoagulation prophylaxis. The artificial intelligence (AI) screening tool was developed to identify patients experiencing acute bleeding and exclude them from receiving a DVT prophylaxis BPA. 
Methods: Our AI screening tool used a BioMed-RoBERTa (Robustly Optimized Bidirectional Encoder Representations from Transformers Pretraining Approach; AllenAI) model to perform classification of physician notes, identifying patients without active bleeding and thus appropriate for a thromboembolism prophylaxis BPA. The BioMed-RoBERTa model was fine-tuned using 500 history and physical notes of patients from the MIMIC-III (Medical Information Mart for Intensive Care) database who were not prescribed anticoagulation. A development set of 300 MIMIC patient notes was used to determine the model’s hyperparameters, and a separate test set of 300 patient notes was used to evaluate the screening tool. Results: Our MIMIC-III test set population of 300 patients included 72 patients with bleeding (ie, were not appropriate for a DVT prophylaxis BPA) and 228 without bleeding who were appropriate for a DVT prophylaxis BPA. The AI screening tool achieved a precision-recall area under the curve of 0.82 (95% CI 0.75-0.89) and a receiver operating characteristic area under the curve of 0.89 (95% CI 0.84-0.94). The screening tool reduced the number of patients who would trigger an alert by 20% (240 instead of 300 alerts) and increased alert applicability by 14.8% (218 [90.8%] positive alerts from 240 total alerts instead of 228 [76%] positive alerts from 300 total alerts), compared to nonselectively sending alerts for all patients. Conclusions: These results provide a proof of concept of how language models can be used as a screening tool for BPAs. We provide an example AI screening tool that uses a HIPAA (Health Insurance Portability and Accountability Act)–compliant BioMed-RoBERTa model deployed with minimal computing power. Larger models (eg, Generative Pre-trained Transformers–3, Generative Pre-trained Transformers–4, and Pathways Language Model) will exhibit superior performance but require data use agreements to be HIPAA compliant. 
We anticipate LLMs to revolutionize quality improvement in hospital medicine. %M 38010803 %R 10.2196/49886 %U https://medinform.jmir.org/2023/1/e49886 %U https://doi.org/10.2196/49886 %U http://www.ncbi.nlm.nih.gov/pubmed/38010803 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e47762 %T The Readability and Quality of Web-Based Patient Information on Nasopharyngeal Carcinoma: Quantitative Content Analysis %A Tan,Denise Jia Yun %A Ko,Tsz Ki %A Fan,Ka Siu %+ Department of Surgery, Royal Stoke University Hospital, Newcastle Rd, Stoke on Trent, ST4 6QG, United Kingdom, 44 7378977812, tszkiko95@gmail.com %K nasopharyngeal cancer %K internet information %K readability %K Journal of the American Medical Association %K JAMA %K DISCERN %K artificial intelligence %K AI %D 2023 %7 27.11.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Nasopharyngeal carcinoma (NPC) is a rare disease that is strongly associated with exposure to the Epstein-Barr virus and is characterized by the formation of malignant cells in nasopharynx tissues. Early diagnosis of NPC is often difficult owing to the location of initial tumor sites and the nonspecificity of initial symptoms, resulting in a higher frequency of advanced-stage diagnoses and a poorer prognosis. Access to high-quality, readable information could improve the early detection of the disease and provide support to patients during disease management. Objective: This study aims to assess the quality and readability of publicly available web-based information in the English language about NPC, using the most popular search engines. Methods: Key terms relevant to NPC were searched across 3 of the most popular internet search engines: Google, Yahoo, and Bing. The top 25 results from each search engine were included in the analysis. Websites that contained text written in languages other than English, required paywall access, targeted medical professionals, or included nontext content were excluded. 
Readability for each website was assessed using the Flesch Reading Ease score and the Flesch-Kincaid grade level. Website quality was assessed using the Journal of the American Medical Association (JAMA) and DISCERN tools as well as the presence of a Health on the Net Foundation seal. Results: Overall, 57 suitable websites were included in this study; 26% (15/57) of the websites were academic. The mean JAMA and DISCERN scores of all websites were 2.80 (IQR 3) and 57.60 (IQR 19), respectively, with a median of 3 (IQR 2-4) and 61 (IQR 49-68), respectively. Health care industry websites (n=3) had the highest mean JAMA score of 4 (SD 0). Academic websites (15/57, 26%) had the highest mean DISCERN score of 77.5. The Health on the Net Foundation seal was present on only 1 website, which also achieved a JAMA score of 3 and a DISCERN score of 50. Significant differences were observed between the JAMA score of hospital websites and the scores of industry websites (P=.04), news service websites (P=.048), and charity and nongovernmental organization websites (P=.03). Despite being a vital source for patients, general practitioner websites were found to have significantly lower JAMA scores compared with charity websites (P=.05). The overall mean readability scores reflected an average reading age of 14.3 (SD 1.1) years. Conclusions: The results of this study suggest an inconsistent and suboptimal quality of information related to NPC on the internet. On average, websites presented readability challenges, as written information about NPC was above the recommended reading level of sixth grade. As such, web-based information requires improvement in both quality and accessibility, and health care providers should be selective about the information they recommend to patients, ensuring it is reliable and readable. 
%M 38010802 %R 10.2196/47762 %U https://formative.jmir.org/2023/1/e47762 %U https://doi.org/10.2196/47762 %U http://www.ncbi.nlm.nih.gov/pubmed/38010802 %0 Journal Article %@ 2371-4379 %I JMIR Publications %V 8 %N %P e49113 %T A Machine Learning Web App to Predict Diabetic Blood Glucose Based on a Basic Noninvasive Health Checkup, Sociodemographic Characteristics, and Dietary Information: Case Study %A Sampa,Masuda Begum %A Biswas,Topu %A Rahman,Md Siddikur %A Aziz,Nor Hidayati Binti Abdul %A Hossain,Md Nazmul %A Aziz,Nor Azlina Ab %+ Center for Engineering Computational Intelligence, Faculty of Engineering and Technology, Multimedia University, Jalan Ayer Keroh Lama, Bukit Beruang, Melaka, 75450, Malaysia, 60 132999042, azlina.aziz@mmu.edu.my %K blood glucose prediction %K boosted decision tree regression model %K machine learning %K noncommunicable diseases %K noninvasive %D 2023 %7 24.11.2023 %9 Original Paper %J JMIR Diabetes %G English %X Background: Over the past few decades, diabetes has become a serious public health concern worldwide, particularly in Bangladesh. The advancement of artificial intelligence can be reaped in the prediction of blood glucose levels for better health management. However, the practical validity of machine learning (ML) techniques for predicting health parameters using data from low- and middle-income countries, such as Bangladesh, is very low. Specifically, Bangladesh lacks research using ML techniques to predict blood glucose levels based on basic noninvasive clinical measurements and dietary and sociodemographic information. Objective: To formulate strategies for public health planning and the control of diabetes, this study aimed to develop a personalized ML model that predicts the blood glucose level of urban corporate workers in Bangladesh. 
Methods: Based on the basic noninvasive health checkup test results, dietary information, and sociodemographic characteristics of 271 employees of the Bangladeshi Grameen Bank complex, 5 well-known ML models, namely, linear regression, boosted decision tree regression, neural network, decision forest regression, and Bayesian linear regression, were used to predict blood glucose levels. Continuous blood glucose data were used in this study to train the model, which then used the trained data to predict new blood glucose values. Results: Boosted decision tree regression demonstrated the greatest predictive performance of all evaluated models (root mean squared error=2.30). This means that, on average, our model’s predicted blood glucose level deviated from the actual blood glucose level by around 2.30 mg/dL. The mean blood glucose value of the population studied was 128.02 mg/dL (SD 56.92), indicating a borderline result for the majority of the samples (normal value: 140 mg/dL). This suggests that the individuals should be monitoring their blood glucose levels regularly. Conclusions: This ML-enabled web application for blood glucose prediction helps individuals to self-monitor their health condition. The application was developed with communities in remote areas of low- and middle-income countries, such as Bangladesh, in mind. These areas typically lack health facilities and have an insufficient number of qualified doctors and nurses. The web-based application is a simple, practical, and effective solution that can be adopted by the community. Use of the web application can save money on medical expenses, time, and health management expenses. The created system also aids in achieving the Sustainable Development Goals, particularly in ensuring that everyone in the community enjoys good health and well-being and lowering total morbidity and mortality. 
%M 37999944 %R 10.2196/49113 %U https://diabetes.jmir.org/2023/1/e49113 %U https://doi.org/10.2196/49113 %U http://www.ncbi.nlm.nih.gov/pubmed/37999944 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e51873 %T Usability and Efficacy of Artificial Intelligence Chatbots (ChatGPT) for Health Sciences Students: Protocol for a Crossover Randomized Controlled Trial %A Veras,Mirella %A Dyer,Joseph-Omer %A Rooney,Morgan %A Barros Silva,Paulo Goberlânio %A Rutherford,Derek %A Kairy,Dahlia %+ Health Sciences, Carleton University, 1125 Colonel By Drive, Ottawa, ON, K1S 5B6, Canada, 1 613 520 2600, mirella.veras@carleton.ca %K artificial intelligence %K AI %K health sciences %K usability %K learning outcomes %K perceptions %K OpenAI %K ChatGPT %K education %K randomized controlled trial %K RCT %K crossover RCT %D 2023 %7 24.11.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: The integration of artificial intelligence (AI) into health sciences students’ education holds significant importance. The rapid advancement of AI has opened new horizons in scientific writing and has the potential to reshape human-technology interactions. AI in education may impact critical thinking, leading to unintended consequences that need to be addressed. Understanding the implications of AI adoption in education is essential for ensuring its responsible and effective use, empowering health sciences students to navigate AI-driven technologies’ evolving field with essential knowledge and skills. Objective: This study aims to provide details on the study protocol and the methods used to investigate the usability and efficacy of ChatGPT, a large language model. The primary focus is on assessing its role as a supplementary learning tool for improving learning processes and outcomes among undergraduate health sciences students, with a specific emphasis on chronic diseases. 
Methods: This single-blinded, crossover, randomized, controlled trial is part of a broader mixed methods study, and the primary emphasis of this paper is on the quantitative component of the overall research. A total of 50 students will be recruited for this study. The alternative hypothesis posits that there will be a significant difference in learning outcomes and technology usability between students using ChatGPT (group A) and those using standard web-based tools (group B) to access resources and complete assignments. Participants will be allocated to sequence AB or BA in a 1:1 ratio using computer-generated randomization. Both arms include students’ participation in a writing assignment intervention, with a washout period of 21 days between interventions. The primary outcome is the measure of the technology usability and effectiveness of ChatGPT, whereas the secondary outcome is the measure of students’ perceptions and experiences with ChatGPT as a learning tool. Outcome data will be collected up to 24 hours after the interventions. Results: This study aims to understand the potential benefits and challenges of incorporating AI as an educational tool, particularly in the context of student learning. The findings are expected to identify critical areas that need attention and help educators develop a deeper understanding of AI’s impact on the educational field. By exploring the differences in the usability and efficacy between ChatGPT and conventional web-based tools, this study seeks to inform educators and students on the responsible integration of AI into academic settings, with a specific focus on health sciences education. Conclusions: By exploring the usability and efficacy of ChatGPT compared with conventional web-based tools, this study seeks to inform educators and students about the responsible integration of AI into academic settings. 
Trial Registration: ClinicalTrials.gov NCT05963802; https://clinicaltrials.gov/study/NCT05963802 International Registered Report Identifier (IRRID): PRR1-10.2196/51873 %M 37999958 %R 10.2196/51873 %U https://www.researchprotocols.org/2023/1/e51873 %U https://doi.org/10.2196/51873 %U http://www.ncbi.nlm.nih.gov/pubmed/37999958 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e49314 %T A Conference (Missingness in Action) to Address Missingness in Data and AI in Health Care: Qualitative Thematic Analysis %A Rose,Christian %A Barber,Rachel %A Preiksaitis,Carl %A Kim,Ireh %A Mishra,Nikesh %A Kayser,Kristen %A Brown,Italo %A Gisondi,Michael %+ Department of Emergency Medicine, Stanford University School of Medicine, 900 Welch Road, Palo Alto, CA, 94304, United States, 1 415 915 9585, ccrose@stanford.edu %K machine learning %K artificial intelligence %K health care data %K data quality %K thematic analysis %K AI %K implementation %K digital conference %K trust %K privacy %K predictive model %K health care community %D 2023 %7 23.11.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Missingness in health care data poses significant challenges in the development and implementation of artificial intelligence (AI) and machine learning solutions. Identifying and addressing these challenges is critical to ensuring the continued growth and accuracy of these models as well as their equitable and effective use in health care settings. Objective: This study aims to explore the challenges, opportunities, and potential solutions related to missingness in health care data for AI applications through the conduct of a digital conference and thematic analysis of conference proceedings. Methods: A digital conference was held in September 2022, attracting 861 registered participants, with 164 (19%) attending the live event. The conference featured presentations and panel discussions by experts in AI, machine learning, and health care. 
Transcripts of the event were analyzed using the stepwise framework of Braun and Clark to identify key themes related to missingness in health care data. Results: Three principal themes—data quality and bias, human input in model development, and trust and privacy—emerged from the analysis. Topics included the accuracy of predictive models, lack of inclusion of underrepresented communities, partnership with physicians and other populations, challenges with sensitive health care data, and fostering trust with patients and the health care community. Conclusions: Addressing the challenges of data quality, human input, and trust is vital when devising and using machine learning algorithms in health care. Recommendations include expanding data collection efforts to reduce gaps and biases, involving medical professionals in the development and implementation of AI models, and developing clear ethical guidelines to safeguard patient privacy. Further research and ongoing discussions are needed to ensure these conclusions remain relevant as health care and AI continue to evolve. %M 37995113 %R 10.2196/49314 %U https://www.jmir.org/2023/1/e49314 %U https://doi.org/10.2196/49314 %U http://www.ncbi.nlm.nih.gov/pubmed/37995113 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46089 %T Guidelines, Consensus Statements, and Standards for the Use of Artificial Intelligence in Medicine: Systematic Review %A Wang,Ying %A Li,Nian %A Chen,Lingmin %A Wu,Miaomiao %A Meng,Sha %A Dai,Zelei %A Zhang,Yonggang %A Clarke,Mike %+ Department of Periodical Press, National Clinical Research Center for Geriatrics, Chinese Evidence-based Medicine Center, Nursing Key Laboratory of Sichuan Province, West China Hospital, Sichuan University, No. 
37 Guoxue Lane, Chengdu, 610041, China, 86 28 85421729, jebm_zhang@yahoo.com %K artificial intelligence %K clinical practice %K guidelines %K consensus statements %K standards %K systematic review %D 2023 %7 22.11.2023 %9 Review %J J Med Internet Res %G English %X Background: The application of artificial intelligence (AI) in the delivery of health care is a promising area, and guidelines, consensus statements, and standards on AI regarding various topics have been developed. Objective: We performed this study to assess the quality of guidelines, consensus statements, and standards in the field of AI for medicine and to provide a foundation for recommendations about the future development of AI guidelines. Methods: We searched 7 electronic databases from database establishment to April 6, 2022, and screened articles involving AI guidelines, consensus statements, and standards for eligibility. The AGREE II (Appraisal of Guidelines for Research & Evaluation II) and RIGHT (Reporting Items for Practice Guidelines in Healthcare) tools were used to assess the methodological and reporting quality of the included articles. Results: This systematic review included 19 guideline articles, 14 consensus statement articles, and 3 standard articles published between 2019 and 2022. Their content involved disease screening, diagnosis, and treatment; AI intervention trial reporting; AI imaging development and collaboration; AI data application; and AI ethics governance and applications. Our quality assessment revealed that the average overall AGREE II score was 4.0 (range 2.2-5.5; 7-point Likert scale) and the mean overall reporting rate of the RIGHT tool was 49.4% (range 25.7%-77.1%). Conclusions: The results indicated important differences in the quality of different AI guidelines, consensus statements, and standards. We made recommendations for improving their methodological and reporting quality. 
Trial Registration: PROSPERO International Prospective Register of Systematic Reviews (CRD42022321360); https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=321360 %M 37991819 %R 10.2196/46089 %U https://www.jmir.org/2023/1/e46089 %U https://doi.org/10.2196/46089 %U http://www.ncbi.nlm.nih.gov/pubmed/37991819 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e46779 %T Predicting Adherence to Behavior Change Support Systems Using Machine Learning: Systematic Review %A Ekpezu,Akon Obu %A Wiafe,Isaac %A Oinas-Kukkonen,Harri %+ Oulu Advanced Research on Service and Information Systems, Department of Information Processing Science, University of Oulu, Pentti Kaiteran Katu 1 Linnanmaa, Oulu, 90570, Finland, 358 468860704, akon.ekpezu@oulu.fi %K adherence %K compliance %K behavior change support systems %K persuasive systems %K persuasive technology %K machine learning %D 2023 %7 22.11.2023 %9 Review %J JMIR AI %G English %X Background: There is a dearth of knowledge on reliable adherence prediction measures in behavior change support systems (BCSSs). Existing reviews have predominately focused on self-reporting measures of adherence. These measures are susceptible to overestimation or underestimation of adherence behavior. Objective: This systematic review seeks to identify and summarize trends in the use of machine learning approaches to predict adherence to BCSSs. Methods: Systematic literature searches were conducted in the Scopus and PubMed electronic databases between January 2011 and August 2022. The initial search retrieved 2182 journal papers, but only 11 of these papers were eligible for this review. Results: A total of 4 categories of adherence problems in BCSSs were identified: adherence to digital cognitive and behavioral interventions, medication adherence, physical activity adherence, and diet adherence. The use of machine learning techniques for real-time adherence prediction in BCSSs is gaining research attention. 
A total of 13 unique supervised learning techniques were identified, and the majority of them were traditional machine learning techniques (eg, support vector machine). Long short-term memory, multilayer perceptron, and ensemble learning are currently the only advanced learning techniques. Despite the heterogeneity in the feature selection approaches, most prediction models achieved good classification accuracies. This indicates that the features or predictors used were a good representation of the adherence problem. Conclusions: Using machine learning algorithms to predict the adherence behavior of a BCSS user can facilitate the reinforcement of adherence behavior. This can be achieved by developing intelligent BCSSs that can provide users with more personalized, tailored, and timely suggestions. %M 38875538 %R 10.2196/46779 %U https://ai.jmir.org/2023/1/e46779 %U https://doi.org/10.2196/46779 %U http://www.ncbi.nlm.nih.gov/pubmed/38875538 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e40965 %T Real-Time Classification of Causes of Death Using AI: Sensitivity Analysis %A Pita Ferreira,Patrícia %A Godinho Simões,Diogo %A Pinto de Carvalho,Constança %A Duarte,Francisco %A Fernandes,Eugénia %A Casaca Carvalho,Pedro %A Loff,José Francisco %A Soares,Ana Paula %A Albuquerque,Maria João %A Pinto-Leite,Pedro %A Peralta-Santos,André %+ Direção de Serviços de Informação e Análise, Direção-Geral da Saúde, Alameda D. Afonso Henriques, 45, Lisbon, 1049-005, Portugal, 351 218430500, ppita.ferreira@gmail.com %K artificial intelligence %K AI %K mortality %K deep neural networks %K evaluation %K machine learning %K deep learning %K mortality statistics %K underlying cause of death %D 2023 %7 22.11.2023 %9 Original Paper %J JMIR AI %G English %X Background: In 2021, the European Union reported >270,000 excess deaths, including >16,000 in Portugal. 
The Portuguese Directorate-General of Health developed a deep neural network, AUTOCOD, which determines the primary causes of death by analyzing the free text of physicians’ death certificates (DCs). Although AUTOCOD’s performance has been established, it remains unclear whether its performance remains consistent over time, particularly during periods of excess mortality. Objective: This study aims to assess the sensitivity and other performance metrics of AUTOCOD in classifying underlying causes of death compared with manual coding to identify specific causes of death during periods of excess mortality. Methods: We included all DCs between 2016 and 2019. AUTOCOD’s performance was evaluated by calculating various performance metrics, such as sensitivity, specificity, positive predictive value (PPV), and F1-score, using a confusion matrix. This compared International Statistical Classification of Diseases and Health-Related Problems, 10th Revision (ICD-10), classifications of DCs by AUTOCOD with those by human coders at the Directorate-General of Health (gold standard). Subsequently, we compared periods without excess mortality with periods of excess, severe, and extreme excess mortality. We defined excess mortality as 2 consecutive days with a Z score above the 95% baseline limit, severe excess mortality as 2 consecutive days with a Z score >4 SDs, and extreme excess mortality as 2 consecutive days with a Z score >6 SDs. Finally, we repeated the analyses for the 3 most common ICD-10 chapters focusing on block-level classification. Results: We analyzed a large data set comprising 330,098 DCs classified by both human coders and AUTOCOD. 
AUTOCOD demonstrated high sensitivity (≥0.75) for 10 ICD-10 chapters examined, with values surpassing 0.90 for the more prevalent chapters (chapter II—“Neoplasms,” chapter IX—“Diseases of the circulatory system,” and chapter X—“Diseases of the respiratory system”), accounting for 67.69% (223,459/330,098) of all human-coded causes of death. No substantial differences were observed in these high-sensitivity values when comparing periods without excess mortality with periods of excess, severe, and extreme excess mortality. The same holds for specificity, which exceeded 0.96 for all chapters examined, and for PPV, which surpassed 0.75 in 9 chapters, including the more prevalent ones. When considering block classification within the 3 most common ICD-10 chapters, AUTOCOD maintained a high performance, demonstrating high sensitivity (≥0.75) for 13 ICD-10 blocks, high PPV for 9 blocks, and specificity of >0.98 in all blocks, with no significant differences between periods without excess mortality and those with excess mortality. Conclusions: Our findings indicate that, during periods of excess and extreme excess mortality, AUTOCOD’s performance remains unaffected by potential text quality degradation because of pressure on health services. Consequently, AUTOCOD can be dependably used for real-time cause-specific mortality surveillance even in extreme excess mortality situations. 
%M 38875558 %R 10.2196/40965 %U https://ai.jmir.org/2023/1/e40965 %U https://doi.org/10.2196/40965 %U http://www.ncbi.nlm.nih.gov/pubmed/38875558 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e47274 %T The Intersection of ChatGPT, Clinical Medicine, and Medical Education %A Wong,Rebecca Shin-Yee %A Ming,Long Chiau %A Raja Ali,Raja Affendi %+ School of Medical and Life Sciences, Sunway University, No 5, Jalan Universiti, Bandar Sunway, Selangor, 47500, Malaysia, 60 374918622 ext 7452, longchiauming@gmail.com %K ChatGPT %K clinical research %K large language model %K artificial intelligence %K ethical considerations %K AI %K OpenAI %D 2023 %7 21.11.2023 %9 Viewpoint %J JMIR Med Educ %G English %X As we progress deeper into the digital age, the robust development and application of advanced artificial intelligence (AI) technology, specifically generative language models like ChatGPT (OpenAI), have potential implications in all sectors including medicine. This viewpoint article aims to present the authors’ perspective on the integration of AI models such as ChatGPT in clinical medicine and medical education. The unprecedented capacity of ChatGPT to generate human-like responses, refined through Reinforcement Learning with Human Feedback, could significantly reshape the pedagogical methodologies within medical education. Through a comprehensive review and the authors’ personal experiences, this viewpoint article elucidates the pros, cons, and ethical considerations of using ChatGPT within clinical medicine and notably, its implications for medical education. This exploration is crucial in a transformative era where AI could potentially augment human capability in the process of knowledge creation and dissemination, potentially revolutionizing medical education and clinical practice. The importance of maintaining academic integrity and professional standards is highlighted. 
The relevance of establishing clear guidelines for the responsible and ethical use of AI technologies in clinical medicine and medical education is also emphasized. %M 37988149 %R 10.2196/47274 %U https://mededu.jmir.org/2023/1/e47274 %U https://doi.org/10.2196/47274 %U http://www.ncbi.nlm.nih.gov/pubmed/37988149 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e47833 %T Machine Learning Models for Blood Glucose Level Prediction in Patients With Diabetes Mellitus: Systematic Review and Network Meta-Analysis %A Liu,Kui %A Li,Linyi %A Ma,Yifei %A Jiang,Jun %A Liu,Zhenhua %A Ye,Zichen %A Liu,Shuang %A Pu,Chen %A Chen,Changsheng %A Wan,Yi %+ Department of Health Service, Air Force Medical University, No 169, Changle West Road, Xincheng District, Xi'an, Shaanxi, 710032, China, 86 17391928966, wanyi@fmmu.edu.cn %K machine learning %K diabetes %K hypoglycemia %K blood glucose %K blood glucose management %D 2023 %7 20.11.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: Machine learning (ML) models provide more choices to patients with diabetes mellitus (DM) to more properly manage blood glucose (BG) levels. However, because of numerous types of ML algorithms, choosing an appropriate model is vitally important. Objective: In a systematic review and network meta-analysis, this study aimed to comprehensively assess the performance of ML models in predicting BG levels. In addition, we assessed ML models used to detect and predict adverse BG (hypoglycemia) events by calculating pooled estimates of sensitivity and specificity. Methods: PubMed, Embase, Web of Science, and Institute of Electrical and Electronics Engineers Explore databases were systematically searched for studies on predicting BG levels and predicting or detecting adverse BG events using ML models, from inception to November 2022. 
Studies that assessed the performance of different ML models in predicting or detecting BG levels or adverse BG events of patients with DM were included. Studies with no derivation or performance metrics of ML models were excluded. The Quality Assessment of Diagnostic Accuracy Studies tool was applied to assess the quality of included studies. Primary outcomes were the relative ranking of ML models for predicting BG levels in different prediction horizons (PHs) and pooled estimates of the sensitivity and specificity of ML models in detecting or predicting adverse BG events. Results: In total, 46 eligible studies were included for meta-analysis. Regarding ML models for predicting BG levels, the means of the absolute root mean square error (RMSE) in a PH of 15, 30, 45, and 60 minutes were 18.88 (SD 19.71), 21.40 (SD 12.56), 21.27 (SD 5.17), and 30.01 (SD 7.23) mg/dL, respectively. The neural network model (NNM) showed the highest relative performance in different PHs. Furthermore, the pooled estimates of the positive likelihood ratio and the negative likelihood ratio of ML models were 8.3 (95% CI 5.7-12.0) and 0.31 (95% CI 0.22-0.44), respectively, for predicting hypoglycemia and 2.4 (95% CI 1.6-3.7) and 0.37 (95% CI 0.29-0.46), respectively, for detecting hypoglycemia. Conclusions: Statistically significant high heterogeneity was detected in all subgroups, with different sources of heterogeneity. For predicting precise BG levels, the RMSE increases with a rise in the PH, and the NNM shows the highest relative performance among all the ML models. Meanwhile, current ML models have sufficient ability to predict adverse BG events, while their ability to detect adverse BG events needs to be enhanced. 
Trial Registration: PROSPERO CRD42022375250; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=375250 %M 37983072 %R 10.2196/47833 %U https://medinform.jmir.org/2023/1/e47833 %U https://doi.org/10.2196/47833 %U http://www.ncbi.nlm.nih.gov/pubmed/37983072 %0 Journal Article %@ 2369-2529 %I JMIR Publications %V 10 %N %P e50438 %T Validating the Safe and Effective Use of a Neurorehabilitation System (InTandem) to Improve Walking in the Chronic Stroke Population: Usability Study %A Smayda,Kirsten Elisabeth %A Cooper,Sarah Hodsdon %A Leyden,Katie %A Ulaszek,Jackie %A Ferko,Nicole %A Dobrin,Annamaria %+ MedRhythms, 183 Middle Street, Portland, ME, 04101, United States, 1 1 207 233 2373, ksmayda@medrhythms.com %K chronic stroke %K walking %K InTandem %K MR-001 %K neurorehabilitation %K human factors engineering %K usability %K rhythmic auditory stimulation %K validation %K neurotherapeutic %D 2023 %7 20.11.2023 %9 Original Paper %J JMIR Rehabil Assist Technol %G English %X Background: Persistent walking impairment following a stroke is common. Although rehabilitative interventions exist, few exist for use at home in the chronic phase of stroke recovery. InTandem (MedRhythms, Inc) is a neurorehabilitation system intended to improve walking and community ambulation in adults with chronic stroke walking impairment. Objective: Using design best practices and human factors engineering principles, the research presented here was conducted to validate the safe and effective use of InTandem. Methods: In total, 15 participants in the chronic phase of stroke recovery (≥6 months after stroke) participated in this validation study. Participants were scored on 8 simulated use tasks, 4 knowledge assessments, and 7 comprehension assessments in a simulated home environment. The number and types of use errors, close calls, and operational difficulties were evaluated. 
Analyses of task performances, participant behaviors, and follow-up interviews were conducted to determine the root cause of use errors and difficulties. Results: During this validation study, 93% (14/15) of participants were able to successfully complete the critical tasks associated with the simulated use of the InTandem system. Following simulated use task assessments, participants’ knowledge and comprehension of the instructions for use and key safety information were evaluated. Overall, participants were able to find and correctly interpret information in the materials in order to answer the knowledge assessment questions. During the comprehension assessment, participants understood warning statements associated with critical tasks presented in the instructions for use. Across the entire study, 3 “use errors” and 1 “success with difficulty” were recorded. No adverse events, including slips, trips, or falls, occurred in this study. Conclusions: In this validation study, people in the chronic phase of stroke recovery were able to safely and effectively use InTandem in the intended use environment. This validation study contributes to the overall understanding of residual use–related risks of InTandem in consideration of the established benefits. 
%M 37983080 %R 10.2196/50438 %U https://rehab.jmir.org/2023/1/e50438 %U https://doi.org/10.2196/50438 %U http://www.ncbi.nlm.nih.gov/pubmed/37983080 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e42545 %T Selection Bias in Digital Conversations on Depression Before and During COVID-19 %A Lee,Edward %A Agustines,Davin %A Woo,Benjamin K P %+ College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, 4385 Ocean View Blvd, Montrose, CA, 91020, United States, 1 8189260488, edward.lee@westernu.edu %K depression %K COVID-19 %K treatment %K race %K ethnicity %K digital conversations %K health belief model %K artificial intelligence %K AI %K natural language processing %K NLP %D 2023 %7 20.11.2023 %9 Letter to the Editor %J JMIR Form Res %G English %X %M 37983077 %R 10.2196/42545 %U https://formative.jmir.org/2023/1/e42545 %U https://doi.org/10.2196/42545 %U http://www.ncbi.nlm.nih.gov/pubmed/37983077 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e47445 %T Artificial Intelligence–Based Methods for Integrating Local and Global Features for Brain Cancer Imaging: Scoping Review %A Ali,Hazrat %A Qureshi,Rizwan %A Shah,Zubair %+ College of Science and Engineering, Hamad Bin Khalifa University, Al Luqta St, Ar-Rayyan, Doha, 34110, Qatar, 974 50744851, zshah@hbku.edu.qa %K artificial intelligence %K AI %K brain cancer %K brain tumor %K medical imaging %K segmentation %K vision transformers %D 2023 %7 17.11.2023 %9 Review %J JMIR Med Inform %G English %X Background: Transformer-based models are gaining popularity in medical imaging and cancer imaging applications. Many recent studies have demonstrated the use of transformer-based models for brain cancer imaging applications such as diagnosis and tumor segmentation. Objective: This study aims to review how different vision transformers (ViTs) contributed to advancing brain cancer diagnosis and tumor segmentation using brain image data. 
This study examines the different architectures developed for enhancing the task of brain tumor segmentation. Furthermore, it explores how the ViT-based models augmented the performance of convolutional neural networks for brain cancer imaging. Methods: The study search and study selection for this review followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. The search comprised 4 popular scientific databases: PubMed, Scopus, IEEE Xplore, and Google Scholar. The search terms were formulated to cover the interventions (ie, ViTs) and the target application (ie, brain cancer imaging). Title and abstract screening for study selection was performed by 2 reviewers independently and validated by a third reviewer. Data extraction was performed by 2 reviewers and validated by a third reviewer. Finally, the data were synthesized using a narrative approach. Results: Of the 736 retrieved studies, 22 (3%) were included in this review. These studies were published in 2021 and 2022. The most commonly addressed task in these studies was tumor segmentation using ViTs. No study reported early detection of brain cancer. Among the different ViT architectures, Shifted Window transformer–based architectures have recently become the most popular choice of the research community. Among the included architectures, UNet transformer and TransUNet had the highest number of parameters and thus needed a cluster of as many as 8 graphics processing units for model training. The brain tumor segmentation challenge data set was the most popular data set used in the included studies. ViT was used in different combinations with convolutional neural networks to capture both the global and local context of the input brain imaging data. Conclusions: It can be argued that the computational complexity of transformer architectures is a bottleneck in advancing the field and enabling clinical transformations. 
This review provides the current state of knowledge on the topic, and the findings of this review will be helpful for researchers in the field of medical artificial intelligence and its applications in brain cancer. %M 37976086 %R 10.2196/47445 %U https://medinform.jmir.org/2023/1/e47445 %U https://doi.org/10.2196/47445 %U http://www.ncbi.nlm.nih.gov/pubmed/37976086 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 6 %N %P e49280 %T Evaluation of ChatGPT Dermatology Responses to Common Patient Queries %A Ferreira,Alana L %A Chu,Brian %A Grant-Kels,Jane M %A Ogunleye,Temitayo %A Lipoff,Jules B %+ Department of Dermatology, Lewis Katz School of Medicine, Temple University, 525 Jamestown Avenue, Suite #206, Philadelphia, PA, 19128, United States, 1 215 482 7546, jules.lipoff@temple.edu %K ChatGPT %K dermatology %K dermatologist %K artificial intelligence %K AI %K medical advice %K GPT-4 %K patient queries %K information resource %K response evaluation %K skin condition %K skin %K tool %K AI tool %D 2023 %7 17.11.2023 %9 Research Letter %J JMIR Dermatol %G English %X %M 37976093 %R 10.2196/49280 %U https://derma.jmir.org/2023/1/e49280 %U https://doi.org/10.2196/49280 %U http://www.ncbi.nlm.nih.gov/pubmed/37976093 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47609 %T Developer Perspectives on Potential Harms of Machine Learning Predictive Analytics in Health Care: Qualitative Analysis %A Nichol,Ariadne A %A Sankar,Pamela L %A Halley,Meghan C %A Federico,Carole A %A Cho,Mildred K %+ Department of Medical Ethics & Health Policy, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, 19104, United States, 1 2158987136, sankarp@pennmedicine.upenn.edu %K machine learning %K ML %K algorithms %K health care quality %K responsibility %K ethics %K machine learning predictive analytics %K MLPA %K developers %D 2023 %7 16.11.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Machine learning predictive analytics (MLPA) is 
increasingly used in health care to reduce costs and improve efficacy; it also has the potential to harm patients and undermine trust in health care. Academic and regulatory leaders have proposed a variety of principles and guidelines to address the challenges of evaluating the safety of machine learning–based software in the health care context, but accepted practices do not yet exist. However, there appears to be a shift toward process-based regulatory paradigms that rely heavily on self-regulation. At the same time, little research has examined the perspectives of MLPA developers themselves regarding these harms, even though their role will be essential in overcoming the “principles-to-practice” gap. Objective: The objective of this study was to understand how MLPA developers of health care products perceived the potential harms of those products and their responses to recognized harms. Methods: We interviewed 40 individuals who were developing MLPA tools for health care at 15 US-based organizations, including data scientists, software engineers, and those with mid- and high-level management roles. These 15 organizations were selected to represent a range of organizational types and sizes from the 106 that we previously identified. We asked developers about their perspectives on the potential harms of their work, factors that influence these harms, and their role in mitigation. We used standard qualitative analysis of transcribed interviews to identify themes in the data. Results: We found that MLPA developers recognized a range of potential harms of MLPA to individuals, social groups, and the health care system, such as issues of privacy, bias, and system disruption. They also identified drivers of these harms related to the characteristics of machine learning and specific to the health care and commercial contexts in which the products are developed. MLPA developers also described strategies to respond to these drivers and potentially mitigate the harms. 
Opportunities included balancing algorithm performance goals with potential harms, emphasizing iterative integration of health care expertise, and fostering shared company values. However, their recognition of their own responsibility to address potential harms varied widely. Conclusions: Even though MLPA developers recognized that their products can harm patients, the public, and even health systems, robust procedures to assess the potential for harms and the need for mitigation do not exist. Our findings suggest that, to the extent that new oversight paradigms rely on self-regulation, they will face serious challenges if harms are driven by features that developers consider inescapable in health care and business environments. Furthermore, effective self-regulation will require MLPA developers to accept responsibility for safety and efficacy and know how to act accordingly. Our results suggest that, at the very least, substantial education will be necessary to fill the “principles-to-practice” gap. %M 37971798 %R 10.2196/47609 %U https://www.jmir.org/2023/1/e47609 %U https://doi.org/10.2196/47609 %U http://www.ncbi.nlm.nih.gov/pubmed/37971798 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 10 %N %P e49936 %T The Potential Influence of AI on Population Mental Health %A Ettman,Catherine K %A Galea,Sandro %+ Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, 624 N Broadway Street, Baltimore, MD, 21205, United States, 1 410 516 8000, cettman1@jhu.edu %K mental health %K artificial intelligence %K AI %K policy %K policies %K population health %K population %K ChatGPT %K generative %K tools %K digital mental health %D 2023 %7 16.11.2023 %9 Viewpoint %J JMIR Ment Health %G English %X The integration of artificial intelligence (AI) into everyday life has galvanized a global conversation on the possibilities and perils of AI on human health. 
In particular, there is a growing need to anticipate and address the potential impact of widely accessible, enhanced, and conversational AI on mental health. We propose 3 considerations to frame how AI may influence population mental health: through the advancement of mental health care; by altering social and economic contexts; and through the policies that shape the adoption, use, and potential abuse of AI-enhanced tools. %M 37971803 %R 10.2196/49936 %U https://mental.jmir.org/2023/1/e49936 %U https://doi.org/10.2196/49936 %U http://www.ncbi.nlm.nih.gov/pubmed/37971803 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e49368 %T A SWOT (Strengths, Weaknesses, Opportunities, and Threats) Analysis of ChatGPT in the Medical Literature: Concise Review %A Gödde,Daniel %A Nöhl,Sophia %A Wolf,Carina %A Rupert,Yannick %A Rimkus,Lukas %A Ehlers,Jan %A Breuckmann,Frank %A Sellmann,Timur %+ Department of Pathology and Molecularpathology, Helios University Hospital Wuppertal, Witten/Herdecke University, Alfred-Herrhausen-Straße 50, Witten, 58455, Germany, 49 202 896 2541, daniel.goedde@helios-gesundheit.de %K ChatGPT %K chatbot %K artificial intelligence %K education technology %K medical education %K machine learning %K chatbots %K concise review %K review methods %K review methodology %K SWOT %D 2023 %7 16.11.2023 %9 Review %J J Med Internet Res %G English %X Background: ChatGPT is a 175-billion-parameter natural language processing model that is already involved in scientific content and publications. Its influence ranges from providing quick access to information on medical topics, assisting in generating medical and scientific articles and papers, performing medical data analyses, and even interpreting complex data sets. Objective: The future role of ChatGPT remains uncertain and a matter of debate already shortly after its release. This review aimed to analyze the role of ChatGPT in the medical literature during the first 3 months after its release. 
Methods: We performed a concise review of literature published in PubMed from December 1, 2022, to March 31, 2023. To find all publications related to ChatGPT or considering ChatGPT, the search term was kept simple (“ChatGPT” in AllFields). All publications available as full text in German or English were included. All accessible publications were evaluated according to specifications by the author team (eg, impact factor, publication mode, article type, publication speed, and type of ChatGPT integration or content). The conclusions of the articles were used for later SWOT (strengths, weaknesses, opportunities, and threats) analysis. All data were analyzed on a descriptive basis. Results: Of 178 studies in total, 160 met the inclusion criteria and were evaluated. The average impact factor was 4.423 (range 0-96.216), and the average publication speed was 16 (range 0-83) days. Among the articles, there were 77 editorials (48.1%), 43 essays (26.9%), 21 studies (13.1%), 6 reviews (3.8%), 6 case reports (3.8%), 6 news (3.8%), and 1 meta-analysis (0.6%). Of those, 54.4% (n=87) were published as open access, with 5% (n=8) provided on preprint servers. Over 400 quotes with information on strengths, weaknesses, opportunities, and threats were detected. By far, most (n=142, 34.8%) were related to weaknesses. ChatGPT excels in its ability to express ideas clearly and formulate general contexts comprehensibly. It performs so well that even experts in the field have difficulty identifying abstracts generated by ChatGPT. However, the time-limited scope and the need for corrections by experts were mentioned as weaknesses and threats of ChatGPT. Opportunities include assistance in formulating medical issues for nonnative English speakers, as well as the possibility of timely participation in the development of such artificial intelligence tools since it is in its early stages and can therefore still be influenced. 
Conclusions: Artificial intelligence tools such as ChatGPT are already part of the medical publishing landscape. Despite their apparent opportunities, policies and guidelines must be implemented to ensure benefits in education, clinical practice, and research and protect against threats such as scientific misconduct, plagiarism, and inaccuracy. %M 37865883 %R 10.2196/49368 %U https://www.jmir.org/2023/1/e49368 %U https://doi.org/10.2196/49368 %U http://www.ncbi.nlm.nih.gov/pubmed/37865883 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e49016 %T Risk Factors and Predictive Models for Peripherally Inserted Central Catheter Unplanned Extubation in Patients With Cancer: Prospective, Machine Learning Study %A Zhang,Jinghui %A Ma,Guiyuan %A Peng,Sha %A Hou,Jianmei %A Xu,Ran %A Luo,Lingxia %A Hu,Jiaji %A Yao,Nian %A Wang,Jiaan %A Huang,Xin %+ Teaching and Research Section of Clinical Nursing, Xiangya Hospital of Central South University, Number 87 Xiangya Road, Kaifu District, Changsha, Hunan, 410008, China, 86 13026179120, mmgy0906@163.com %K cancer %K PICC %K unplanned extubation %K predictive model %K logistic %K support vector machine %K random forest %D 2023 %7 16.11.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Cancer indeed represents a significant public health challenge, and unplanned extubation of peripherally inserted central catheter (PICC-UE) is a critical concern in patient safety. Identifying independent risk factors and implementing high-quality assessment tools for early detection in high-risk populations can play a crucial role in reducing the incidence of PICC-UE among patients with cancer. Precise prevention and treatment strategies are essential to improve patient outcomes and safety in clinical settings. 
Objective: This study aims to identify the independent risk factors associated with PICC-UE in patients with cancer and to construct a predictive model tailored to this group, offering a theoretical framework for anticipating and preventing PICC-UE in these patients. Methods: Prospective data were gathered from January to December 2022, encompassing patients with cancer with PICC at Xiangya Hospital, Central South University. Each patient underwent continuous monitoring until the catheter’s removal. The patients were categorized into 2 groups: the UE group (n=284) and the non-UE group (n=3107). Independent risk factors were identified through univariate analysis, the least absolute shrinkage and selection operator (LASSO) algorithm, and multivariate analysis. Subsequently, the 3391 patients were classified into a train set and a test set in a 7:3 ratio. Utilizing the identified predictors, 3 predictive models were constructed using the logistic regression, support vector machine, and random forest algorithms. The ultimate model was selected based on the receiver operating characteristic (ROC) curve and TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) synthesis analysis. To further validate the model, we gathered prospective data from 600 patients with cancer at the Affiliated Hospital of Qinghai University and Hainan Provincial People’s Hospital from June to December 2022. We assessed the model’s performance using the area under the curve of the ROC to evaluate differentiation, the calibration curve for calibration capability, and decision curve analysis (DCA) to gauge the model’s clinical applicability. 
Results: Independent risk factors for PICC-UE in patients with cancer were identified, including impaired physical mobility (odds ratio [OR] 2.775, 95% CI 1.951-3.946), diabetes (OR 1.754, 95% CI 1.134-2.712), surgical history (OR 1.734, 95% CI 1.313-2.290), elevated D-dimer concentration (OR 2.376, 95% CI 1.778-3.176), targeted therapy (OR 1.441, 95% CI 1.104-1.881), surgical treatment (OR 1.543, 95% CI 1.152-2.066), and more than 1 catheter puncture (OR 1.715, 95% CI 1.121-2.624). Protective factors were normal BMI (OR 0.449, 95% CI 0.342-0.590), polyurethane catheter material (OR 0.305, 95% CI 0.228-0.408), and valved catheter (OR 0.639, 95% CI 0.480-0.851). The TOPSIS synthesis analysis results showed that in the train set, the composite index (Ci) values were 0.00 for the logistic model, 0.82 for the support vector machine model, and 0.85 for the random forest model. In the test set, the Ci values were 0.00 for the logistic model, 1.00 for the support vector machine model, and 0.81 for the random forest model. The optimal model, constructed based on the support vector machine, was obtained and validated externally. The ROC curve, calibration curve, and DCA curve demonstrated that the model exhibited excellent accuracy, stability, generalizability, and clinical applicability. Conclusions: In summary, this study identified 10 independent risk factors for PICC-UE in patients with cancer. The predictive model developed using the support vector machine algorithm demonstrated excellent clinical applicability and was validated externally, providing valuable support for the early prediction of PICC-UE in patients with cancer. 
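The odds ratios and CIs reported above follow standard logistic-regression practice: OR = e^β, with a Wald 95% CI of e^(β ± 1.96·SE). A minimal sketch of this back-conversion; the coefficient and standard error below are hypothetical values chosen so the output roughly matches the first OR reported above (impaired physical mobility), not numbers taken from the study:

```python
import math


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio and Wald 95% CI from a logistic-regression coefficient and its SE."""
    return (math.exp(beta),           # point estimate: OR = e^beta
            math.exp(beta - z * se),  # lower 95% bound
            math.exp(beta + z * se))  # upper 95% bound


# Hypothetical beta and SE for illustration (chosen to roughly recover OR 2.775):
or_, lo, hi = odds_ratio_ci(beta=1.0208, se=0.1797)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # approximately: OR 2.78 (95% CI 1.95-3.95)
```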
%M 37971792 %R 10.2196/49016 %U https://www.jmir.org/2023/1/e49016 %U https://doi.org/10.2196/49016 %U http://www.ncbi.nlm.nih.gov/pubmed/37971792 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47664 %T Development and Validation of Machine Learning–Based Models to Predict In-Hospital Mortality in Life-Threatening Ventricular Arrhythmias: Retrospective Cohort Study %A Li,Le %A Ding,Ligang %A Zhang,Zhuxin %A Zhou,Likun %A Zhang,Zhenhao %A Xiong,Yulong %A Hu,Zhao %A Yao,Yan %+ National Center for Cardiovascular Diseases, Fu Wai Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beilishi Road 167, Beijing, 100037, China, 86 88322405, ianyao@263.net.cn %K life-threatening ventricular arrhythmia %K mortality %K prediction model %K machine learning %K critical care %K cardiac %K mortality %D 2023 %7 15.11.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Life-threatening ventricular arrhythmias (LTVAs) are main causes of sudden cardiac arrest and are highly associated with an increased risk of mortality. A prediction model that enables early identification of the high-risk individuals is still lacking. Objective: We aimed to build machine learning (ML)–based models to predict in-hospital mortality in patients with LTVA. Methods: A total of 3140 patients with LTVA were randomly divided into training (n=2512, 80%) and internal validation (n=628, 20%) sets. Moreover, data of 2851 patients from another database were collected as the external validation set. The primary output was the probability of in-hospital mortality. The discriminatory ability was evaluated by the area under the receiver operating characteristic curve (AUC). The prediction performances of 5 ML algorithms were compared with 2 conventional scoring systems, namely, the simplified acute physiology score (SAPS-II) and the logistic organ dysfunction system (LODS). 
Results: The prediction performance of the 5 ML algorithms significantly outperformed the traditional models in predicting in-hospital mortality. CatBoost showed the highest AUC of 90.5% (95% CI 87.5%-93.5%), followed by LightGBM with an AUC of 90.1% (95% CI 86.8%-93.4%). Conversely, the predictive values of SAPS-II and LODS were unsatisfactory, with AUCs of 78.0% (95% CI 71.7%-84.3%) and 74.9% (95% CI 67.2%-82.6%), respectively. The superiority of ML-based models was also shown in the external validation set. Conclusions: ML-based models could improve the predictive values of in-hospital mortality prediction for patients with LTVA compared with traditional scoring systems. %M 37966870 %R 10.2196/47664 %U https://www.jmir.org/2023/1/e47664 %U https://doi.org/10.2196/47664 %U http://www.ncbi.nlm.nih.gov/pubmed/37966870 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e50998 %T Potential Schizophrenia Disease-Related Genes Prediction Using Metagraph Representations Based on a Protein-Protein Interaction Keyword Network: Framework Development and Validation %A Yu,Shirui %A Wang,Ziyang %A Nan,Jiale %A Li,Aihua %A Yang,Xuemei %A Tang,Xiaoli %+ Institute of Medical Information, Chinese Academy of Medical Sciences, No 69 Dongdan North Street, Beijing, 100005, China, 86 10 52328902, tang.xiaoli@imicams.ac.cn %K disease gene prediction %K metagraph %K protein representations %K schizophrenia %K keyword network %D 2023 %7 15.11.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Schizophrenia is a serious mental disease. With increased research funding for this disease, schizophrenia has become one of the key areas of focus in the medical field. Searching for associations between diseases and genes is an effective approach to study complex diseases, which may enhance research on schizophrenia pathology and lead to the identification of new treatment targets. 
Objective: The aim of this study was to identify potential schizophrenia risk genes by employing machine learning methods to extract topological characteristics of proteins and their functional roles in a protein-protein interaction (PPI)-keywords (PPIK) network and understand the complex disease–causing property. Consequently, a PPIK-based metagraph representation approach is proposed. Methods: To enrich the PPI network, we integrated keywords describing protein properties and constructed a PPIK network. We extracted features that describe the topology of this network through metagraphs. We further transformed these metagraphs into vectors and represented proteins with a series of vectors. We then trained and optimized our model using random forest (RF), extreme gradient boosting, light gradient boosting machine, and logistic regression models. Results: Comprehensive experiments demonstrated the good performance of our proposed method with an area under the receiver operating characteristic curve (AUC) value between 0.72 and 0.76. Our model also outperformed baseline methods for overall disease protein prediction, including the random walk with restart, average commute time, and Katz models. Compared with the PPI network constructed from the baseline models, complementation of keywords in the PPIK network improved the performance (AUC) by 0.08 on average, and the metagraph-based method improved the AUC by 0.30 on average compared with that of the baseline methods. According to the comprehensive performance of the four models, RF was selected as the best model for disease protein prediction, with precision, recall, F1-score, and AUC values of 0.76, 0.73, 0.72, and 0.76, respectively. We transformed these proteins to their encoding gene IDs and identified the top 20 genes as the most probable schizophrenia-risk genes, including the EYA3, CNTN4, HSPA8, LRRK2, and AFP genes. 
We further validated these outcomes against metagraph features and evidence from the literature, performed a feature analysis, and drew on the literature evidence to interpret the correlation between the predicted genes and diseases. Conclusions: The metagraph representation based on the PPIK network framework was found to be effective for identifying potential schizophrenia risk genes. The results are quite reliable as evidence can be found in the literature to support our prediction. Our approach can provide more biological insights into the pathogenesis of schizophrenia. %M 37966892 %R 10.2196/50998 %U https://formative.jmir.org/2023/1/e50998 %U https://doi.org/10.2196/50998 %U http://www.ncbi.nlm.nih.gov/pubmed/37966892 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e50193 %T Classifying Schizophrenia Cases by Artificial Neural Network Using Japanese Web-Based Survey Data: Case-Control Study %A He,Yupeng %A Matsunaga,Masaaki %A Li,Yuanying %A Kishi,Taro %A Tanihara,Shinichi %A Iwata,Nakao %A Tabuchi,Takahiro %A Ota,Atsuhiko %+ Department of Public Health, Fujita Health University School of Medicine, 1-98 Dengakugakubo, Kutsukake-cho, Toyoake, 470-1192, Japan, 81 562 93 2476, yupeng.he@fujita-hu.ac.jp %K artificial neural network %K schizophrenia %K prevalence %K Japan %K web-based survey %K mental health %K psychosis %K machine learning %K epidemiology %D 2023 %7 15.11.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: In Japan, challenges were reported in accurately estimating the prevalence of schizophrenia among the general population. A review of previous studies indicated that patients with schizophrenia were more likely to experience poor subjective well-being and various physical, psychiatric, and social comorbidities. These factors might have great potential for precisely classifying schizophrenia cases in order to estimate the prevalence. 
Machine learning has shown a positive impact on many fields, including epidemiology, due to its high-precision modeling capability. It has been applied in research on mental disorders. However, few studies have applied machine learning technology to the precise classification of schizophrenia cases by variables of demographic and health-related backgrounds, especially using large-scale web-based surveys. Objective: The aim of the study is to construct an artificial neural network (ANN) model that can accurately classify schizophrenia cases from large-scale Japanese web-based survey data and to verify the generalizability of the model. Methods: Data were obtained from a large Japanese internet research pooled panel (Rakuten Insight, Inc) in 2021. A total of 223 individuals, aged 20-75 years, having schizophrenia, and 1776 healthy controls were included. Answers to the questions in a web-based survey were formatted as 1 response variable (self-report diagnosed with schizophrenia) and multiple feature variables (demographic, health-related backgrounds, physical comorbidities, psychiatric comorbidities, and social comorbidities). An ANN was applied to construct a model for classifying schizophrenia cases. Logistic regression (LR) was used as a reference. The performances of the models and algorithms were then compared. Results: The model trained by the ANN performed better than LR in terms of area under the receiver operating characteristic curve (0.86 vs 0.78), accuracy (0.93 vs 0.91), and specificity (0.96 vs 0.94), while the model trained by LR showed better sensitivity (0.63 vs 0.56). Comparing the performances of the ANN and LR, the ANN was better in terms of area under the receiver operating characteristic curve (bootstrapping: 0.847 vs 0.773 and cross-validation: 0.81 vs 0.72), while LR performed better in terms of accuracy (0.894 vs 0.856). Sleep medication use, age, household income, and employment type were the top 4 variables in terms of importance. 
Conclusions: This study constructed an ANN model to classify schizophrenia cases using web-based survey data. Our model showed a high internal validity. The findings are expected to provide evidence for estimating the prevalence of schizophrenia in the Japanese population and informing future epidemiological studies. %M 37966882 %R 10.2196/50193 %U https://formative.jmir.org/2023/1/e50193 %U https://doi.org/10.2196/50193 %U http://www.ncbi.nlm.nih.gov/pubmed/37966882 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45660 %T Exploring Perceptions About Paracetamol, Tramadol, and Codeine on Twitter Using Machine Learning: Quantitative and Qualitative Observational Study %A Carabot,Federico %A Donat-Vargas,Carolina %A Santoma-Vilaclara,Javier %A Ortega,Miguel A %A García-Montero,Cielo %A Fraile-Martínez,Oscar %A Zaragoza,Cristina %A Monserrat,Jorge %A Alvarez-Mon,Melchor %A Alvarez-Mon,Miguel Angel %+ Department of Medicine and Medical Specialities, University of Alcalá, Campus Universitario – C/ 19, Av de Madrid, Km 33, 600, Alcalá de Henares, 28871, Spain, 34 622816335, fcarabot@gmail.com %K awareness %K codeine %K machine learning %K pain %K painkiller %K perception %K recreational use %K social media %K twitter %D 2023 %7 14.11.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Paracetamol, codeine, and tramadol are commonly used to manage mild pain, and their availability without prescription or medical consultation raises concerns about potential opioid addiction. Objective: This study aims to explore the perceptions and experiences of Twitter users concerning these drugs. Methods: We analyzed the tweets in English or Spanish mentioning paracetamol, tramadol, or codeine posted between January 2019 and December 2020. Out of 152,056 tweets collected, 49,462 were excluded. 
The content was categorized using a codebook, distinguishing user types (patients, health care professionals, and institutions), and classifying medical content based on efficacy and adverse effects. Scientific accuracy and nonmedical content themes (commercial, economic, solidarity, and trivialization) were also assessed. A total of 1000 tweets for each drug were manually classified to train, test, and validate machine learning classifiers. Results: Of classifiable tweets, 42,840 mentioned paracetamol and 42,131 mentioned weak opioids (tramadol or codeine). Patients accounted for 73.10% (60,771/83,129) of the tweets, while health care professionals and institutions received the highest like-tweet and tweet-retweet ratios. Medical content distribution significantly differed for each drug (P<.001). Nonmedical content dominated opioid tweets (23,871/32,307, 73.9%), while paracetamol tweets had a higher prevalence of medical content (33,943/50,822, 66.8%). Among medical content tweets, 80.8% (41,080/50,822) mentioned drug efficacy, with only 6.9% (3501/50,822) describing good or sufficient efficacy. Nonmedical content distribution also varied significantly among the different drugs (P<.001). Conclusions: Patients seeking relief from pain are highly interested in the effectiveness of drugs rather than potential side effects. Alarming trends include a significant number of tweets trivializing drug use and recreational purposes, along with a lack of awareness regarding side effects. Monitoring conversations related to analgesics on social media is essential due to common illegal web-based sales and purchases without prescriptions. 
%M 37962927 %R 10.2196/45660 %U https://www.jmir.org/2023/1/e45660 %U https://doi.org/10.2196/45660 %U http://www.ncbi.nlm.nih.gov/pubmed/37962927 %0 Journal Article %@ 2561-9128 %I JMIR Publications %V 6 %N %P e50188 %T A New Index for the Quantitative Evaluation of Surgical Invasiveness Based on Perioperative Patients’ Behavior Patterns: Machine Learning Approach Using Triaxial Acceleration %A Nakanishi,Kozo %A Goto,Hidenori %+ Department of General Thoracic Surgery, National Hospital Organization Saitama Hospital, 2-1 Suwa, Wako Saitama, 351-0102, Japan, 81 48 462 1101, nakanishi.kozo.tf@mail.hosp.go.jp %K surgery %K invasiveness %K triaxial acceleration %K machine learning %K human activity recognition %K patient-oriented outcome %K video-assisted thoracoscopic surgery %K VATS %K postoperative recovery %K perioperative management %K artificial intelligence %K AI %K mobile phone %D 2023 %7 14.11.2023 %9 Original Paper %J JMIR Perioper Med %G English %X Background: The minimally invasive nature of thoracoscopic surgery is well recognized; however, the absence of a reliable evaluation method remains challenging. We hypothesized that the postoperative recovery speed is closely linked to surgical invasiveness, where recovery signifies the patient’s behavior transition back to their preoperative state during the perioperative period. Objective: This study aims to determine whether machine learning using triaxial acceleration data can effectively capture perioperative behavior changes and establish a quantitative index for quantifying variations in surgical invasiveness. Methods: We trained 7 distinct machine learning models using a publicly available human acceleration data set as supervised data. The 3 top-performing models were selected to predict patient actions, as determined by the Matthews correlation coefficient scores. Two patients who underwent different levels of invasive thoracoscopic surgery were selected as participants. 
Acceleration data were collected via chest sensors for 8 hours during the preoperative and postoperative hospitalization days. These data were categorized into 4 actions (walking, standing, sitting, and lying down) using the selected models. The actions predicted by the model with intermediate results were adopted as the actions of the participants. The daily appearance probability was calculated for each action. The 2 differences between appearance probabilities (sitting vs standing and lying down vs walking) were used as the 2 coordinates on the x- and y-axes. A 2D vector composed of these coordinate values was defined as the index of behavior pattern (iBP) for the day. All daily iBPs were graphed, and the enclosed area and distance between points were calculated and compared between participants to assess the relationship between changes in the indices and invasiveness. Results: Patients 1 and 2 underwent lung lobectomy and incisional tumor biopsy, respectively. The selected predictive model was a light-gradient boosting model (mean Matthews correlation coefficient 0.98, SD 0.0027; accuracy: 0.98). The acceleration data yielded 548,466 points for patient 1 and 466,407 points for patient 2. The iBPs of patient 1 were [(0.32, 0.19), (–0.098, 0.46), (–0.15, 0.13), (–0.049, 0.22)] and those of patient 2 were [(0.55, 0.30), (0.77, 0.21), (0.60, 0.25), (0.61, 0.31)]. The enclosed areas were 0.077 and 0.0036 for patients 1 and 2, respectively. Notably, the distances for patient 1 were greater than those for patient 2 ({0.44, 0.46, 0.37, 0.26} vs {0.23, 0.0065, 0.059}; P=.03 [Mann-Whitney U test]). Conclusions: The selected machine learning model effectively predicted the actions of the surgical patients with high accuracy. The temporal distribution of action times revealed changes in behavior patterns during the perioperative phase. 
The proposed index may facilitate the recognition and visualization of perioperative changes in patients and differences in surgical invasiveness. %M 37962919 %R 10.2196/50188 %U https://periop.jmir.org/2023/1/e50188 %U https://doi.org/10.2196/50188 %U http://www.ncbi.nlm.nih.gov/pubmed/37962919 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 6 %N %P e50409 %T Assessing the Accuracy and Comprehensiveness of ChatGPT in Offering Clinical Guidance for Atopic Dermatitis and Acne Vulgaris %A Lakdawala,Nehal %A Channa,Leelakrishna %A Gronbeck,Christian %A Lakdawala,Nikita %A Weston,Gillian %A Sloan,Brett %A Feng,Hao %+ Department of Dermatology, University of Connecticut Health Center, 21 South Rd, Farmington, CT, 06032, United States, 1 8606794600, haofeng625@gmail.com %K ChatGPT %K artificial intelligence %K dermatology %K clinical guidance %K counseling %K atopic dermatitis %K acne vulgaris %K skin %K acne %K dermatitis %K NLP %K natural language processing %K dermatologic %K dermatological %K recommendation %K recommendations %K guidance %K advise %K counsel %K response %K responses %K chatbot %K chatbots %K conversational agent %K conversational agents %K answer %K answers %K computer generated %K automated %D 2023 %7 14.11.2023 %9 Research Letter %J JMIR Dermatol %G English %X %M 37962920 %R 10.2196/50409 %U https://derma.jmir.org/2023/1/e50409 %U https://doi.org/10.2196/50409 %U http://www.ncbi.nlm.nih.gov/pubmed/37962920 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e44763 %T Machine Learning Algorithms Predict Successful Weaning From Mechanical Ventilation Before Intubation: Retrospective Analysis From the Medical Information Mart for Intensive Care IV Database %A Kim,Jinchul %A Kim,Yun Kwan %A Kim,Hyeyeon %A Jung,Hyojung %A Koh,Soonjeong %A Kim,Yujeong %A Yoon,Dukyong %A Yi,Hahn %A Kim,Hyung-Jun %+ Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Seoul National University Bundang Hospital, 82 Gumi-ro 
173beon-gil, Bundang-gu, Seongnam, 13620, Republic of Korea, 82 31 787 7844, dr.hjkim@snubh.org %K algorithms %K clinical decision-making %K intensive care units %K noninvasive ventilation %K organ dysfunction scores %D 2023 %7 14.11.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: The prediction of successful weaning from mechanical ventilation (MV) in advance of intubation can facilitate discussions regarding end-of-life care before unnecessary intubation. Objective: We aimed to develop a machine learning–based model that predicts successful weaning from ventilator support based on routine clinical and laboratory data taken before or immediately after intubation. Methods: We used the Medical Information Mart for Intensive Care IV database, which is an open-access database covering 524,740 admissions of 382,278 patients in Beth Israel Deaconess Medical Center, United States, from 2008 to 2019. We selected adult patients who underwent MV in the intensive care unit (ICU). Clinical and laboratory variables that are considered relevant to the prognosis of the patient in the ICU were selected. Data collected before or within 24 hours of intubation were used to develop machine learning models that predict the probability of successful weaning within 14 days of ventilator support. Developed models were integrated into an ensemble model. Performance metrics were calculated by 5-fold cross-validation for each model, and a permutation feature importance and Shapley additive explanations analysis was conducted to better understand the impacts of individual variables on outcome prediction. Results: Of the 23,242 patients, 19,025 (81.9%) patients were successfully weaned from MV within 14 days. 
Using the preselected 46 clinical and laboratory variables, the areas under the receiver operating characteristic curve of the CatBoost classifier, random forest classifier, and regularized logistic regression classifier models were 0.860 (95% CI 0.852-0.868), 0.855 (95% CI 0.848-0.863), and 0.823 (95% CI 0.813-0.832), respectively. With an ensemble voting classifier combining the 3 models above, the final model achieved an area under the receiver operating characteristic curve of 0.861 (95% CI 0.853-0.869), which was significantly better than that of the Simplified Acute Physiology Score II (0.749, 95% CI 0.742-0.756) and the Sequential Organ Failure Assessment (0.588, 95% CI 0.566-0.609). The top features included lactate and anion gap. The model’s performance reached a plateau with approximately the top 21 variables. Conclusions: We developed machine learning algorithms that can predict successful weaning from MV in advance of intubation in the ICU. Our models can aid appropriate management of patients who are hesitant to decide on ventilator support and may help avoid futile end-of-life care. 
%M 37962939 %R 10.2196/44763 %U https://formative.jmir.org/2023/1/e44763 %U https://doi.org/10.2196/44763 %U http://www.ncbi.nlm.nih.gov/pubmed/37962939 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42259 %T Development and Validation of a Prognostic Classification Model Predicting Postoperative Adverse Outcomes in Older Surgical Patients Using a Machine Learning Algorithm: Retrospective Observational Network Study %A Choi,Jung-Yeon %A Yoo,Sooyoung %A Song,Wongeun %A Kim,Seok %A Baek,Hyunyoung %A Lee,Jun Suh %A Yoon,Yoo-Seok %A Yoon,Seonghae %A Lee,Hae-Young %A Kim,Kwang-il %+ Departmentof Internal Medicine, Seoul National University Bundang Hospital, 82 Gumi-ro, 173 Beon-gil, Bundang-gu, Seongnam-si, 13620, Republic of Korea, 82 31 787 7032, kikim907@snu.ac.kr %K CDM %K common data model %K patient-level prediction %K OHDSI %K Observational Health Data Sciences and Informatics %K postoperative outcome %K postoperative %K surgery %K elderly %K elder %K predict %K adverse event %K adverse outcome %K geriatric %K older adult %K ageing %K model %K algorithm %D 2023 %7 13.11.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Older adults are at an increased risk of postoperative morbidity. Numerous risk stratification tools exist, but effort and manpower are required. Objective: This study aimed to develop a predictive model of postoperative adverse outcomes in older patients following general surgery with an open-source, patient-level prediction from the Observational Health Data Sciences and Informatics for internal and external validation. Methods: We used the Observational Medical Outcomes Partnership common data model and machine learning algorithms. The primary outcome was a composite of 90-day postoperative all-cause mortality and emergency department visits. Secondary outcomes were postoperative delirium, prolonged postoperative stay (≥75th percentile), and prolonged hospital stay (≥21 days). 
Common data model data from Seoul National University Bundang Hospital (SNUBH) were split 80% versus 20% for model training and testing, and data from Seoul National University Hospital (SNUH) were used for external validation. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) with a 95% CI. Results: Data from 27,197 (SNUBH) and 32,857 (SNUH) patients were analyzed. Compared to the random forest, AdaBoost, and decision tree models, the least absolute shrinkage and selection operator logistic regression model showed good internal discriminative accuracy (internal AUC 0.723, 95% CI 0.701-0.744) and transportability (external AUC 0.703, 95% CI 0.692-0.714) for the primary outcome. The model also possessed good internal and external AUCs for postoperative delirium (internal AUC 0.754, 95% CI 0.713-0.794; external AUC 0.750, 95% CI 0.727-0.772), prolonged postoperative stay (internal AUC 0.813, 95% CI 0.800-0.825; external AUC 0.747, 95% CI 0.741-0.753), and prolonged hospital stay (internal AUC 0.770, 95% CI 0.749-0.792; external AUC 0.707, 95% CI 0.696-0.718). Compared with age or the Charlson comorbidity index, the model showed better prediction performance. Conclusions: The derived model can assist clinicians and patients in understanding the individualized risks and benefits of surgery. 
%M 37955965 %R 10.2196/42259 %U https://www.jmir.org/2023/1/e42259 %U https://doi.org/10.2196/42259 %U http://www.ncbi.nlm.nih.gov/pubmed/37955965 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e50328 %T A Mobile App That Addresses Interpretability Challenges in Machine Learning–Based Diabetes Predictions: Survey-Based User Study %A Hendawi,Rasha %A Li,Juan %A Roy,Souradip %+ North Dakota State University, 1340 Administration Ave, Fargo, ND, 58105, United States, 1 (701) 231 8011, j.li@ndsu.edu %K disease prediction %K explainable AI %K artificial intelligence %K knowledge graph %K machine learning %K ontology %K diabetes %D 2023 %7 13.11.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Machine learning approaches, including deep learning, have demonstrated remarkable effectiveness in the diagnosis and prediction of diabetes. However, these approaches often operate as opaque black boxes, leaving health care providers in the dark about the reasoning behind predictions. This opacity poses a barrier to the widespread adoption of machine learning in diabetes and health care, leading to confusion and eroding trust. Objective: This study aimed to address this critical issue by developing and evaluating an explainable artificial intelligence (AI) platform, XAI4Diabetes, designed to empower health care professionals with a clear understanding of AI-generated predictions and recommendations for diabetes care. XAI4Diabetes not only delivers diabetes risk predictions but also furnishes easily interpretable explanations for complex machine learning models and their outcomes. Methods: XAI4Diabetes features a versatile multimodule explanation framework that leverages machine learning, knowledge graphs, and ontologies. The platform comprises the following four essential modules: (1) knowledge base, (2) knowledge matching, (3) prediction, and (4) interpretation. 
By harnessing AI techniques, XAI4Diabetes forecasts diabetes risk and provides valuable insights into the prediction process and outcomes. A structured, survey-based user study assessed the app’s usability and influence on participants’ comprehension of machine learning predictions in real-world patient scenarios. Results: A prototype mobile app was meticulously developed and subjected to thorough usability studies and satisfaction surveys. The evaluation study findings underscore the substantial improvement in medical professionals’ comprehension of key aspects, including the (1) diabetes prediction process, (2) data sets used for model training, (3) data features used, and (4) relative significance of different features in prediction outcomes. Most participants reported heightened understanding of and trust in AI predictions following their use of XAI4Diabetes. The satisfaction survey results further revealed a high level of overall user satisfaction with the tool. Conclusions: This study introduces XAI4Diabetes, a versatile multimodule explainable prediction platform tailored to diabetes care. By enabling transparent diabetes risk predictions and delivering interpretable insights, XAI4Diabetes empowers health care professionals to comprehend the AI-driven decision-making process, thereby fostering transparency and trust. These advancements hold the potential to mitigate biases and facilitate the broader integration of AI in diabetes care. 
%M 37955948 %R 10.2196/50328 %U https://formative.jmir.org/2023/1/e50328 %U https://doi.org/10.2196/50328 %U http://www.ncbi.nlm.nih.gov/pubmed/37955948 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e49877 %T ChatGPT Interactive Medical Simulations for Early Clinical Education: Case Study %A Scherr,Riley %A Halaseh,Faris F %A Spina,Aidin %A Andalib,Saman %A Rivera,Ronald %+ Irvine School of Medicine, University of California, 1001 Health Sciences Rd, Irvine, CA, 92617, United States, 1 949 824 6119, rscherr@hs.uci.edu %K ChatGPT %K medical school simulations %K preclinical curriculum %K artificial intelligence %K AI %K AI in medical education %K medical education %K simulation %K generative %K curriculum %K clinical education %K simulations %D 2023 %7 10.11.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: The transition to clinical clerkships can be difficult for medical students, as it requires the synthesis and application of preclinical information into diagnostic and therapeutic decisions. ChatGPT—a generative language model with many medical applications due to its creativity, memory, and accuracy—can help students in this transition. Objective: This paper models ChatGPT 3.5’s ability to perform interactive clinical simulations and shows this tool’s benefit to medical education. Methods: Simulation starting prompts were refined using ChatGPT 3.5 in Google Chrome. Starting prompts were selected based on assessment format, stepwise progression of simulation events and questions, free-response question type, responsiveness to user inputs, postscenario feedback, and medical accuracy of the feedback. The chosen scenarios were advanced cardiac life support and medical intensive care (for sepsis and pneumonia). Results: Two starting prompts were chosen. Prompt 1 was developed through 3 test simulations and used successfully in 2 simulations. 
Prompt 2 was developed through 10 additional test simulations and used successfully in 1 simulation. Conclusions: ChatGPT is capable of creating simulations for early clinical education. These simulations let students practice novel parts of the clinical curriculum, such as forming independent diagnostic and therapeutic impressions over an entire patient encounter. Furthermore, the simulations can adapt to user inputs in a way that replicates real life more accurately than premade question bank clinical vignettes. Finally, ChatGPT can create potentially unlimited free simulations with specific feedback, which increases access for medical students with lower socioeconomic status and underresourced medical schools. However, no tool is perfect, and ChatGPT is no exception; there are concerns about simulation accuracy and replicability that need to be addressed to further optimize ChatGPT’s performance as an educational resource. %M 37948112 %R 10.2196/49877 %U https://mededu.jmir.org/2023/1/e49877 %U https://doi.org/10.2196/49877 %U http://www.ncbi.nlm.nih.gov/pubmed/37948112 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e49459 %T Strengths and Weaknesses of ChatGPT Models for Scientific Writing About Medical Vitamin B12: Mixed Methods Study %A Abuyaman,Omar %+ Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, FAMS Bldg, 2nd fl, Zarqa, 13133, Jordan, 962 781074280, o.abuyaman@gmail.com %K AI %K ChatGPT %K GPT-4 %K GPT-3.5 %K vitamin B12 %K artificial intelligence %K language editing %K wide range information %K AI solutions %K scientific content %D 2023 %7 10.11.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: ChatGPT is a large language model developed by OpenAI designed to generate human-like responses to prompts. Objective: This study aims to evaluate the ability of GPT-4 to generate scientific content and assist in scientific writing using medical vitamin B12 as the topic. 
Furthermore, the study will compare the performance of GPT-4 to its predecessor, GPT-3.5. Methods: The study examined responses from GPT-4 and GPT-3.5 to vitamin B12–related prompts, focusing on their quality and characteristics and comparing them to established scientific literature. Results: The results indicated that GPT-4 can potentially streamline scientific writing through its ability to edit language and write abstracts, keywords, and abbreviation lists. However, significant limitations of ChatGPT were revealed, including its inability to identify and address bias, inability to include recent information, lack of transparency, and inclusion of inaccurate information. Additionally, it cannot check for plagiarism or provide proper references. The accuracy of GPT-4’s answers was found to be superior to GPT-3.5. Conclusions: ChatGPT can be considered a helpful assistant in the writing process but not a replacement for a scientist’s expertise. Researchers must remain aware of its limitations and use it appropriately. The improvements in consecutive ChatGPT versions suggest the possibility of overcoming some present limitations in the near future. 
%M 37948100 %R 10.2196/49459 %U https://formative.jmir.org/2023/1/e49459 %U https://doi.org/10.2196/49459 %U http://www.ncbi.nlm.nih.gov/pubmed/37948100 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e51300 %T An AI Dietitian for Type 2 Diabetes Mellitus Management Based on Large Language and Image Recognition Models: Preclinical Concept Validation Study %A Sun,Haonan %A Zhang,Kai %A Lan,Wei %A Gu,Qiufeng %A Jiang,Guangxiang %A Yang,Xue %A Qin,Wanli %A Han,Dongran %+ School of Life Science, Beijing University of Chinese Medicine, Scientific Research Building #542, Beijing, 102401, China, 86 13466590473, 18811570951@163.com %K ChatGPT %K artificial intelligence %K AI %K diabetes %K diabetic %K nutrition %K nutritional %K diet %K dietary %K dietician %K medical nutrition therapy %K ingredient recognition %K digital health %K language model %K image recognition %K machine learning %K deep learning %K NLP %K natural language processing %K meal %K recommendation %K meals %K food %K GPT 4.0 %D 2023 %7 9.11.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Nutritional management for patients with diabetes in China is a significant challenge due to the low supply of registered clinical dietitians. To address this, an artificial intelligence (AI)–based nutritionist program that uses advanced language and image recognition models was created. This program can identify ingredients from images of a patient’s meal and offer nutritional guidance and dietary recommendations. Objective: The primary objective of this study is to evaluate the competence of the models that support this program. Methods: The potential of an AI nutritionist program for patients with type 2 diabetes mellitus (T2DM) was evaluated through a multistep process. First, a survey was conducted among patients with T2DM and endocrinologists to identify knowledge gaps in dietary practices. 
ChatGPT and GPT 4.0 were then tested through the Chinese Registered Dietitian Examination to assess their proficiency in providing evidence-based dietary advice. ChatGPT’s responses to common questions about medical nutrition therapy were compared with expert responses by professional dietitians to evaluate its proficiency. The model’s food recommendations were scrutinized for consistency with expert advice. A deep learning–based image recognition model was developed for food identification at the ingredient level, and its performance was compared with existing models. Finally, a user-friendly app was developed, integrating the capabilities of language and image recognition models to potentially improve care for patients with T2DM. Results: Most patients (182/206, 88.4%) demanded more immediate and comprehensive nutritional management and education. Both ChatGPT and GPT 4.0 passed the Chinese Registered Dietitian examination. ChatGPT’s food recommendations were mainly in line with best practices, except for certain foods like root vegetables and dry beans. Professional dietitians’ reviews of ChatGPT’s responses to common questions were largely positive, with 162 out of 168 providing favorable reviews. The multilabel image recognition model evaluation showed that the Dino V2 model achieved an average F1 score of 0.825, indicating high accuracy in recognizing ingredients. Conclusions: The model evaluations were promising. The AI-based nutritionist program is now ready for a supervised pilot study. 
%M 37943581 %R 10.2196/51300 %U https://www.jmir.org/2023/1/e51300 %U https://doi.org/10.2196/51300 %U http://www.ncbi.nlm.nih.gov/pubmed/37943581 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e48521 %T Clinical Prediction Models for Hospital-Induced Delirium Using Structured and Unstructured Electronic Health Record Data: Protocol for a Development and Validation Study %A Ser,Sarah E %A Shear,Kristen %A Snigurska,Urszula A %A Prosperi,Mattia %A Wu,Yonghui %A Magoc,Tanja %A Bjarnadottir,Ragnhildur I %A Lucero,Robert J %+ Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, FL, 32610, United States, 1 3522735468, sser@ufl.edu %K big data %K machine learning %K data science %K hospital-acquired condition %K hospital induced %K hospital acquired %K predict %K predictive %K prediction %K model %K models %K natural language processing %K risk factors %K delirium %K risk %K unstructured %K structured %K free text %K clinical text %K text data %D 2023 %7 9.11.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Hospital-induced delirium is one of the most common and costly iatrogenic conditions, and its incidence is predicted to increase as the population of the United States ages. An academic and clinical interdisciplinary systems approach is needed to reduce the frequency and impact of hospital-induced delirium. Objective: The long-term goal of our research is to enhance the safety of hospitalized older adults by reducing iatrogenic conditions through an effective learning health system. In this study, we will develop models for predicting hospital-induced delirium. 
In order to accomplish this objective, we will create a computable phenotype for our outcome (hospital-induced delirium), design an expert-based traditional logistic regression model, leverage machine learning techniques to generate a model using structured data, and use machine learning and natural language processing to produce an integrated model with components from both structured data and text data. Methods: This study will explore text-based data, such as nursing notes, to improve the predictive capability of prognostic models for hospital-induced delirium. By using supervised and unsupervised text mining in addition to structured data, we will examine multiple types of information in electronic health record data to predict medical-surgical patient risk of developing delirium. Development and validation will be compliant to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. Results: Work on this project will take place through March 2024. For this study, we will use data from approximately 332,230 encounters that occurred between January 2012 to May 2021. Findings from this project will be disseminated at scientific conferences and in peer-reviewed journals. Conclusions: Success in this study will yield a durable, high-performing research-data infrastructure that will process, extract, and analyze clinical text data in near real time. This model has the potential to be integrated into the electronic health record and provide point-of-care decision support to prevent harm and improve quality of care. 
International Registered Report Identifier (IRRID): DERR1-10.2196/48521 %M 37943599 %R 10.2196/48521 %U https://www.researchprotocols.org/2023/1/e48521 %U https://doi.org/10.2196/48521 %U http://www.ncbi.nlm.nih.gov/pubmed/37943599 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 10 %N %P e47913 %T Usability and Overall Perception of a Health Bot for Nutrition-Related Questions for Patients Receiving Bariatric Care: Mixed Methods Study %A Beyeler,Marina %A Légeret,Corinne %A Kiwitz,Fabian %A van der Horst,Klazine %+ Nutrition and Dietetics, School of Health Professions, Bern University of Applied Sciences, Murtenstrasse 10, Bern, 3008, Switzerland, 41 799576535, klazine.vanderhorst@bfh.ch %K bariatric surgery %K nutrition information %K usability %K satisfaction %K artificial intelligence %K health bot %K mobile phone %D 2023 %7 8.11.2023 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Currently, over 4000 bariatric procedures are performed annually in Switzerland. To improve outcomes, patients need to have good knowledge regarding postoperative nutrition. To potentially provide them with knowledge between dietetic consultations, a health bot (HB) was created. The HB can answer bariatric nutrition questions in writing based on artificial intelligence. Objective: This study aims to evaluate the usability and perception of the HB among patients receiving bariatric care. Methods: Patients before or after bariatric surgery tested the HB. A mixed methods approach was used, which consisted of a questionnaire and qualitative interviews before and after testing the HB. The dimensions usability of, usefulness of, satisfaction with, and ease of use of the HB, among others, were measured. Data were analyzed using R Studio (R Studio Inc) and Excel (Microsoft Corp). The interviews were transcribed and a summary inductive content analysis was performed. Results: A total of 12 patients (female: n=8, 67%; male: n=4, 33%) were included. 
The results showed excellent usability with a mean usability score of 87 (SD 12.5; range 57.5-100) out of 100. Other dimensions of acceptability included usefulness (mean 5.28, SD 2.02 out of 7), satisfaction (mean 5.75, SD 1.68 out of 7), and learnability (mean 6.26, SD 1.5 out of 7). The concept of the HB and availability of reliable nutrition information were perceived as desirable (mean 5.5, SD 1.64 out of 7). Weaknesses were identified in the response accuracy, limited knowledge, and design of the HB. Conclusions: The HB’s ease of use and usability were evaluated to be positive; response accuracy, topic selection, and design should be optimized in a next step. The perceptions of nutrition professionals and the impact on patient care and the nutrition knowledge of participants need to be examined in further studies. %M 37938894 %R 10.2196/47913 %U https://humanfactors.jmir.org/2023/1/e47913 %U https://doi.org/10.2196/47913 %U http://www.ncbi.nlm.nih.gov/pubmed/37938894 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e50216 %T A Framework to Guide Implementation of AI in Health Care: Protocol for a Cocreation Research Project %A Nilsen,Per %A Svedberg,Petra %A Neher,Margit %A Nair,Monika %A Larsson,Ingrid %A Petersson,Lena %A Nygren,Jens %+ School of Health and Welfare, Halmstad University, Box 823, Halmstad, 30118, Sweden, 46 706341151, per.nilsen@liu.se %K artificial intelligence %K AI %K health care %K implementation %K process models %K frameworks %K framework %K process model %D 2023 %7 8.11.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Artificial intelligence (AI) has the potential in health care to transform patient care and administrative processes, yet health care has been slow to adopt AI due to many types of barriers. Implementation science has shown the importance of structured implementation processes to overcome implementation barriers. 
However, there is a lack of knowledge and tools to guide such processes when implementing AI-based applications in health care. Objective: The aim of this protocol is to describe the development, testing, and evaluation of a framework, “Artificial Intelligence-Quality Implementation Framework” (AI-QIF), intended to guide decisions and activities related to the implementation of various AI-based applications in health care. Methods: The paper outlines the development of an AI implementation framework for broad use in health care based on the Quality Implementation Framework (QIF). QIF is a process model developed in implementation science. The model guides the user to consider implementation-related issues in a step-by-step design and plan and perform activities that support implementation. This framework was chosen for its adaptability, usability, broad scope, and detailed guidance concerning important activities and considerations for successful implementation. The development will proceed in 5 phases with primarily qualitative methods being used. The process starts with phase I, in which an AI-adapted version of QIF is created (AI-QIF). Phase II will produce a digital mockup of the AI-QIF. Phase III will involve the development of a prototype of the AI-QIF with an intuitive user interface. Phase IV is dedicated to usability testing of the prototype in health care environments. Phase V will focus on evaluating the usability and effectiveness of the AI-QIF. Cocreation is a guiding principle for the project and is an important aspect in 4 of the 5 development phases. The cocreation process will enable the use of both research-based and practice-based knowledge. Results: The project is being conducted within the frame of a larger research program, with the overall objective of developing theoretically and empirically informed frameworks to support AI implementation in routine health care. 
The program was launched in 2021 and has carried out numerous research activities. The development of AI-QIF as a tool to guide the implementation of AI-based applications in health care will draw on knowledge and experience acquired from these activities. The framework is being developed over 2 years, from January 2023 to December 2024. It is under continuous development and refinement. Conclusions: The development of the AI implementation framework, AI-QIF, described in this study protocol aims to facilitate the implementation of AI-based applications in health care based on the premise that implementation processes benefit from being well-prepared and structured. The framework will be coproduced to enhance its relevance, validity, usefulness, and potential value for application in practice. International Registered Report Identifier (IRRID): DERR1-10.2196/50216 %M 37938896 %R 10.2196/50216 %U https://www.researchprotocols.org/2023/1/e50216 %U https://doi.org/10.2196/50216 %U http://www.ncbi.nlm.nih.gov/pubmed/37938896 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e47191 %T Assessing the Performance of ChatGPT in Medical Biochemistry Using Clinical Case Vignettes: Observational Study %A Surapaneni,Krishna Mohan %+ Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai, 600123, India, 91 9789099989, krishnamohan.surapaneni@gmail.com %K ChatGPT %K artificial intelligence %K medical education %K medical Biochemistry %K biochemistry %K chatbot %K case study %K case scenario %K medical exam %K medical examination %K computer generated %D 2023 %7 7.11.2023 %9 Short Paper %J JMIR Med Educ %G English %X Background: ChatGPT has gained global attention recently owing to its high performance in generating a wide range of information and retrieving any kind of data instantaneously. ChatGPT has also been tested for the United States Medical Licensing Examination (USMLE) and has successfully cleared it. 
Thus, its usability in medical education is now one of the key discussions worldwide. Objective: The objective of this study is to evaluate the performance of ChatGPT in medical biochemistry using clinical case vignettes. Methods: The performance of ChatGPT was evaluated in medical biochemistry using 10 clinical case vignettes. Clinical case vignettes were randomly selected and entered into ChatGPT along with the response options. We tested the responses for each clinical case twice. The answers generated by ChatGPT were saved and checked using our reference material. Results: ChatGPT generated correct answers for 4 questions on the first attempt. For the other cases, there were differences in responses generated by ChatGPT in the first and second attempts. In the second attempt, ChatGPT provided correct answers for 6 questions and incorrect answers for 4 questions out of the 10 cases that were used. But, to our surprise, for case 3, different answers were obtained with multiple attempts. We believe this to have happened owing to the complexity of the case, which involved addressing various critical medical aspects related to amino acid metabolism in a balanced approach. Conclusions: According to the findings of our study, ChatGPT may not be considered an accurate information provider for application in medical education to improve learning and assessment. However, our study was limited by a small sample size (10 clinical case vignettes) and the use of the publicly available version of ChatGPT (version 3.5). Although artificial intelligence (AI) has the capability to transform medical education, we emphasize that the data produced by such AI systems must be validated for correctness and dependability before being implemented in practice. 
%M 37934568 %R 10.2196/47191 %U https://mededu.jmir.org/2023/1/e47191 %U https://doi.org/10.2196/47191 %U http://www.ncbi.nlm.nih.gov/pubmed/37934568 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 7 %N %P e44732 %T Physician- and Patient-Elicited Barriers and Facilitators to Implementation of a Machine Learning–Based Screening Tool for Peripheral Arterial Disease: Preimplementation Study With Physician and Patient Stakeholders %A Ho,Vy %A Brown Johnson,Cati %A Ghanzouri,Ilies %A Amal,Saeed %A Asch,Steven %A Ross,Elsie %+ Division of Vascular Surgery, Department of Surgery, Stanford University School of Medicine, 500 Pasteur Drive, Stanford, CA, 94043, United States, 1 6507232185, vivianho@stanford.edu %K artificial intelligence %K cardiovascular disease %K machine learning %K peripheral arterial disease %K preimplementation study %D 2023 %7 6.11.2023 %9 Original Paper %J JMIR Cardio %G English %X Background: Peripheral arterial disease (PAD) is underdiagnosed, partially due to a high prevalence of atypical symptoms and a lack of physician and patient awareness. Implementing clinical decision support tools powered by machine learning algorithms may help physicians identify high-risk patients for diagnostic workup. Objective: This study aims to evaluate barriers and facilitators to the implementation of a novel machine learning–based screening tool for PAD among physician and patient stakeholders using the Consolidated Framework for Implementation Research (CFIR). Methods: We performed semistructured interviews with physicians and patients from the Stanford University Department of Primary Care and Population Health, Division of Cardiology, and Division of Vascular Medicine. Participants answered questions regarding their perceptions toward machine learning and clinical decision support for PAD detection. Rapid thematic analysis was performed using templates incorporating codes from CFIR constructs. 
Results: A total of 12 physicians (6 primary care physicians and 6 cardiovascular specialists) and 14 patients were interviewed. Barriers to implementation arose from 6 CFIR constructs: complexity, evidence strength and quality, relative priority, external policies and incentives, knowledge and beliefs about intervention, and individual identification with the organization. Facilitators arose from 5 CFIR constructs: intervention source, relative advantage, learning climate, patient needs and resources, and knowledge and beliefs about intervention. Physicians felt that a machine learning–powered diagnostic tool for PAD would improve patient care but cited limited time and authority in asking patients to undergo additional screening procedures. Patients were interested in having their physicians use this tool but raised concerns about such technologies replacing human decision-making. Conclusions: Patient- and physician-reported barriers toward the implementation of a machine learning–powered PAD diagnostic tool followed four interdependent themes: (1) low familiarity or urgency in detecting PAD; (2) concerns regarding the reliability of machine learning; (3) differential perceptions of responsibility for PAD care among primary care versus specialty physicians; and (4) patient preference for physicians to remain primary interpreters of health care data. Facilitators followed two interdependent themes: (1) enthusiasm for clinical use of the predictive model and (2) willingness to incorporate machine learning into clinical care. Implementation of machine learning–powered diagnostic tools for PAD should leverage provider support while simultaneously educating stakeholders on the importance of early PAD diagnosis. High predictive validity is necessary for machine learning models but not sufficient for implementation. 
%M 37930755 %R 10.2196/44732 %U https://cardio.jmir.org/2023/1/e44732 %U https://doi.org/10.2196/44732 %U http://www.ncbi.nlm.nih.gov/pubmed/37930755 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 10 %N %P e49788 %T Perspectives of Patients With Chronic Diseases on Future Acceptance of AI–Based Home Care Systems: Cross-Sectional Web-Based Survey Study %A Wang,Bijun %A Asan,Onur %A Mansouri,Mo %+ School of Systems and Enterprises, Stevens Institute of Technology, 1 Castle Point Terrace, Hoboken, NJ, 07030, United States, 1 4145264330, oasan@stevens.edu %K consumer informatics %K artificial intelligence %K AI %K technology acceptance model %K adoption %K chronic %K motivation %K cross-sectional %K home care %K perception %K perceptions %K attitude %K attitudes %K intent %K intention %D 2023 %7 6.11.2023 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Artificial intelligence (AI)–based home care systems and devices are being gradually integrated into health care delivery to benefit patients with chronic diseases. However, existing research mainly focuses on the technical and clinical aspects of AI application, with an insufficient investigation of patients’ motivation and intention to adopt such systems. Objective: This study aimed to examine the factors that affect the motivation of patients with chronic diseases to adopt AI-based home care systems and provide empirical evidence for the proposed research hypotheses. Methods: We conducted a cross-sectional web-based survey with 222 patients with chronic diseases based on a hypothetical scenario. Results: The results indicated that patients have an overall positive perception of AI-based home care systems. 
Their attitudes toward the technology, perceived usefulness, and comfortability were found to be significant factors encouraging adoption, with a clear understanding of accountability being a particularly influential factor in shaping patients’ attitudes toward their motivation to use these systems. However, privacy concerns persist as an indirect factor, affecting the perceived usefulness and comfortability, hence influencing patients’ attitudes. Conclusions: This study is one of the first to examine the motivation of patients with chronic diseases to adopt AI-based home care systems, offering practical insights for policy makers, care or technology providers, and patients. This understanding can facilitate effective policy formulation, product design, and informed patient decision-making, potentially improving the overall health status of patients with chronic diseases. %M 37930780 %R 10.2196/49788 %U https://humanfactors.jmir.org/2023/1/e49788 %U https://doi.org/10.2196/49788 %U http://www.ncbi.nlm.nih.gov/pubmed/37930780 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e52865 %T The Impact of Multimodal Large Language Models on Health Care’s Future %A Meskó,Bertalan %+ The Medical Futurist Institute, Povl Bang-Jensen u. 2/B1. 4/1., Budapest, XI., 1118, Hungary, 36 703807260, berci@medicalfuturist.com %K artificial intelligence %K ChatGPT %K digital health %K future %K GPT-4 %K Generative Pre-Trained Transformer %K large language models %K multimodality %K technology %K AI %K LLM %D 2023 %7 2.11.2023 %9 Viewpoint %J J Med Internet Res %G English %X When large language models (LLMs) were introduced to the public at large in late 2022 with ChatGPT (OpenAI), the interest was unprecedented, with more than 1 billion unique users within 90 days. Until the introduction of Generative Pre-trained Transformer 4 (GPT-4) in March 2023, these LLMs only contained a single mode—text. 
As medicine is a multimodal discipline, the potential future versions of LLMs that can handle multimodality—meaning that they could interpret and generate not only text but also images, videos, sound, and even comprehensive documents—can be conceptualized as a significant evolution in the field of artificial intelligence (AI). This paper zooms in on the new potential of generative AI, a new form of AI that also includes tools such as LLMs, through the achievement of multimodal inputs of text, images, and speech on health care’s future. We present several futuristic scenarios to illustrate the potential path forward as multimodal LLMs (M-LLMs) could represent the gateway between health care professionals and using AI for medical purposes. It is important to point out, though, that despite the unprecedented potential of generative AI in the form of M-LLMs, the human touch in medicine remains irreplaceable. AI should be seen as a tool that can augment health care professionals rather than replace them. It is also important to consider the human aspects of health care—empathy, understanding, and the doctor-patient relationship—when deploying AI. 
%M 37917126 %R 10.2196/52865 %U https://www.jmir.org/2023/1/e52865 %U https://doi.org/10.2196/52865 %U http://www.ncbi.nlm.nih.gov/pubmed/37917126 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e51421 %T Exploring the Possible Use of AI Chatbots in Public Health Education: Feasibility Study %A Baglivo,Francesco %A De Angelis,Luigi %A Casigliani,Virginia %A Arzilli,Guglielmo %A Privitera,Gaetano Pierpaolo %A Rizzo,Caterina %+ Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, Via San Zeno 35, Pisa (PI), 56123, Italy, 39 3288348649, f.baglivo@studenti.unipi.it %K artificial intelligence %K chatbots %K medical education %K vaccination %K public health %K medical students %K large language model %K generative AI %K ChatGPT %K Google Bard %K AI chatbot %K health education %K public health %K health care %K medical training %K educational support tool %K chatbot model %D 2023 %7 1.11.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: Artificial intelligence (AI) is a rapidly developing field with the potential to transform various aspects of health care and public health, including medical training. During the “Hygiene and Public Health” course for fifth-year medical students, a practical training session was conducted on vaccination using AI chatbots as an educational supportive tool. Before receiving specific training on vaccination, the students were given a web-based test extracted from the Italian National Medical Residency Test. After completing the test, a critical correction of each question was performed assisted by AI chatbots. Objective: The main aim of this study was to identify whether AI chatbots can be considered educational support tools for training in public health. The secondary objective was to assess the performance of different AI chatbots on complex multiple-choice medical questions in the Italian language. 
Methods: A test composed of 15 multiple-choice questions on vaccination was extracted from the Italian National Medical Residency Test using targeted keywords and administered to medical students via Google Forms and to different AI chatbot models (Bing Chat, ChatGPT, Chatsonic, Google Bard, and YouChat). The correction of the test was conducted in the classroom, focusing on the critical evaluation of the explanations provided by the chatbot. A Mann-Whitney U test was conducted to compare the performances of medical students and AI chatbots. Student feedback was collected anonymously at the end of the training experience. Results: In total, 36 medical students and 5 AI chatbot models completed the test. The students achieved an average score of 8.22 (SD 2.65) out of 15, while the AI chatbots scored an average of 12.22 (SD 2.77). The results indicated a statistically significant difference in performance between the 2 groups (U=49.5, P<.001), with a large effect size (r=0.69). When divided by question type (direct, scenario-based, and negative), significant differences were observed in direct (P<.001) and scenario-based (P<.001) questions, but not in negative questions (P=.48). The students reported a high level of satisfaction (7.9/10) with the educational experience, expressing a strong desire to repeat the experience (7.6/10). Conclusions: This study demonstrated the efficacy of AI chatbots in answering complex medical questions related to vaccination and providing valuable educational support. Their performance significantly surpassed that of medical students in direct and scenario-based questions. The responsible and critical use of AI chatbots can enhance medical education, making it an essential aspect to integrate into the educational system. 
%M 37910155 %R 10.2196/51421 %U https://mededu.jmir.org/2023/1/e51421 %U https://doi.org/10.2196/51421 %U http://www.ncbi.nlm.nih.gov/pubmed/37910155 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 10 %N %P e48517 %T Using HIPAA (Health Insurance Portability and Accountability Act)–Compliant Transcription Services for Virtual Psychiatric Interviews: Pilot Comparison Study %A Seyedi,Salman %A Griner,Emily %A Corbin,Lisette %A Jiang,Zifan %A Roberts,Kailey %A Iacobelli,Luca %A Milloy,Aaron %A Boazak,Mina %A Bahrami Rad,Ali %A Abbasi,Ahmed %A Cotes,Robert O %A Clifford,Gari D %+ Department of Biomedical Informatics, Emory University, 100 Woodruff Circle, Atlanta, GA, 30322, United States, 1 404 727 4562, sseyedi@emory.edu %K ASR %K automatic speech recognition %K Health Insurance Portability and Accountability Act %K HIPAA %K Linguistic Inquiry and Word Count %K LIWC %K mental health %K psychiatric interview %K speech to text %K WER %K word error rate %D 2023 %7 31.10.2023 %9 Original Paper %J JMIR Ment Health %G English %X Background: Automatic speech recognition (ASR) technology is increasingly being used for transcription in clinical contexts. Although there are numerous transcription services using ASR, few studies have compared the word error rate (WER) between different transcription services among different diagnostic groups in a mental health setting. There has also been little research into the types of words ASR transcriptions mistakenly generate or omit. Objective: This study compared the WER of 3 ASR transcription services (Amazon Transcribe [Amazon.com, Inc], Zoom-Otter AI [Zoom Video Communications, Inc], and Whisper [OpenAI Inc]) in interviews across 2 different clinical categories (controls and participants experiencing a variety of mental health conditions). These ASR transcription services were also compared with a commercial human transcription service, Rev (Rev.Com, Inc). 
Words that were erroneously included in or excluded from the transcripts were systematically analyzed by their Linguistic Inquiry and Word Count categories. Methods: Participants completed a 1-time research psychiatric interview, which was recorded on a secure server. Transcriptions created by the research team were used as the gold standard from which WER was calculated. The interviewees were categorized into either the control group (n=18) or the mental health condition group (n=47) using the Mini-International Neuropsychiatric Interview. The total sample included 65 participants. Brunner-Munzel tests were used for comparing independent sets, such as the diagnostic groupings, and Wilcoxon signed rank tests were used for correlated samples when comparing the total sample between different transcription services. Results: There were significant differences between each ASR transcription service’s WER (P<.001). Amazon Transcribe’s output exhibited significantly lower WERs compared with the Zoom-Otter AI’s and Whisper’s ASR. ASR performances did not significantly differ across the 2 different clinical categories within each service (P>.05). A comparison between the human transcription service output from Rev and the best-performing ASR (Amazon Transcribe) demonstrated a significant difference (P<.001), with Rev having a slightly lower median WER (7.6%, IQR 5.4%-11.35% vs 8.9%, IQR 6.9%-11.6%). Heat maps and spider plots were used to visualize the most common errors in Linguistic Inquiry and Word Count categories, which were found to be within 3 overarching categories: Conversation, Cognition, and Function. Conclusions: Overall, consistent with previous literature, our results suggest that the WER between manual and automated transcription services may be narrowing as ASR services advance. These advances, coupled with decreased cost and time in receiving transcriptions, may make ASR transcriptions a more viable option within health care settings. 
However, more research is required to determine if errors in specific types of words impact the analysis and usability of these transcriptions, particularly for specific applications and in a variety of populations in terms of clinical diagnosis, literacy level, accent, and cultural origin. %M 37906217 %R 10.2196/48517 %U https://mental.jmir.org/2023/1/e48517 %U https://doi.org/10.2196/48517 %U http://www.ncbi.nlm.nih.gov/pubmed/37906217 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 7 %N %P e51375 %T AI Algorithm to Predict Acute Coronary Syndrome in Prehospital Cardiac Care: Retrospective Cohort Study %A de Koning,Enrico %A van der Haas,Yvette %A Saguna,Saguna %A Stoop,Esmee %A Bosch,Jan %A Beeres,Saskia %A Schalij,Martin %A Boogers,Mark %+ Cardiology Department, Leiden University Medical Center, Albinusdreef 2, Leiden, 2333 ZA, Netherlands, 31 715269111, j.m.j.boogers@lumc.nl %K cardiology %K acute coronary syndrome %K Hollands Midden Acute Regional Triage–cardiology %K prehospital %K triage %K artificial intelligence %K natural language processing %K angina %K algorithm %K overcrowding %K emergency department %K clinical decision-making %K emergency medical service %K paramedics %D 2023 %7 31.10.2023 %9 Original Paper %J JMIR Cardio %G English %X Background: Overcrowding of hospitals and emergency departments (EDs) is a growing problem. However, not all ED consultations are necessary. For example, 80% of patients in the ED with chest pain do not have an acute coronary syndrome (ACS). Artificial intelligence (AI) is useful in analyzing (medical) data, and might aid health care workers in prehospital clinical decision-making before patients are presented to the hospital. Objective: The aim of this study was to develop an AI model which would be able to predict ACS before patients visit the ED. The model retrospectively analyzed prehospital data acquired by emergency medical services' nurse paramedics. 
Methods: Patients presenting to the emergency medical services with symptoms suggestive of ACS between September 2018 and September 2020 were included. An AI model using a supervised text classification algorithm was developed to analyze data. Data were analyzed for all 7458 patients (mean 68, SD 15 years, 54% men). Specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for control and intervention groups. At first, a machine learning (ML) algorithm (or model) was chosen; afterward, the features needed were selected and then the model was tested and improved using iterative evaluation and in a further step through hyperparameter tuning. Finally, a method was selected to explain the final AI model. Results: The AI model had a specificity of 11% and a sensitivity of 99.5% whereas usual care had a specificity of 1% and a sensitivity of 99.5%. The PPV of the AI model was 15% and the NPV was 99%. The PPV of usual care was 13% and the NPV was 94%. Conclusions: The AI model was able to predict ACS based on retrospective data from the prehospital setting. It led to an increase in specificity (from 1% to 11%) and NPV (from 94% to 99%) when compared to usual care, with a similar sensitivity. Due to the retrospective nature of this study and the singular focus on ACS it should be seen as a proof-of-concept. Other (possibly life-threatening) diagnoses were not analyzed. Future prospective validation is necessary before implementation. 
%M 37906226 %R 10.2196/51375 %U https://cardio.jmir.org/2023/1/e51375 %U https://doi.org/10.2196/51375 %U http://www.ncbi.nlm.nih.gov/pubmed/37906226 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e47353 %T Insights on the Current State and Future Outlook of AI in Health Care: Expert Interview Study %A Hummelsberger,Pia %A Koch,Timo K %A Rauh,Sabrina %A Dorn,Julia %A Lermer,Eva %A Raue,Martina %A Hudecek,Matthias F C %A Schicho,Andreas %A Colak,Errol %A Ghassemi,Marzyeh %A Gaube,Susanne %+ LMU Center for Leadership and People Management, Department of Psychology, LMU Munich, Geschwister-Scholl-Platz 1, Munich, 80539, Germany, 49 89 2180 9773, P.Hummelsberger@psy.lmu.de %K artificial intelligence %K AI %K machine learning %K health care %K digital health technology %K technology implementation %K expert interviews %K mixed methods %K topic modeling %D 2023 %7 31.10.2023 %9 Original Paper %J JMIR AI %G English %X Background: Artificial intelligence (AI) is often promoted as a potential solution for many challenges health care systems face worldwide. However, its implementation in clinical practice lags behind its technological development. Objective: This study aims to gain insights into the current state and prospects of AI technology from the stakeholders most directly involved in its adoption in the health care sector whose perspectives have received limited attention in research to date. Methods: For this purpose, the perspectives of AI researchers and health care IT professionals in North America and Western Europe were collected and compared for profession-specific and regional differences. In this preregistered, mixed methods, cross-sectional study, 23 experts were interviewed using a semistructured guide. Data from the interviews were analyzed using deductive and inductive qualitative methods for the thematic analysis along with topic modeling to identify latent topics. 
Results: Through our thematic analysis, four major categories emerged: (1) the current state of AI systems in health care, (2) the criteria and requirements for implementing AI systems in health care, (3) the challenges in implementing AI systems in health care, and (4) the prospects of the technology. Experts discussed the capabilities and limitations of current AI systems in health care in addition to their prevalence and regional differences. Several criteria and requirements deemed necessary for the successful implementation of AI systems were identified, including the technology’s performance and security, smooth system integration and human-AI interaction, costs, stakeholder involvement, and employee training. However, regulatory, logistical, and technical issues were identified as the most critical barriers to an effective technology implementation process. In the future, our experts predicted both various threats and many opportunities related to AI technology in the health care sector. Conclusions: Our work provides new insights into the current state, criteria, challenges, and outlook for implementing AI technology in health care from the perspective of AI researchers and IT professionals in North America and Western Europe. For the full potential of AI-enabled technologies to be exploited and for them to contribute to solving current health care challenges, critical implementation criteria must be met, and all groups involved in the process must work together. 
%M 38875571 %R 10.2196/47353 %U https://ai.jmir.org/2023/1/e47353 %U https://doi.org/10.2196/47353 %U http://www.ncbi.nlm.nih.gov/pubmed/38875571 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e50448 %T Clinical Decision Support System for All Stages of Gastric Carcinogenesis in Real-Time Endoscopy: Model Establishment and Validation Study %A Gong,Eun Jeong %A Bang,Chang Seok %A Lee,Jae Jun %A Jeong,Hae Min %A Baik,Gwang Ho %A Jeong,Jae Hoon %A Dick,Sigmund %A Lee,Gi Hun %+ Department of Internal Medicine, Hallym University College of Medicine, Sakju-ro 77, Chuncheon, 24253, Republic of Korea, 82 1052657810, csbang@hallym.ac.kr %K atrophy %K intestinal metaplasia %K metaplasia %K deep learning %K endoscopy %K gastric neoplasms %K neoplasm %K neoplasms %K internal medicine %K cancer %K oncology %K decision support %K real time %K gastrointestinal %K gastric %K intestinal %K machine learning %K clinical decision support system %K CDSS %K computer aided %K diagnosis %K diagnostic %K carcinogenesis %D 2023 %7 30.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Our research group previously established a deep-learning–based clinical decision support system (CDSS) for real-time endoscopy-based detection and classification of gastric neoplasms. However, preneoplastic conditions, such as atrophy and intestinal metaplasia (IM) were not taken into account, and there is no established model that classifies all stages of gastric carcinogenesis. Objective: This study aims to build and validate a CDSS for real-time endoscopy for all stages of gastric carcinogenesis, including atrophy and IM. Methods: A total of 11,868 endoscopic images were used for training and internal testing. The primary outcomes were lesion classification accuracy (6 classes: advanced gastric cancer, early gastric cancer, dysplasia, atrophy, IM, and normal) and atrophy and IM lesion segmentation rates for the segmentation model. 
The following tests were carried out to validate the performance of lesion classification accuracy: (1) external testing using 1282 images from another institution and (2) evaluation of the classification accuracy of atrophy and IM in real-world procedures in a prospective manner. To estimate the clinical utility, 2 experienced endoscopists were invited to perform a blind test with the same data set. A CDSS was constructed by combining the established 6-class lesion classification model and the preneoplastic lesion segmentation model with the previously established lesion detection model. Results: The overall lesion classification accuracy (95% CI) was 90.3% (89%-91.6%) in the internal test. For the performance validation, the CDSS achieved 85.3% (83.4%-97.2%) overall accuracy. The per-class external test accuracies for atrophy and IM were 95.3% (92.6%-98%) and 89.3% (85.4%-93.2%), respectively. CDSS-assisted endoscopy showed an accuracy of 92.1% (88.8%-95.4%) for atrophy and 95.5% (92%-99%) for IM in the real-world application of 522 consecutive screening endoscopies. There was no significant difference in the overall accuracy between the invited endoscopists and established CDSS in the prospective real-clinic evaluation (P=.23). The CDSS demonstrated a segmentation rate of 93.4% (95% CI 92.4%-94.4%) for atrophy or IM lesion segmentation in the internal testing. Conclusions: The CDSS achieved high performance in terms of computer-aided diagnosis of all stages of gastric carcinogenesis and demonstrated real-world application potential. 
%M 37902818 %R 10.2196/50448 %U https://www.jmir.org/2023/1/e50448 %U https://doi.org/10.2196/50448 %U http://www.ncbi.nlm.nih.gov/pubmed/37902818 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46547 %T Architectural Design of a Blockchain-Enabled, Federated Learning Platform for Algorithmic Fairness in Predictive Health Care: Design Science Study %A Liang,Xueping %A Zhao,Juan %A Chen,Yan %A Bandara,Eranga %A Shetty,Sachin %+ Department of Information Systems and Business Analytics, Florida International University, 11200 SW 8th St, Miami, FL, 33199, United States, 1 305 348 2830, xuliang@fiu.edu %K fairness %K federated learning %K bias %K health care %K blockchain %K software %K proof of concept %K implementation %K privacy %D 2023 %7 30.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Developing effective and generalizable predictive models is critical for disease prediction and clinical decision-making, often requiring diverse samples to mitigate population bias and address algorithmic fairness. However, a major challenge is to retrieve learning models across multiple institutions without bringing in local biases and inequity, while preserving individual patients’ privacy at each site. Objective: This study aims to understand the issues of bias and fairness in the machine learning process used in the predictive health care domain. We proposed a software architecture that integrates federated learning and blockchain to improve fairness, while maintaining acceptable prediction accuracy and minimizing overhead costs. Methods: We improved existing federated learning platforms by integrating blockchain through an iterative design approach. We used the design science research method, which involves 2 design cycles (federated learning for bias mitigation and decentralized architecture). The design involves a bias-mitigation process within the blockchain-empowered federated learning framework based on a novel architecture. 
Under this architecture, multiple medical institutions can jointly train predictive models using their privacy-protected data effectively and efficiently and ultimately achieve fairness in decision-making in the health care domain. Results: We designed and implemented our solution using the Aplos smart contract, microservices, Rahasak blockchain, and Apache Cassandra–based distributed storage. By conducting 20,000 local model training iterations and 1000 federated model training iterations across 5 simulated medical centers as peers in the Rahasak blockchain network, we demonstrated how our solution with an improved fairness mechanism can enhance the accuracy of predictive diagnosis. Conclusions: Our study identified the technical challenges of prediction biases faced by existing predictive models in the health care domain. To overcome these challenges, we presented an innovative design solution using federated learning and blockchain, along with the adoption of a unique distributed architecture for a fairness-aware system. We have illustrated how this design can address privacy, security, prediction accuracy, and scalability challenges, ultimately improving fairness and equity in the predictive health care domain. 
%M 37902833 %R 10.2196/46547 %U https://www.jmir.org/2023/1/e46547 %U https://doi.org/10.2196/46547 %U http://www.ncbi.nlm.nih.gov/pubmed/37902833 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e48452 %T The Potential of GPT-4 as a Support Tool for Pharmacists: Analytical Study Using the Japanese National Examination for Pharmacists %A Kunitsu,Yuki %+ Department of Pharmacy, Shiga University of Medical Science Hospital, Seta Tukinowacho, Otsu, Shiga, 520-2121, Japan, 81 75 548 2111, ykunitsu@belle.shiga-med.ac.jp %K natural language processing %K generative pretrained transformer %K GPT-4 %K ChatGPT %K artificial intelligence %K AI %K chatbot %K pharmacy %K pharmacist %D 2023 %7 30.10.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: The advancement of artificial intelligence (AI), as well as machine learning, has led to its application in various industries, including health care. AI chatbots, such as GPT-4, developed by OpenAI, have demonstrated potential in supporting health care professionals by providing medical information, answering examination questions, and assisting in medical education. However, the applicability of GPT-4 in the field of pharmacy remains unexplored. Objective: This study aimed to evaluate GPT-4’s ability to answer questions from the Japanese National Examination for Pharmacists (JNEP) and assess its potential as a support tool for pharmacists in their daily practice. Methods: The question texts and answer choices from the 107th and 108th JNEP, held in February 2022 and February 2023, were input into GPT-4. As GPT-4 cannot process diagrams, questions that included diagram interpretation were not analyzed and were initially given a score of 0. The correct answer rates were calculated and compared with the passing criteria of each examination to evaluate GPT-4’s performance. 
Results: For the 107th and 108th JNEP, GPT-4 achieved an accuracy rate of 64.5% (222/344) and 62.9% (217/345), respectively, for all questions. When considering only the questions that GPT-4 could answer, the accuracy rates increased to 78.2% (222/284) and 75.3% (217/287), respectively. The accuracy rates tended to be lower for physics, chemistry, and calculation questions. Conclusions: Although GPT-4 demonstrated the potential to answer questions from the JNEP and support pharmacists’ capabilities, it also showed limitations in handling highly specialized questions, calculation questions, and questions requiring diagram recognition. Further evaluation is necessary to explore its applicability in real-world clinical settings, considering the complexities of patient scenarios and collaboration with health care professionals. By addressing these limitations, GPT-4 could become a more reliable tool for pharmacists in their daily practice. %M 37837968 %R 10.2196/48452 %U https://mededu.jmir.org/2023/1/e48452 %U https://doi.org/10.2196/48452 %U http://www.ncbi.nlm.nih.gov/pubmed/37837968 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 10 %N %P e48476 %T Physicians’ Perspectives on AI in Clinical Decision Support Systems: Interview Study of the CURATE.AI Personalized Dose Optimization Platform %A Vijayakumar,Smrithi %A Lee,V Vien %A Leong,Qiao Ying %A Hong,Soo Jung %A Blasiak,Agata %A Ho,Dean %+ The N.1 Institute for Health, National University of Singapore, 28 Medical Dr, Singapore, 117456, Singapore, 65 6601 7766, lsisv@nus.edu.sg %K artificial intelligence %K AI %K clinical decision support system %K CDSS %K adoption %K perception %K decision support %K acceptance %K perception %K perspective %K perspectives %K opinion %K attitude %K qualitative %K focus %K interview %K interviews %D 2023 %7 30.10.2023 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Physicians play a key role in integrating new clinical technology into care practices through 
user feedback and growth propositions to developers of the technology. As physicians are stakeholders involved through the technology iteration process, understanding their roles as users can provide nuanced insights into the workings of these technologies that are being explored. Therefore, understanding physicians’ perceptions can be critical toward clinical validation, implementation, and downstream adoption. Given the increasing prevalence of clinical decision support systems (CDSSs), there remains a need to gain an in-depth understanding of physicians’ perceptions and expectations toward their downstream implementation. This paper explores physicians’ perceptions of integrating CURATE.AI, a novel artificial intelligence (AI)–based, clinical-stage personalized dosing CDSS, into clinical practice. Objective: This study aims to understand physicians’ perspectives of integrating CURATE.AI for clinical work and to gather insights on considerations of the implementation of AI-based CDSS tools. Methods: A total of 12 participants completed semistructured interviews examining their knowledge, experience, attitudes, risks, and future course of the personalized combination therapy dosing platform, CURATE.AI. Interviews were audio recorded, transcribed verbatim, and coded manually. The data were thematically analyzed. Results: Overall, 3 broad themes and 9 subthemes were identified through thematic analysis. The themes covered considerations that physicians perceived as significant across various stages of new technology development, including trial, clinical implementation, and mass adoption. Conclusions: The study laid out the various ways physicians interpreted an AI-based personalized dosing CDSS, CURATE.AI, for their clinical practice. 
The research pointed out that physicians’ expectations during the different stages of technology exploration can be nuanced and layered with expectations of implementation that are relevant for technology developers and researchers. %M 37902825 %R 10.2196/48476 %U https://humanfactors.jmir.org/2023/1/e48476 %U https://doi.org/10.2196/48476 %U http://www.ncbi.nlm.nih.gov/pubmed/37902825 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46934 %T The Price of Explainability in Machine Learning Models for 100-Day Readmission Prediction in Heart Failure: Retrospective, Comparative, Machine Learning Study %A Soliman,Amira %A Agvall,Björn %A Etminani,Kobra %A Hamed,Omar %A Lingman,Markus %+ Center for Applied Intelligent Systems Research, School of Information Technology, Halmstad University, Kristian IV's väg 3, Halmstad, 301 18, Sweden, 46 729773541, amira.soliman@hh.se %K readmission prediction %K heart failure %K machine learning %K explainable artificial intelligence %K deep learning %K shallow learning %D 2023 %7 27.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Sensitive and interpretable machine learning (ML) models can provide valuable assistance to clinicians in managing patients with heart failure (HF) at discharge by identifying individual factors associated with a high risk of readmission. In this cohort study, we delve into the factors driving the potential utility of classification models as decision support tools for predicting readmissions in patients with HF. Objective: The primary objective of this study is to assess the trade-off between using deep learning (DL) and traditional ML models to identify the risk of 100-day readmissions in patients with HF. Additionally, the study aims to provide explanations for the model predictions by highlighting important features both on a global scale across the patient cohort and on a local level for individual patients. 
Methods: The retrospective data for this study were obtained from the Regional Health Care Information Platform in Region Halland, Sweden. The study cohort consisted of patients diagnosed with HF who were over 40 years old and had been hospitalized at least once between 2017 and 2019. Data analysis encompassed the period from January 1, 2017, to December 31, 2019. Two ML models were developed and validated to predict 100-day readmissions, with a focus on the explainability of the model’s decisions. These models were built based on decision trees and recurrent neural architecture. Model explainability was obtained using an ML explainer. The predictive performance of these models was compared against 2 risk assessment tools using multiple performance metrics. Results: The retrospective data set included a total of 15,612 admissions, and within these admissions, readmission occurred in 5597 cases, representing a readmission rate of 35.85%. It is noteworthy that a traditional and explainable model, informed by clinical knowledge, exhibited performance comparable to the DL model and surpassed conventional scoring methods in predicting readmission among patients with HF. The evaluation of predictive model performance was based on commonly used metrics, with an area under the precision-recall curve of 66% for the deep model and 68% for the traditional model on the holdout data set. Importantly, the explanations provided by the traditional model offer actionable insights that have the potential to enhance care planning. Conclusions: This study found that a widely used deep prediction model did not outperform an explainable ML model when predicting readmissions among patients with HF. The results suggest that model transparency does not necessarily compromise performance, which could facilitate the clinical adoption of such models. 
%M 37889530 %R 10.2196/46934 %U https://www.jmir.org/2023/1/e46934 %U https://doi.org/10.2196/46934 %U http://www.ncbi.nlm.nih.gov/pubmed/37889530 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44417 %T Predicting Colorectal Cancer Survival Using Time-to-Event Machine Learning: Retrospective Cohort Study %A Yang,Xulin %A Qiu,Hang %A Wang,Liya %A Wang,Xiaodong %+ School of Computer Science and Engineering, University of Electronic Science and Technology of China, No.2006, Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China, 86 28 61830278, qiuhang@uestc.edu.cn %K colorectal cancer %K survival prediction %K machine learning %K time-to-event %K SHAP %K SHapley Additive exPlanations %D 2023 %7 26.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Machine learning (ML) methods have shown great potential in predicting colorectal cancer (CRC) survival. However, the ML models introduced thus far have mainly focused on binary outcomes and have not considered the time-to-event nature of this type of modeling. Objective: This study aims to evaluate the performance of ML approaches for modeling time-to-event survival data and develop transparent models for predicting CRC-specific survival. Methods: The data set used in this retrospective cohort study contains information on patients who were newly diagnosed with CRC between December 28, 2012, and December 27, 2019, at West China Hospital, Sichuan University. We assessed the performance of 6 representative ML models, including random survival forest (RSF), gradient boosting machine (GBM), DeepSurv, DeepHit, neural net-extended time-dependent Cox (or Cox-Time), and neural multitask logistic regression (N-MTLR) in predicting CRC-specific survival. Multiple imputation by chained equations method was applied to handle missing values in variables. Multivariable analysis and clinical experience were used to select significant features associated with CRC survival. 
Model performance was evaluated in stratified 5-fold cross-validation repeated 5 times by using the time-dependent concordance index, integrated Brier score, calibration curves, and decision curves. The SHapley Additive exPlanations method was applied to calculate feature importance. Results: A total of 2157 patients with CRC were included in this study. Among the 6 time-to-event ML models, the DeepHit model exhibited the best discriminative ability (time-dependent concordance index 0.789, 95% CI 0.779-0.799) and the RSF model produced better-calibrated survival estimates (integrated Brier score 0.096, 95% CI 0.094-0.099), but these differences were not statistically significant. Additionally, the RSF, GBM, DeepSurv, Cox-Time, and N-MTLR models had comparable predictive accuracy to the Cox Proportional Hazards model in terms of discrimination and calibration. The calibration curves showed that all the ML models exhibited good 5-year survival calibration. The decision curves for CRC-specific survival at 5 years showed that all the ML models, especially RSF, had higher net benefits than default strategies of treating all or no patients at a range of clinically reasonable risk thresholds. The SHapley Additive exPlanations method revealed that R0 resection, tumor-node-metastasis staging, and the number of positive lymph nodes were important factors for 5-year CRC-specific survival. Conclusions: This study showed the potential of applying time-to-event ML predictive algorithms to help predict CRC-specific survival. The RSF, GBM, Cox-Time, and N-MTLR algorithms could provide nonparametric alternatives to the Cox Proportional Hazards model in estimating the survival probability of patients with CRC. The transparent time-to-event ML models help clinicians to more accurately predict the survival rate for these patients and improve patient outcomes by enabling personalized treatment plans that are informed by explainable ML models. 
%M 37883174 %R 10.2196/44417 %U https://www.jmir.org/2023/1/e44417 %U https://doi.org/10.2196/44417 %U http://www.ncbi.nlm.nih.gov/pubmed/37883174 %0 Journal Article %@ 2561-9128 %I JMIR Publications %V 6 %N %P e50895 %T Temporal Generalizability of Machine Learning Models for Predicting Postoperative Delirium Using Electronic Health Record Data: Model Development and Validation Study %A Matsumoto,Koutarou %A Nohara,Yasunobu %A Sakaguchi,Mikako %A Takayama,Yohei %A Fukushige,Syota %A Soejima,Hidehisa %A Nakashima,Naoki %A Kamouchi,Masahiro %+ Biostatistics Center, Kurume University, 67 Asahi-Machi, Kurume, 830-0011, Japan, 81 8033589122, matsumoto_koutarou@kurume-u.ac.jp %K postoperative delirium %K prediction model %K machine learning %K temporal generalizability %K electronic health record data %D 2023 %7 26.10.2023 %9 Original Paper %J JMIR Perioper Med %G English %X Background: Although machine learning models demonstrate significant potential in predicting postoperative delirium, the advantages of their implementation in real-world settings remain unclear and require a comparison with conventional models in practical applications. Objective: The objective of this study was to validate the temporal generalizability of decision tree ensemble and sparse linear regression models for predicting delirium after surgery compared with that of the traditional logistic regression model. Methods: The health record data of patients hospitalized at an advanced emergency and critical care medical center in Kumamoto, Japan, were collected electronically. We developed a decision tree ensemble model using extreme gradient boosting (XGBoost) and a sparse linear regression model using least absolute shrinkage and selection operator (LASSO) regression. 
To evaluate the predictive performance of the model, we used the area under the receiver operating characteristic curve (AUROC) and the Matthews correlation coefficient (MCC) to measure discrimination and the slope and intercept of the regression between predicted and observed probabilities to measure calibration. The Brier score was evaluated as an overall performance metric. We included 11,863 consecutive patients who underwent surgery with general anesthesia between December 2017 and February 2022. The patients were divided into a derivation cohort before the COVID-19 pandemic and a validation cohort during the COVID-19 pandemic. Postoperative delirium was diagnosed according to the confusion assessment method. Results: A total of 6497 patients (mean age 68.5, SD 14.4 years; women: n=2627, 40.4%) were included in the derivation cohort, and 5366 patients (mean age 67.8, SD 14.6 years; women: n=2105, 39.2%) were included in the validation cohort. Regarding discrimination, the XGBoost model (AUROC 0.87-0.90 and MCC 0.34-0.44) did not significantly outperform the LASSO model (AUROC 0.86-0.89 and MCC 0.34-0.41). The logistic regression model (AUROC 0.84-0.88, MCC 0.33-0.40, slope 1.01-1.19, intercept –0.16 to 0.06, and Brier score 0.06-0.07), with 8 predictors (age, intensive care unit, neurosurgery, emergency admission, anesthesia time, BMI, blood loss during surgery, and use of an ambulance), achieved good predictive performance. Conclusions: The XGBoost model did not significantly outperform the LASSO model in predicting postoperative delirium. Furthermore, a parsimonious logistic model with a few important predictors achieved comparable performance to machine learning models in predicting postoperative delirium. 
%M 37883164 %R 10.2196/50895 %U https://periop.jmir.org/2023/1/e50895 %U https://doi.org/10.2196/50895 %U http://www.ncbi.nlm.nih.gov/pubmed/37883164 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e46905 %T A Hybrid Decision Tree and Deep Learning Approach Combining Medical Imaging and Electronic Medical Records to Predict Intubation Among Hospitalized Patients With COVID-19: Algorithm Development and Validation %A Nguyen,Kim-Anh-Nhi %A Tandon,Pranai %A Ghanavati,Sahar %A Cheetirala,Satya Narayana %A Timsina,Prem %A Freeman,Robert %A Reich,David %A Levin,Matthew A %A Mazumdar,Madhu %A Fayad,Zahi A %A Kia,Arash %+ Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, One Gustave L Levy Place, 1255 5th Ave, Suite C-2, New York, NY, 10029, United States, 1 8572851577, kim-anh-nhi.nguyen@mountsinai.org %K COVID-19 %K medical imaging %K machine learning %K chest radiograph %K mechanical ventilation %K electronic health records %K intubation %K decision trees %K hybrid model %K clinical informatics %D 2023 %7 26.10.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Early prediction of the need for invasive mechanical ventilation (IMV) in patients hospitalized with COVID-19 symptoms can help in the allocation of resources appropriately and improve patient outcomes by appropriately monitoring and treating patients at the greatest risk of respiratory failure. To help with the complexity of deciding whether a patient needs IMV, machine learning algorithms may help bring more prognostic value in a timely and systematic manner. Chest radiographs (CXRs) and electronic medical records (EMRs), typically obtained early in patients admitted with COVID-19, are the keys to deciding whether they need IMV. Objective: We aimed to evaluate the use of a machine learning model to predict the need for intubation within 24 hours by using a combination of CXR and EMR data in an end-to-end automated pipeline. 
We included historical data from 2481 hospitalizations at The Mount Sinai Hospital in New York City. Methods: CXRs were first resized, rescaled, and normalized. Then lungs were segmented from the CXRs by using a U-Net algorithm. After splitting them into a training and a test set, the training set images were augmented. The augmented images were used to train an image classifier to predict the probability of intubation with a prediction window of 24 hours by retraining a pretrained DenseNet model by using transfer learning, 10-fold cross-validation, and grid search. Then, in the final fusion model, we trained a random forest algorithm via 10-fold cross-validation by combining the probability score from the image classifier with 41 longitudinal variables in the EMR. Variables in the EMR included clinical and laboratory data routinely collected in the inpatient setting. The final fusion model gave a prediction likelihood for the need of intubation within 24 hours as well. Results: At a prediction probability threshold of 0.5, the fusion model provided 78.9% (95% CI 59%-96%) sensitivity, 83% (95% CI 76%-89%) specificity, 0.509 (95% CI 0.34-0.67) F1-score, 0.874 (95% CI 0.80-0.94) area under the receiver operating characteristic curve (AUROC), and 0.497 (95% CI 0.32-0.65) area under the precision recall curve (AUPRC) on the holdout set. Compared to the image classifier alone, which had an AUROC of 0.577 (95% CI 0.44-0.73) and an AUPRC of 0.206 (95% CI 0.08-0.38), the fusion model showed significant improvement (P<.001). The most important predictor variables were respiratory rate, C-reactive protein, oxygen saturation, and lactate dehydrogenase. The imaging probability score ranked 15th in overall feature importance. Conclusions: We show that, when linked with EMR data, an automated deep learning image classifier improved performance in identifying hospitalized patients with severe COVID-19 at risk for intubation. 
With additional prospective and external validation, such a model may assist risk assessment and optimize clinical decision-making in choosing the best care plan during the critical stages of COVID-19. %M 37883177 %R 10.2196/46905 %U https://formative.jmir.org/2023/1/e46905 %U https://doi.org/10.2196/46905 %U http://www.ncbi.nlm.nih.gov/pubmed/37883177 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e42202 %T Development, Reliability, and Structural Validity of the Scale for Knowledge, Attitude, and Practice in Ethics Implementation Among AI Researchers: Cross-Sectional Study %A Zhang,Xiaobo %A Gu,Ying %A Yin,Jie %A Zhang,Yuejie %A Jin,Cheng %A Wang,Weibing %A Li,Albert Martin %A Wang,Yingwen %A Su,Ling %A Xu,Hong %A Ge,Xiaoling %A Ye,Chengjie %A Tang,Liangfeng %A Shen,Bing %A Fang,Jinwu %A Wang,Daoyang %A Feng,Rui %+ School of Computer Science Fudan University, Room D5011, Interdisciplinary Academic Building, Jiangwan Campus, 2005 Songhu Road, Shanghai, 200438, China, 86 21 51355534, fengrui@fudan.edu.cn %K medical artificial intelligence %K ethics implementation %K Knowledge-Attitude-Practice model %K reliability %K validity %K measure %K artificial intelligence %K development %K attitude %K ethics %D 2023 %7 26.10.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Medical artificial intelligence (AI) has significantly contributed to decision support for disease screening, diagnosis, and management. With the growing number of medical AI developments and applications, incorporating ethics is considered essential to avoiding harm and ensuring broad benefits in the lifecycle of medical AI. One of the premises for effectively implementing ethics in Medical AI research necessitates researchers' comprehensive knowledge, enthusiastic attitude, and practical experience. However, there is currently a lack of an available instrument to measure these aspects. 
Objective: The aim of this study was to develop a comprehensive scale for measuring the knowledge, attitude, and practice of ethics implementation among medical AI researchers, and to evaluate its measurement properties. Methods: The construct of the Knowledge-Attitude-Practice in Ethics Implementation (KAP-EI) scale was based on the Knowledge-Attitude-Practice (KAP) model, and the evaluation of its measurement properties was in compliance with the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) reporting guidelines for studies on measurement instruments. The study was conducted in 2 phases. The first phase involved scale development through a systematic literature review, qualitative interviews, and item analysis based on a cross-sectional survey. The second phase involved evaluation of structural validity and reliability through another cross-sectional study. Results: The KAP-EI scale had 3 dimensions including knowledge (10 items), attitude (6 items), and practice (7 items). The Cronbach α for the whole scale reached .934. Confirmatory factor analysis showed that the goodness-of-fit indices of the scale were satisfactory (χ2/df ratio=2.338, comparative fit index=0.949, Tucker Lewis index=0.941, root-mean-square error of approximation=0.064, and standardized root-mean-square residual=0.052). Conclusions: The results show that the scale has good reliability and structural validity; hence, it could be considered an effective instrument. This is the first instrument developed for this purpose. 
%M 37883175 %R 10.2196/42202 %U https://formative.jmir.org/2023/1/e42202 %U https://doi.org/10.2196/42202 %U http://www.ncbi.nlm.nih.gov/pubmed/37883175 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e47105 %T Guidelines and Standard Frameworks for AI in Medicine: Protocol for a Systematic Literature Review %A Shiferaw,Kirubel Biruk %A Roloff,Moritz %A Waltemath,Dagmar %A Zeleke,Atinkut Alamirrew %+ Department of Medical Informatics, Institute for Community Medicine, University Medicine Greifswald, Walther-Rathenau-Str. 48, Greifswald, D-17475, Germany, 49 1728989478, s-kishif@uni-greifswald.de %K artificial intelligence %K biomedical %K guidelines %K machine learning %K medicine %D 2023 %7 25.10.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Applications of artificial intelligence (AI) are pervasive in modern biomedical science. In fact, research results suggesting algorithms and AI models for different target diseases and conditions are continuously increasing. While this situation undoubtedly improves the outcome of AI models, health care providers are increasingly unsure which AI model to use due to multiple alternatives for a specific target and the “black box” nature of AI. Moreover, the fact that studies rarely use guidelines in developing and reporting AI models poses additional challenges in trusting and adapting models for practical implementation. Objective: This review protocol describes the planned steps and methods for a review of the synthesized evidence regarding the quality of available guidelines and frameworks to facilitate AI applications in medicine. Methods: We will commence a systematic literature search using medical subject headings terms for medicine, guidelines, and machine learning (ML). All available guidelines, standard frameworks, best practices, checklists, and recommendations will be included, irrespective of the study design. 
The search will be conducted on web-based repositories such as PubMed, Web of Science, and the EQUATOR (Enhancing the Quality and Transparency of Health Research) network. After removing duplicate results, a preliminary scan for titles will be done by 2 reviewers. After the first scan, the reviewers will rescan the selected literature for abstract review, and any incongruities about whether to include the article for full-text review or not will be resolved by the third and fourth reviewer based on the predefined criteria. A Google Scholar (Google LLC) search will also be performed to identify gray literature. The quality of identified guidelines will be evaluated using the Appraisal of Guidelines, Research, and Evaluation (AGREE II) tool. A descriptive summary and narrative synthesis will be carried out, and the details of critical appraisal and subgroup synthesis findings will be presented. Results: The results will be reported using the PRISMA (Preferred Reporting Items for Systematic Review and Meta-Analyses) reporting guidelines. Data analysis is currently underway, and we anticipate finalizing the review by November 2023. Conclusions: Guidelines and recommended frameworks for developing, reporting, and implementing AI studies have been developed by different experts to facilitate the reliable assessment of validity and consistent interpretation of ML models for medical applications. We postulate that a guideline supports the assessment of an ML model only if the quality and reliability of the guideline are high. Assessing the quality and aspects of available guidelines, recommendations, checklists, and frameworks—as will be done in the proposed review—will provide comprehensive insights into current gaps and help to formulate future research directions. 
International Registered Report Identifier (IRRID): DERR1-10.2196/47105 %M 37878365 %R 10.2196/47105 %U https://www.researchprotocols.org/2023/1/e47105 %U https://doi.org/10.2196/47105 %U http://www.ncbi.nlm.nih.gov/pubmed/37878365 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e49842 %T Applying AI and Guidelines to Assist Medical Students in Recognizing Patients With Heart Failure: Protocol for a Randomized Trial %A Joo,Hyeon %A Mathis,Michael R %A Tam,Marty %A James,Cornelius %A Han,Peijin %A Mangrulkar,Rajesh S %A Friedman,Charles P %A Vydiswaran,VG Vinod %+ Department of Learning Health Sciences, University of Michigan, 1111 East Catherine Street, Ann Arbor, MI, 48109, United States, 1 7349361644, thejoo@umich.edu %K medical education %K clinical decision support systems %K artificial intelligence %K machine learning %K heart failure %K evidence-based medicine %K guidelines %K digital health interventions %D 2023 %7 24.10.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: The integration of artificial intelligence (AI) into clinical practice is transforming both clinical practice and medical education. AI-based systems aim to improve the efficacy of clinical tasks, enhancing diagnostic accuracy and tailoring treatment delivery. As it becomes increasingly prevalent in health care for high-quality patient care, it is critical for health care providers to use the systems responsibly to mitigate bias, ensure effective outcomes, and provide safe clinical practices. In this study, the clinical task is the identification of heart failure (HF) prior to surgery with the intention of enhancing clinical decision-making skills. HF is a common and severe disease, but detection remains challenging due to its subtle manifestation, often concurrent with other medical conditions, and the absence of a simple and effective diagnostic test. 
While advanced HF algorithms have been developed, the use of these AI-based systems to enhance clinical decision-making in medical education remains understudied. Objective: This research protocol aims to demonstrate our study design, systematic procedures for selecting surgical cases from electronic health records, and interventions. The primary objective of this study is to measure the effectiveness of interventions aimed at improving HF recognition before surgery; the second objective is to evaluate the impact of inaccurate AI recommendations; and the third objective is to explore the relationship between the inclination to accept AI recommendations and their accuracy. Methods: Our study used a 3 × 2 factorial design (intervention type × order of pre-post sets) for this randomized trial with medical students. The student participants are asked to complete a 30-minute e-learning module that includes key information about the intervention and a 5-question quiz, and a 60-minute review of 20 surgical cases to determine the presence of HF. To mitigate selection bias in the pre- and posttests, we adopted a feature-based systematic sampling procedure. From a pool of 703 expert-reviewed surgical cases, 20 were selected based on features such as case complexity, model performance, and positive and negative labels. This study comprises 3 interventions: (1) a direct AI-based recommendation with a predicted HF score, (2) an indirect AI-based recommendation gauged through the area under the curve metric, and (3) an HF guideline-based intervention. Results: As of July 2023, 62 of the enrolled medical students have completed participation in this study, including the completion of a short quiz and the review of 20 surgical cases. The subject enrollment commenced in August 2022 and will end in December 2023, with the goal of recruiting 75 medical students in years 3 and 4 with clinical experience. 
Conclusions: We demonstrated a study protocol for the randomized trial, measuring the effectiveness of interventions using AI and HF guidelines among medical students to enhance HF recognition in preoperative care with electronic health record data. International Registered Report Identifier (IRRID): DERR1-10.2196/49842 %M 37874618 %R 10.2196/49842 %U https://www.researchprotocols.org/2023/1/e49842 %U https://doi.org/10.2196/49842 %U http://www.ncbi.nlm.nih.gov/pubmed/37874618 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47590 %T A Web-Based Calculator to Predict Early Death Among Patients With Bone Metastasis Using Machine Learning Techniques: Development and Validation Study %A Lei,Mingxing %A Wu,Bing %A Zhang,Zhicheng %A Qin,Yong %A Cao,Xuyong %A Cao,Yuncen %A Liu,Baoge %A Su,Xiuyun %A Liu,Yaosheng %+ Department of Orthopedics, The Fifth Medical Center of PLA General Hospital, 8 Fengtaidongda Rd, Fengtai District, Beijing, China, 86 15810069346, liuyaosheng@301hospital.com.cn %K bone metastasis %K early death %K machine learning %K prediction model %K local interpretable model–agnostic explanation %D 2023 %7 23.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Patients with bone metastasis often experience a significantly limited survival time, and a life expectancy of <3 months is generally regarded as a contraindication for extensive invasive surgeries. In this context, the accurate prediction of survival becomes very important since it serves as a crucial guide in making clinical decisions. Objective: This study aimed to develop a machine learning–based web calculator that can provide an accurate assessment of the likelihood of early death among patients with bone metastasis. Methods: This study analyzed a large cohort of 118,227 patients diagnosed with bone metastasis between 2010 and 2019 using the data obtained from a national cancer database. 
The entire cohort of patients was randomly split 9:1 into a training group (n=106,492) and a validation group (n=11,735). Six approaches—logistic regression, extreme gradient boosting machine, decision tree, random forest, neural network, and gradient boosting machine—were implemented in this study. The performance of these approaches was evaluated using 11 measures, and each approach was ranked based on its performance in each measure. Patients (n=332) from a teaching hospital were used as the external validation group, and external validation was performed using the optimal model. Results: In the entire cohort, a substantial proportion of patients (43,305/118,227, 36.63%) experienced early death. Among the different approaches evaluated, the gradient boosting machine exhibited the highest score of prediction performance (54 points), followed by the neural network (52 points) and extreme gradient boosting machine (50 points). The gradient boosting machine demonstrated a favorable discrimination ability, with an area under the curve of 0.858 (95% CI 0.851-0.865). In addition, the calibration slope was 1.02, and the intercept-in-the-large value was −0.02, indicating good calibration of the model. Patients were divided into 2 risk groups using a threshold of 37% based on the gradient boosting machine. Patients in the high-risk group (3105/4315, 71.96%) were found to be 4.5 times more likely to experience early death compared with those in the low-risk group (1159/7420, 15.62%). External validation of the model demonstrated a high area under the curve of 0.847 (95% CI 0.798-0.895), indicating its robust performance. The model developed by the gradient boosting machine has been deployed on the internet as a calculator. Conclusions: This study develops a machine learning–based calculator to assess the probability of early death among patients with bone metastasis. 
The calculator has the potential to guide clinical decision-making and improve the care of patients with bone metastasis by identifying those at a higher risk of early death. %M 37870889 %R 10.2196/47590 %U https://www.jmir.org/2023/1/e47590 %U https://doi.org/10.2196/47590 %U http://www.ncbi.nlm.nih.gov/pubmed/37870889 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e51912 %T Objectively Quantifying Pediatric Psychiatric Severity Using Artificial Intelligence, Voice Recognition Technology, and Universal Emotions: Pilot Study for Artificial Intelligence-Enabled Innovation to Address Youth Mental Health Crisis %A Caulley,Desmond %A Alemu,Yared %A Burson,Sedara %A Cárdenas Bautista,Elizabeth %A Abebe Tadesse,Girmaw %A Kottmyer,Christopher %A Aeschbach,Laurent %A Cheungvivatpant,Bryan %A Sezgin,Emre %+ TQIntelligence, Inc, 75 Fifth St NW Suite 2407, Atlanta, GA, 30308, United States, 1 6787709343, yalemu@tqintelligence.com %K pediatric %K trauma %K voice AI %K machine learning %K mental health %K predictive modeling %K artificial intelligence %K social determinants of health %K speech-recognition %K adverse childhood experiences %K trauma and emotional distress %K voice marker %K speech biomarker %K pediatrics %K at-risk youth %D 2023 %7 23.10.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Providing psychotherapy, particularly for youth, is a pressing challenge in the health care system. Traditional methods are resource-intensive, and there is a need for objective benchmarks to guide therapeutic interventions. Automated emotion detection from speech, using artificial intelligence, presents an emerging approach to address these challenges. Speech can carry vital information about emotional states, which can be used to improve mental health care services, especially when the person is suffering. 
Objective: This study aims to develop and evaluate automated methods for detecting the intensity of emotions (anger, fear, sadness, and happiness) in audio recordings of patients’ speech. We also demonstrate the viability of deploying the models. Our model was validated in a previous publication by Alemu et al with limited voice samples. This follow-up study used significantly more voice samples to validate the previous model. Methods: We used audio recordings of patients, specifically children with high adverse childhood experience (ACE) scores; the average ACE score was 5 or higher, placing these children at the highest risk for chronic disease and social or emotional problems, whereas only 1 in 6 people in the general population have a score of 4 or above. The patients’ structured voice samples were collected by having them read a fixed script. In total, 4 highly trained therapists classified audio segments, scoring the intensity level of each of the 4 emotions. We experimented with various preprocessing methods, including denoising, voice-activity detection, and diarization. Additionally, we explored various model architectures, including convolutional neural networks (CNNs) and transformers. We trained emotion-specific transformer-based models and a generalized CNN-based model to predict emotion intensities. Results: The emotion-specific transformer-based model achieved a test-set precision and recall of 86% and 79%, respectively, for binary emotional intensity classification (high or low). In contrast, the CNN-based model, generalized to predict the intensity of 4 different emotions, achieved test-set precision and recall of 83% for each. Conclusions: Automated emotion detection from patients’ speech using artificial intelligence models is found to be feasible, leading to a high level of accuracy. The transformer-based model exhibited better performance in emotion-specific detection, while the CNN-based model showed promise in generalized emotion detection. 
These models can serve as valuable decision-support tools for pediatricians and mental health providers to triage youth to appropriate levels of mental health care services. International Registered Report Identifier (IRRID): RR1-10.2196/51912 %M 37870890 %R 10.2196/51912 %U https://www.researchprotocols.org/2023/1/e51912 %U https://doi.org/10.2196/51912 %U http://www.ncbi.nlm.nih.gov/pubmed/37870890 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e51712 %T The Potential of Chatbots for Emotional Support and Promoting Mental Well-Being in Different Cultures: Mixed Methods Study %A Chin,Hyojin %A Song,Hyeonho %A Baek,Gumhee %A Shin,Mingi %A Jung,Chani %A Cha,Meeyoung %A Choi,Junghoi %A Cha,Chiyoung %+ College of Nursing and Ewha Research Institute of Nursing Science, System Health & Engineering Major in Graduate School, Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul, 03760, Republic of Korea, 82 02 3277 2883, chiyoung@ewha.ac.kr %K chatbot %K depressive mood %K sad %K depressive discourse %K sentiment analysis %K conversational agent %K mental health %K health information %K cultural differences %D 2023 %7 20.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence chatbot research has focused on technical advances in natural language processing and validating the effectiveness of human-machine conversations in specific settings. However, real-world chat data remain proprietary and unexplored despite their growing popularity, and new analyses of chatbot uses and their effects on mitigating negative moods are urgently needed. Objective: In this study, we investigated whether and how artificial intelligence chatbots facilitate the expression of user emotions, specifically sadness and depression. We also examined cultural differences in the expression of depressive moods among users in Western and Eastern countries. 
Methods: This study used SimSimi, a global open-domain social chatbot, to analyze 152,783 conversation utterances containing the terms “depress” and “sad” in 3 Western countries (Canada, the United Kingdom, and the United States) and 5 Eastern countries (Indonesia, India, Malaysia, the Philippines, and Thailand). Study 1 reports new findings on the cultural differences in how people talk about depression and sadness to chatbots based on Linguistic Inquiry and Word Count and n-gram analyses. In study 2, we classified chat conversations into predefined topics using semisupervised classification techniques to better understand the types of depressive moods prevalent in chats. We then identified the distinguishing features of chat-based depressive discourse data and the disparity between Eastern and Western users. Results: Our data revealed intriguing cultural differences. Chatbot users in Eastern countries indicated stronger emotions about depression than users in Western countries (positive: P<.001; negative: P=.01); for example, Eastern users used more words associated with sadness (P=.01). However, Western users were more likely to share vulnerable topics such as mental health (P<.001), and this group also had a greater tendency to discuss sensitive topics such as swear words (P<.001) and death (P<.001). In addition, when talking to chatbots, people expressed their depressive moods differently than on other platforms. Users were more open to expressing emotional vulnerability related to depressive or sad moods to chatbots (74,045/148,590, 49.83%) than on social media (149/1978, 7.53%). Chatbot conversations tended not to broach topics that require social support from others, such as seeking advice on daily life difficulties, unlike on social media. However, chatbot users acted in anticipation of conversational agents that exhibit active listening skills and foster a safe space where they can openly share emotional states such as sadness or depression. 
Conclusions: The findings highlight the potential of chatbot-assisted mental health support, emphasizing the importance of continued technical and policy-wise efforts to improve chatbot interactions for those in need of emotional assistance. Our data indicate the possibility of chatbots providing helpful information about depressive moods, especially for users who have difficulty communicating emotions to other humans. %M 37862063 %R 10.2196/51712 %U https://www.jmir.org/2023/1/e51712 %U https://doi.org/10.2196/51712 %U http://www.ncbi.nlm.nih.gov/pubmed/37862063 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48093 %T Digital Marker for Early Screening of Mild Cognitive Impairment Through Hand and Eye Movement Analysis in Virtual Reality Using Machine Learning: First Validation Study %A Kim,Se Young %A Park,Jinseok %A Choi,Hojin %A Loeser,Martin %A Ryu,Hokyoung %A Seo,Kyoungwon %+ Department of Applied Artificial Intelligence, Seoul National University of Science and Technology, Sangsang hall, 4th Fl., Gongneung-ro, Gongneung-dong, Nowon-gu, Seoul, 01811, Republic of Korea, 82 010 5668 8660, kwseo@seoultech.ac.kr %K Alzheimer disease %K biomarkers %K dementia %K digital markers %K eye movement %K hand movement %K machine learning %K mild cognitive impairment %K screening %K virtual reality %D 2023 %7 20.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: With the global rise in Alzheimer disease (AD), early screening for mild cognitive impairment (MCI), which is a preclinical stage of AD, is of paramount importance. Although biomarkers such as cerebrospinal fluid amyloid level and magnetic resonance imaging have been studied, they have limitations, such as high cost and invasiveness. Digital markers to assess cognitive impairment by analyzing behavioral data collected from digital devices in daily life can be a new alternative. 
In this context, we developed a “virtual kiosk test” for early screening of MCI by analyzing behavioral data collected when using a kiosk in a virtual environment. Objective: We aimed to investigate key behavioral features collected from a virtual kiosk test that could distinguish patients with MCI from healthy controls with high statistical significance. Also, we focused on developing a machine learning model capable of early screening of MCI based on these behavioral features. Methods: A total of 51 participants comprising 20 healthy controls and 31 patients with MCI were recruited by 2 neurologists from a university hospital. The participants performed a virtual kiosk test—developed by our group—where we recorded various behavioral data such as hand and eye movements. Based on these time series data, we computed the following 4 behavioral features: hand movement speed, proportion of fixation duration, time to completion, and the number of errors. To compare these behavioral features between healthy controls and patients with MCI, independent-samples 2-tailed t tests were used. Additionally, we used these behavioral features to train and validate a machine learning model for early screening of patients with MCI from healthy controls. Results: In the virtual kiosk test, all 4 behavioral features showed statistically significant differences between patients with MCI and healthy controls. Compared with healthy controls, patients with MCI had slower hand movement speed (t49=3.45; P=.004), lower proportion of fixation duration (t49=2.69; P=.04), longer time to completion (t49=–3.44; P=.004), and a greater number of errors (t49=–3.77; P=.001). All 4 features were then used to train a support vector machine to distinguish between healthy controls and patients with MCI. Our machine learning model achieved 93.3% accuracy, 100% sensitivity, 83.3% specificity, 90% precision, and 94.7% F1-score. 
Conclusions: Our research preliminarily suggests that analyzing hand and eye movements in the virtual kiosk test holds potential as a digital marker for early screening of MCI. In contrast to conventional biomarkers, this digital marker in virtual reality is advantageous as it can collect ecologically valid data at an affordable cost and in a short period (5-15 minutes), making it a suitable means for early screening of MCI. We call for further studies to confirm the reliability and validity of this approach. %M 37862101 %R 10.2196/48093 %U https://www.jmir.org/2023/1/e48093 %U https://doi.org/10.2196/48093 %U http://www.ncbi.nlm.nih.gov/pubmed/37862101 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47346 %T Use of Artificial Intelligence in the Identification and Diagnosis of Frailty Syndrome in Older Adults: Scoping Review %A Velazquez-Diaz,Daniel %A Arco,Juan E %A Ortiz,Andres %A Pérez-Cabezas,Verónica %A Lucena-Anton,David %A Moral-Munoz,Jose A %A Galán-Mercant,Alejandro %+ MOVE-IT Research Group, Department of Nursing and Physiotherapy, Faculty of Health Sciences, University of Cádiz, Ana de Viya, 52, Cádiz, 11003, Spain, 34 676 719 119, veronica.perezcabezas@uca.es %K frail older adult %K identification %K diagnosis %K artificial intelligence %K review %K frailty %K older adults %K aging %K biological variability %K detection %K accuracy %K sensitivity %K screening %K tool %D 2023 %7 20.10.2023 %9 Review %J J Med Internet Res %G English %X Background: Frailty syndrome (FS) is one of the most common noncommunicable diseases, which is associated with lower physical and mental capacities in older adults. FS diagnosis is mostly focused on biological variables; however, it is likely that this diagnosis could fail owing to the high biological variability in this syndrome. Therefore, artificial intelligence (AI) could be a potential strategy to identify and diagnose this complex and multifactorial geriatric syndrome. 
Objective: The objective of this scoping review was to analyze the existing scientific evidence on the use of AI for the identification and diagnosis of FS in older adults, as well as to identify which model provides enhanced accuracy, sensitivity, specificity, and area under the curve (AUC). Methods: A search was conducted using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines on various databases: PubMed, Web of Science, Scopus, and Google Scholar. The search strategy followed Population/Problem, Intervention, Comparison, and Outcome (PICO) criteria with the population being older adults; intervention being AI; comparison being compared or not to other diagnostic methods; and outcome being FS with reported sensitivity, specificity, accuracy, or AUC values. The results were synthesized through information extraction and are presented in tables. Results: We identified 26 studies that met the inclusion criteria, 6 of which had data sets of over 2000 and 3 with data sets below 100. Machine learning was the most widely used type of AI, employed in 18 studies. Moreover, of the 26 included studies, 9 used clinical data, with clinical histories being the most frequently used data type in this category. The remaining 17 studies used nonclinical data, most frequently involving activity monitoring using an inertial sensor in clinical and nonclinical contexts. Regarding the performance of each AI model, 10 studies achieved a value of precision, sensitivity, specificity, or AUC ≥90%. Conclusions: The findings of this scoping review clarify the overall status of recent studies using AI to identify and diagnose FS. Moreover, the findings show that the combined use of AI using clinical data along with nonclinical information such as the kinematics of inertial sensors that monitor activities in a nonclinical context could be an appropriate tool for the identification and diagnosis of FS. 
Nevertheless, some possible limitations of the evidence included in the review could be small sample sizes, heterogeneity of study designs, and lack of standardization in the AI models and diagnostic criteria used across studies. Future research is needed to validate AI systems with diverse data sources for diagnosing FS. AI should be used as a decision support tool for identifying FS, with data quality and privacy addressed, and the tool should be regularly monitored for performance after being integrated in clinical practice. %M 37862082 %R 10.2196/47346 %U https://www.jmir.org/2023/1/e47346 %U https://doi.org/10.2196/47346 %U http://www.ncbi.nlm.nih.gov/pubmed/37862082 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e48785 %T Opportunities, Challenges, and Future Directions of Generative Artificial Intelligence in Medical Education: Scoping Review %A Preiksaitis,Carl %A Rose,Christian %+ Department of Emergency Medicine, Stanford University School of Medicine, 900 Welch Road, Suite 350, Palo Alto, CA, 94304, United States, 1 650 723 6576, cpreiksaitis@stanford.edu %K medical education %K artificial intelligence %K ChatGPT %K Bard %K AI %K educator %K scoping %K review %K learner %K generative %D 2023 %7 20.10.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: Generative artificial intelligence (AI) technologies are increasingly being utilized across various fields, with considerable interest and concern regarding their potential application in medical education. These technologies, such as ChatGPT and Bard, can generate new content and have a wide range of possible applications. Objective: This study aimed to synthesize the potential opportunities and limitations of generative AI in medical education. It sought to identify prevalent themes within recent literature regarding potential applications and challenges of generative AI in medical education and use these to guide future areas for exploration. 
Methods: We conducted a scoping review, following the framework by Arksey and O'Malley, of English language articles published from 2022 onward that discussed generative AI in the context of medical education. A literature search was performed using PubMed, Web of Science, and Google Scholar databases. We screened articles for inclusion, extracted data from relevant studies, and completed a quantitative and qualitative synthesis of the data. Results: Thematic analysis revealed diverse potential applications for generative AI in medical education, including self-directed learning, simulation scenarios, and writing assistance. However, the literature also highlighted significant challenges, such as issues with academic integrity, data accuracy, and potential detriments to learning. Based on these themes and the current state of the literature, we propose the following 3 key areas for investigation: developing learners’ skills to evaluate AI critically, rethinking assessment methodology, and studying human-AI interactions. Conclusions: The integration of generative AI in medical education presents exciting opportunities, alongside considerable challenges. There is a need to develop new skills and competencies related to AI as well as thoughtful, nuanced approaches to examine the growing use of generative AI in medical education. 
%R 10.2196/48785 %U https://mededu.jmir.org/2023/1/e48785/ %U https://doi.org/10.2196/48785 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e42788 %T Predicting the Risk of Total Hip Replacement by Using A Deep Learning Algorithm on Plain Pelvic Radiographs: Diagnostic Study %A Chen,Chih-Chi %A Wu,Cheng-Ta %A Chen,Carl P C %A Chung,Chia-Ying %A Chen,Shann-Ching %A Lee,Mel S %A Cheng,Chi-Tung %A Liao,Chien-Hung %+ Department of Trauma and Emergency Surgery, Chang Gung Memorial Hospital, Trauma Center, 5, Fuxin Street, Kweishiang District, Taoyuan City, Taoyuan, 333, Taiwan, 886 3 3281200 ext 3651, atong89130@gmail.com %K osteoarthritis %K orthopedic procedure %K artificial intelligence %K AI %K deep learning %K machine learning %K orthopedic %K pelvic %K radiograph %K predict %K hip replacement %K surgery %K convolutional neural network %K CNN %K algorithm %K surgical %K medical image %K medical imaging %D 2023 %7 20.10.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Total hip replacement (THR) is considered the gold standard of treatment for refractory degenerative hip disorders. Identifying patients who should receive THR in the short term is important. Some conservative treatments, such as intra-articular injection administered a few months before THR, may result in higher odds of arthroplasty infection. Delayed THR after functional deterioration may result in poorer outcomes and longer waiting times for those who have been flagged as needing THR. Deep learning (DL) in medical imaging applications has recently obtained significant breakthroughs. However, the use of DL in practical wayfinding, such as short-term THR prediction, is still lacking. Objective: In this study, we will propose a DL-based assistant system for patients with pelvic radiographs to identify the need for THR within 3 months. 
Methods: We developed a convolutional neural network–based DL algorithm to analyze pelvic radiographs, predict the hip region of interest (ROI), and determine whether or not THR is required. The data set was collected from August 2008 to December 2017. The images included 3013 surgical hip ROIs that had undergone THR and 1630 nonsurgical hip ROIs. The images were split, using split-sample validation, into training (n=3903, 80%), validation (n=476, 10%), and testing (n=475, 10%) sets to evaluate the algorithm performance. Results: The algorithm, called SurgHipNet, yielded an area under the receiver operating characteristic curve of 0.994 (95% CI 0.990-0.998). The accuracy, sensitivity, specificity, and F1-score of the model were 0.977, 0.920, 0.932, and 0.944, respectively. Conclusions: The proposed approach has demonstrated that SurgHipNet shows the ability and potential to provide efficient support in clinical decision-making; it can assist physicians in promptly determining the optimal timing for THR. 
%M 37862084 %R 10.2196/42788 %U https://formative.jmir.org/2023/1/e42788 %U https://doi.org/10.2196/42788 %U http://www.ncbi.nlm.nih.gov/pubmed/37862084 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48249 %T Radiology Residents’ Perceptions of Artificial Intelligence: Nationwide Cross-Sectional Survey Study %A Chen,Yanhua %A Wu,Ziye %A Wang,Peicheng %A Xie,Linbo %A Yan,Mengsha %A Jiang,Maoqing %A Yang,Zhenghan %A Zheng,Jianjun %A Zhang,Jingfeng %A Zhu,Jiming %+ Vanke School of Public Health, Tsinghua University, Haidian District, Beijing, 100084, China, 86 62782199, jimingzhu@tsinghua.edu.cn %K artificial intelligence %K technology acceptance %K radiology %K residency %K perceptions %K health care services %K resident %K residents %K perception %K adoption %K readiness %K acceptance %K cross sectional %K survey %D 2023 %7 19.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) is transforming various fields, with health care, especially diagnostic specialties such as radiology, being a key but controversial battleground. However, there is limited research systematically examining the response of “human intelligence” to AI. Objective: This study aims to comprehend radiologists’ perceptions regarding AI, including their views on its potential to replace them, its usefulness, and their willingness to accept it. We examine the influence of various factors, encompassing demographic characteristics, working status, psychosocial aspects, personal experience, and contextual factors. Methods: Between December 1, 2020, and April 30, 2021, a cross-sectional survey was completed by 3666 radiology residents in China. We used multivariable logistic regression models to examine factors and associations, reporting odds ratios (ORs) and 95% CIs. 
Results: In summary, radiology residents generally hold a positive attitude toward AI, with 29.90% (1096/3666) agreeing that AI may reduce the demand for radiologists, 72.80% (2669/3666) believing AI improves disease diagnosis, and 78.18% (2866/3666) feeling that radiologists should embrace AI. Several associated factors, including age, gender, education, region, eye strain, working hours, time spent on medical images, resilience, burnout, AI experience, and perceptions of residency support and stress, significantly influence AI attitudes. For instance, burnout symptoms were associated with greater concerns about AI replacement (OR 1.89; P<.001), less favorable views on AI usefulness (OR 0.77; P=.005), and reduced willingness to use AI (OR 0.71; P<.001). Moreover, after adjusting for all other factors, perceived AI replacement (OR 0.81; P<.001) and AI usefulness (OR 5.97; P<.001) were shown to significantly impact the intention to use AI. Conclusions: This study profiles radiology residents who are accepting of AI. Our comprehensive findings provide insights for a multidimensional approach to help physicians adapt to AI. Targeted policies, such as digital health care initiatives and medical education, can be developed accordingly. 
%M 37856181 %R 10.2196/48249 %U https://www.jmir.org/2023/1/e48249 %U https://doi.org/10.2196/48249 %U http://www.ncbi.nlm.nih.gov/pubmed/37856181 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45085 %T Influenza Epidemic Trend Surveillance and Prediction Based on Search Engine Data: Deep Learning Model Study %A Yang,Liuyang %A Zhang,Ting %A Han,Xuan %A Yang,Jiao %A Sun,Yanxia %A Ma,Libing %A Chen,Jialong %A Li,Yanming %A Lai,Shengjie %A Li,Wei %A Feng,Luzhao %A Yang,Weizhong %+ School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, 9 Dong Dan San Tiao, Dongcheng District, Beijing, 100730, China, 86 010 65120552, yangweizhong@cams.cn %K early warning %K epidemic intelligence %K infectious disease %K influenza-like illness %K surveillance %D 2023 %7 17.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Influenza outbreaks pose a significant threat to global public health. Traditional surveillance systems and simple algorithms often struggle to predict influenza outbreaks in an accurate and timely manner. Big data and modern technology have offered new modalities for disease surveillance and prediction. Influenza-like illness can serve as a valuable surveillance tool for emerging respiratory infectious diseases like influenza and COVID-19, especially when reported case data may not fully reflect the actual epidemic curve. Objective: This study aimed to develop a predictive model for influenza outbreaks by combining Baidu search query data with traditional virological surveillance data. The goal was to improve early detection and preparedness for influenza outbreaks in both northern and southern China, providing evidence for supplementing modern intelligence epidemic surveillance methods. 
Methods: We collected virological data from the National Influenza Surveillance Network and Baidu search query data from January 2011 to July 2018, totaling 3,691,865 and 1,563,361 samples, respectively. Relevant search terms related to influenza were identified and analyzed for their correlation with influenza-positive rates using Pearson correlation analysis. A distributed lag nonlinear model was used to assess the lag correlation of the search terms with influenza activity. Subsequently, a predictive model based on the gated recurrent unit and multiple attention mechanisms was developed to forecast the influenza-positive trend. Results: This study revealed a high correlation between specific Baidu search terms and influenza-positive rates in both northern and southern China, except for 1 term. The search terms were categorized into 4 groups: essential facts on influenza, influenza symptoms, influenza treatment and medicine, and influenza prevention, all of which showed correlation with the influenza-positive rate. The influenza prevention and influenza symptom groups had a lag correlation of 1.4-3.2 and 5.0-8.0 days, respectively. The Baidu search terms could help predict the influenza-positive rate 14-22 days in advance in southern China but interfered with influenza surveillance in northern China. Conclusions: Complementing traditional disease surveillance systems with information from web-based data sources can aid in detecting warning signs of influenza outbreaks earlier. However, supplementation of modern surveillance with search engine information should be approached cautiously. This approach provides valuable insights for digital epidemiology and has the potential for broader application in respiratory infectious disease surveillance. Further research should explore the optimization and customization of search terms for different regions and languages to improve the accuracy of influenza prediction models. 
%M 37847532 %R 10.2196/45085 %U https://www.jmir.org/2023/1/e45085 %U https://doi.org/10.2196/45085 %U http://www.ncbi.nlm.nih.gov/pubmed/37847532 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 10 %N %P e49132 %T A Motivational Interviewing Chatbot With Generative Reflections for Increasing Readiness to Quit Smoking: Iterative Development Study %A Brown,Andrew %A Kumar,Ash Tanuj %A Melamed,Osnat %A Ahmed,Imtihan %A Wang,Yu Hao %A Deza,Arnaud %A Morcos,Marc %A Zhu,Leon %A Maslej,Marta %A Minian,Nadia %A Sujaya,Vidya %A Wolff,Jodi %A Doggett,Olivia %A Iantorno,Mathew %A Ratto,Matt %A Selby,Peter %A Rose,Jonathan %+ The Edward S Rogers Sr Department of Electrical & Computer Engineering, University of Toronto, 10 King's College Rd, Toronto, ON, M5S 3G4, Canada, 1 416 978 6992, jonathan.rose@ece.utoronto.ca %K conversational agents %K chatbots %K behavior change %K smoking cessation %K motivational interviewing %K deep learning %K natural language processing %K transformers %K generative artificial intelligence %K artificial intelligence %K AI %D 2023 %7 17.10.2023 %9 Original Paper %J JMIR Ment Health %G English %X Background: The motivational interviewing (MI) approach has been shown to help move ambivalent smokers toward the decision to quit smoking. There have been several attempts to broaden access to MI through text-based chatbots. These typically use scripted responses to client statements, but such nonspecific responses have been shown to reduce effectiveness. Recent advances in natural language processing provide a new way to create responses that are specific to a client’s statements, using a generative language model. Objective: This study aimed to design, evolve, and measure the effectiveness of a chatbot system that can guide ambivalent people who smoke toward the decision to quit smoking with MI-style generative reflections. 
Methods: Over time, 4 different MI chatbot versions were evolved, and each version was tested with a separate group of ambivalent smokers. A total of 349 smokers were recruited through a web-based recruitment platform. The first chatbot version only asked questions without reflections on the answers. The second version asked the questions and provided reflections with an initial version of the reflection generator. The third version used an improved reflection generator, and the fourth version added extended interaction on some of the questions. Participants’ readiness to quit was measured before the conversation and 1 week later using an 11-point scale that measured 3 attributes related to smoking cessation: readiness, confidence, and importance. The number of quit attempts made in the week before the conversation and the week after was surveyed; in addition, participants rated the perceived empathy of the chatbot. The main body of the conversation consists of 5 scripted questions, responses from participants, and (for 3 of the 4 versions) generated reflections. A pretrained transformer-based neural network was fine-tuned on examples of high-quality reflections to generate MI reflections. Results: The increase in average confidence using the nongenerative version was 1.0 (SD 2.0; P=.001), whereas for the 3 generative versions, the increases ranged from 1.2 to 1.3 (SD 2.0-2.3; P<.001). The extended conversation with improved generative reflections was the only version associated with a significant increase in average importance (0.7, SD 2.0; P<.001) and readiness (0.4, SD 1.7; P=.01). The enhanced reflection and extended conversations exhibited significantly better perceived empathy than the nongenerative conversation (P=.02 and P=.004, respectively). The number of quit attempts did not significantly change between the week before the conversation and the week after across all 4 conversations. 
Conclusions: The results suggest that generative reflections increase the impact of a conversation on readiness to quit smoking 1 week later, although a significant portion of the impact seen so far can be achieved by only asking questions without the reflections. These results support further evolution of the chatbot conversation and can serve as a basis for comparison against more advanced versions. %M 37847539 %R 10.2196/49132 %U https://mental.jmir.org/2023/1/e49132 %U https://doi.org/10.2196/49132 %U http://www.ncbi.nlm.nih.gov/pubmed/37847539 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47014 %T Identifying Potential Lyme Disease Cases Using Self-Reported Worldwide Tweets: Deep Learning Modeling Approach Enhanced With Sentimental Words Through Emojis %A Laison,Elda Kokoe Elolo %A Hamza Ibrahim,Mohamed %A Boligarla,Srikanth %A Li,Jiaxin %A Mahadevan,Raja %A Ng,Austen %A Muthuramalingam,Venkataraman %A Lee,Wee Yi %A Yin,Yijun %A Nasri,Bouchra R %+ Département de médecine sociale et préventive, École de Santé Publique de l’Université de Montréal, Université de Montréal, 7101 Park Ave, Montréal, QC, H3N 1X9, Canada, 1 514 343 7973, bouchra.nasri@umontreal.ca %K Lyme disease %K Twitter %K BERT %K Bidirectional Encoder Representations from Transformers %K emojis %K machine learning %K natural language processing %D 2023 %7 16.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Lyme disease is among the most reported tick-borne diseases worldwide, making it a major ongoing public health concern. An effective Lyme disease case reporting system depends on timely diagnosis and reporting by health care professionals, and accurate laboratory testing and interpretation for clinical diagnosis validation. A lack of these can lead to delayed diagnosis and treatment, which can exacerbate the severity of Lyme disease symptoms. 
Therefore, there is a need to improve the monitoring of Lyme disease by using other data sources, such as web-based data. Objective: We analyzed global Twitter data to understand its potential and limitations as a tool for Lyme disease surveillance. We propose a transformer-based classification system to identify potential Lyme disease cases using self-reported tweets. Methods: Our initial sample included 20,000 tweets collected worldwide from a database of over 1.3 million Lyme disease tweets. After preprocessing and geolocating tweets, tweets in a subset of the initial sample were manually labeled as potential Lyme disease cases or non-Lyme disease cases using carefully selected keywords. Emojis were converted to sentiment words, which were then replaced in the tweets. This labeled tweet set was used for the training, validation, and performance testing of DistilBERT (distilled version of BERT [Bidirectional Encoder Representations from Transformers]), ALBERT (A Lite BERT), and BERTweet (BERT for English Tweets) classifiers. Results: The empirical results showed that BERTweet was the best classifier among all evaluated models (average F1-score of 89.3%, classification accuracy of 90.0%, and precision of 97.1%). However, for recall, term frequency-inverse document frequency and k-nearest neighbors performed better (93.2% and 82.6%, respectively). On using emojis to enrich the tweet embeddings, BERTweet had an increased recall (8% increase), DistilBERT had an increased F1-score of 93.8% (4% increase) and classification accuracy of 94.1% (4% increase), and ALBERT had an increased F1-score of 93.1% (5% increase) and classification accuracy of 93.9% (5% increase). 
The general awareness of Lyme disease was high in the United States, the United Kingdom, Australia, and Canada, with self-reported potential cases of Lyme disease from these countries accounting for around 50% (9939/20,000) of the collected English-language tweets, whereas Lyme disease–related tweets were rare in countries from Africa and Asia. The most reported Lyme disease–related symptoms in the data were rash, fatigue, fever, and arthritis, while symptoms, such as lymphadenopathy, palpitations, swollen lymph nodes, neck stiffness, and arrhythmia, were uncommon, in accordance with Lyme disease symptom frequency. Conclusions: The study highlights the robustness of BERTweet and DistilBERT as classifiers for potential cases of Lyme disease from self-reported data. The results demonstrated that emojis are effective for enrichment, thereby improving the accuracy of tweet embeddings and the performance of classifiers. Specifically, emojis reflecting sadness, empathy, and encouragement can reduce false negatives. 
%M 37843893 %R 10.2196/47014 %U https://www.jmir.org/2023/1/e47014 %U https://doi.org/10.2196/47014 %U http://www.ncbi.nlm.nih.gov/pubmed/37843893 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e46652 %T Identification of Psycho-Socio-Judicial Trajectories and Factors Associated With Posttraumatic Stress Disorder in People Over 15 Years of Age Who Recently Reported Sexual Assault to a Forensic Medical Center: Protocol for a Multicentric Prospective Study Using Mixed Methods and Artificial Intelligence %A Fedele,Emma %A Trousset,Victor %A Schalk,Thibault %A Oliero,Juliette %A Fovet,Thomas %A Lefevre,Thomas %+ Institute for Interdisciplinary Research on Social Issues (UMR 8156), Campus Condorcet, 5 cours des Humanités, Batiment Recherche Sud, Aubervilliers Cedex, Aubervilliers, 93322, France, 33 1 88 12 11 75, emma.fedele@univ-paris13.fr %K sexual violence %K posttraumatic stress disorder %K functional outcomes %K risk factors %K artificial intelligence %K trajectory %K longitudinal %K mixed methods %K sexual assault %K mental health %K cohort study %K PTSD %K innovative %D 2023 %7 16.10.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Sexual assault (SA) can lead to a range of adverse effects on physical, sexual, and mental health, as well as on one’s social life, financial stability, and overall quality of life. However, not all people who experience SA will develop negative functional outcomes. Various risk and protective factors can influence psycho-socio-judicial trajectories. However, how these factors influence trauma adaptation and the onset of early posttraumatic stress disorder (PTSD) is not always clear. 
Objective: Guided by an ecological framework, this project has 3 primary objectives: (1) to describe the 1-year psycho-socio-judicial trajectories of individuals recently exposed to SA who sought consultation with a forensic practitioner; (2) to identify predictive factors for the development of PTSD during the initial forensic examination using artificial intelligence; and (3) to explore the perceptions, needs, and experiences of individuals who have been sexually assaulted. Methods: This longitudinal multicentric cohort study uses a mixed methods approach. Quantitative cohort data are collected through an initial questionnaire completed by the physician during the first forensic examination and through follow-up telephone questionnaires at 6 weeks, 3 months, 6 months, and 1 year after the SA. The questionnaires measure factors associated with PTSD, mental, physical, social, and overall functional outcomes, as well as psycho-socio-judicial trajectories. Cohort participants are recruited through their forensic examination at 1 of the 5 participating centers based in France. Eligible participants are aged 15 or older, have experienced SA in the last 30 days, are fluent in French, and can be reached by phone. Qualitative data are gathered through semistructured interviews with cohort participants, individuals who have experienced SA but are not part of the cohort, and professionals involved in their psycho-socio-judicial care. Results: Bivariate and multivariate analyses will be conducted to examine the associations between each variable and mental, physical, social, and judicial outcomes. Predictive analyses will be performed using multiple prediction algorithms to forecast PTSD. Qualitative data will be integrated with quantitative data to identify psycho-socio-judicial trajectories and enhance the prediction of PTSD. 
Additionally, data on the perceptions and needs of individuals who have experienced SA will be analyzed independently to gain a deeper understanding of their experiences and requirements. Conclusions: This project will collect extensive qualitative and quantitative data that have never been gathered over such an extended period, leading to unprecedented insights into the psycho-socio-judicial trajectories of individuals who have recently experienced SA. It represents the initial phase of developing a functional artificial intelligence tool that forensic practitioners can use to better guide individuals who have recently experienced SA, with the aim of preventing the onset of PTSD. Furthermore, it will contribute to addressing the existing gap in the literature regarding the accessibility and effectiveness of support services for individuals who have experienced SA in Europe. This comprehensive approach, encompassing the entire psycho-socio-judicial continuum and taking into account the viewpoints of SA survivors, will enable the generation of innovative recommendations for enhancing their care across all stages, starting from the initial forensic examination. 
International Registered Report Identifier (IRRID): DERR1-10.2196/46652 %M 37843900 %R 10.2196/46652 %U https://www.researchprotocols.org/2023/1/e46652 %U https://doi.org/10.2196/46652 %U http://www.ncbi.nlm.nih.gov/pubmed/37843900 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e50728 %T Knowledge and Perception of the Use of AI and its Implementation in the Field of Radiology: Cross-Sectional Study %A Miró Catalina,Queralt %A Femenia,Joaquim %A Fuster-Casanovas,Aïna %A Marin-Gomez,Francesc X %A Escalé-Besa,Anna %A Solé-Casals,Jordi %A Vidal-Alaball,Josep %+ Data and Signal Processing group, Faculty of Science, Technology and Engineering, University of Vic-Central University of Catalonia, Carrer de la Laura, 13, Vic, 08500, Spain, 34 938 86 12 22, jordi.sole@uvic.cat %K artificial intelligence %K perception %K knowledge %K survey %K digital health %K radiology %K public health %D 2023 %7 13.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial Intelligence (AI) has been developing for decades, but in recent years its use in the field of health care has experienced an exponential increase. Currently, there is little doubt that these tools have transformed clinical practice. Therefore, it is important to know how the population perceives its implementation to be able to propose strategies for acceptance and implementation and to improve or prevent problems arising from future applications. Objective: This study aims to describe the population’s perception and knowledge of the use of AI as a health support tool and its application to radiology through a validated questionnaire, in order to develop strategies aimed at increasing acceptance of AI use, reducing possible resistance to change and identifying possible sociodemographic factors related to perception and knowledge. 
Methods: A cross-sectional observational study was conducted using an anonymous, voluntary, and validated questionnaire aimed at the entire population of Catalonia aged 18 years or older. The survey addresses 4 dimensions defined to describe users’ perception of the use of AI in radiology: (1) “distrust and accountability,” (2) “personal interaction,” (3) “efficiency,” and (4) “being informed,” all with questions in a Likert scale format. Results closer to 5 refer to a negative perception of the use of AI, while results closer to 1 express a positive perception. Univariate and bivariate analyses were performed to assess possible associations between the 4 dimensions and sociodemographic characteristics. Results: A total of 379 users responded to the survey, with an average age of 43.9 (SD 17.52) years and 59.8% (n=226) of them identified as female. In addition, 89.8% (n=335) of respondents indicated that they understood the concept of AI. Of the 4 dimensions analyzed, “distrust and accountability” obtained a mean score of 3.37 (SD 0.53), “personal interaction” obtained a mean score of 4.37 (SD 0.60), “efficiency” obtained a mean score of 3.06 (SD 0.73), and “being informed” obtained a mean score of 3.67 (SD 0.57). In relation to the “distrust and accountability” dimension, women, people older than 65 years, the group with university studies, and the population that indicated not understanding the AI concept had significantly more distrust in the use of AI. On the dimension of “being informed,” it was observed that the group with university studies rated access to information more positively and those who indicated not understanding the concept of AI rated it more negatively. Conclusions: The majority of the sample investigated reported being familiar with the concept of AI, with varying degrees of acceptance of its implementation in radiology. 
The most contentious dimension is “personal interaction,” whereas “efficiency” shows the greatest acceptance and carries the best expectations for the implementation of AI in radiology. %M 37831495 %R 10.2196/50728 %U https://www.jmir.org/2023/1/e50728 %U https://doi.org/10.2196/50728 %U http://www.ncbi.nlm.nih.gov/pubmed/37831495 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e48023 %T Accuracy of ChatGPT on Medical Questions in the National Medical Licensing Examination in Japan: Evaluation Study %A Yanagita,Yasutaka %A Yokokawa,Daiki %A Uchida,Shun %A Tawara,Junsuke %A Ikusaka,Masatomi %+ Department of General Medicine, Chiba University Hospital, 1-8-1 Inohana, Chuo-ku, Chiba, 260-8677, Japan, 81 43 222 7171 ext 6438, y.yanagita@gmail.com %K artificial intelligence %K ChatGPT %K GPT-4 %K AI %K National Medical Licensing Examination %K Japanese %K NMLE %D 2023 %7 13.10.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: ChatGPT (OpenAI) has gained considerable attention because of its natural and intuitive responses. ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers, as stated by OpenAI as a limitation. However, considering that ChatGPT is an interactive AI that has been trained to reduce the output of unethical sentences, the reliability of the training data is high and the usefulness of the output content is promising. Fortunately, in March 2023, a new version of ChatGPT, GPT-4, was released, which, according to internal evaluations, was expected to increase the likelihood of producing factual responses by 40% compared with its predecessor, GPT-3.5. The usefulness of this version of ChatGPT in English is widely appreciated. It is also increasingly being evaluated as a system for obtaining medical information in languages other than English. 
Although it does not reach a passing score on the national medical examination in Chinese, its accuracy is expected to gradually improve. Evaluation of ChatGPT with Japanese input is limited, although there have been reports on the accuracy of ChatGPT’s answers to clinical questions regarding the Japanese Society of Hypertension guidelines and on the performance of the National Nursing Examination. Objective: The objective of this study is to evaluate whether ChatGPT can provide accurate diagnoses and medical knowledge for Japanese input. Methods: Questions from the National Medical Licensing Examination (NMLE) in Japan, administered by the Japanese Ministry of Health, Labour and Welfare in 2022, were used. All 400 questions were included. Exclusion criteria were figures and tables that ChatGPT could not recognize; only text questions were extracted. We entered the Japanese questions into GPT-3.5 and GPT-4 as they were and instructed the models to output the correct answer for each question. The output of ChatGPT was verified by 2 general practice physicians. In case of discrepancies, they were checked by another physician to make a final decision. The overall performance was evaluated by calculating the percentage of correct answers output by GPT-3.5 and GPT-4. Results: Of the 400 questions, 292 were analyzed. Questions containing charts, which are not supported by ChatGPT, were excluded. The correct response rate for GPT-4 was 81.5% (237/292), which was significantly higher than the rate for GPT-3.5, 42.8% (125/292). Moreover, GPT-4 surpassed the passing standard (>72%) for the NMLE, indicating its potential as a diagnostic and therapeutic decision aid for physicians. Conclusions: GPT-4 reached the passing standard for the NMLE in Japan, entered in Japanese, although it is limited to written questions. 
As the accelerated progress in the past few months has shown, the performance of the AI will improve as the large language model continues to learn more, and it may well become a decision support system for medical professionals by providing more accurate information. %M 37831496 %R 10.2196/48023 %U https://formative.jmir.org/2023/1/e48023 %U https://doi.org/10.2196/48023 %U http://www.ncbi.nlm.nih.gov/pubmed/37831496 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e49949 %T Democratizing Artificial Intelligence Imaging Analysis With Automated Machine Learning: Tutorial %A Thirunavukarasu,Arun James %A Elangovan,Kabilan %A Gutierrez,Laura %A Li,Yong %A Tan,Iris %A Keane,Pearse A %A Korot,Edward %A Ting,Daniel Shu Wei %+ University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Hills Rd, Cambridge, CB2 0SP, United Kingdom, 44 01223 336700, ajt205@cantab.ac.uk %K machine learning %K automated machine learning %K autoML %K artificial intelligence %K democratization %K autonomous AI %K imaging %K image analysis %K automation %K AI engineering %D 2023 %7 12.10.2023 %9 Tutorial %J J Med Internet Res %G English %X Deep learning–based clinical imaging analysis underlies diagnostic artificial intelligence (AI) models, which can match or even exceed the performance of clinical experts, having the potential to revolutionize clinical practice. A wide variety of automated machine learning (autoML) platforms lower the technical barrier to entry to deep learning, extending AI capabilities to clinicians with limited technical expertise, and even autonomous foundation models such as multimodal large language models. Here, we provide a technical overview of autoML with descriptions of how autoML may be applied in education, research, and clinical practice. Each stage of the process of conducting an autoML project is outlined, with an emphasis on ethical and technical best practices. 
Specifically, data acquisition, data partitioning, model training, model validation, analysis, and model deployment are considered. The strengths and limitations of available code-free, code-minimal, and code-intensive autoML platforms are considered. AutoML has great potential to democratize AI in medicine, improving AI literacy by enabling “hands-on” education. AutoML may serve as a useful adjunct in research by facilitating rapid testing and benchmarking before significant computational resources are committed. AutoML may also be applied in clinical contexts, provided regulatory requirements are met. The abstraction by autoML of arduous aspects of AI engineering promotes prioritization of data set curation, supporting the transition from conventional model-driven approaches to data-centric development. To fulfill its potential, clinicians must be educated on how to apply these technologies ethically, rigorously, and effectively; this tutorial represents a comprehensive summary of relevant considerations. %M 37824185 %R 10.2196/49949 %U https://www.jmir.org/2023/1/e49949 %U https://doi.org/10.2196/49949 %U http://www.ncbi.nlm.nih.gov/pubmed/37824185 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44895 %T The Value of Applying Machine Learning in Predicting the Time of Symptom Onset in Stroke Patients: Systematic Review and Meta-Analysis %A Feng,Jing %A Zhang,Qizhi %A Wu,Feng %A Peng,Jinxiang %A Li,Ziwei %A Chen,Zhuang %+ Department of Cardiovascular Medicine, Fifth People’s Hospital of Jinan, 24297 Jingshi Road, Huaiyin District, Jinan, 250000, China, 86 18764026019, humourzhuang@163.com %K machine learning %K ischemic stroke %K onset time %K stroke %D 2023 %7 12.10.2023 %9 Review %J J Med Internet Res %G English %X Background: Machine learning is a potentially effective method for identifying and predicting the time of the onset of stroke. However, the value of applying machine learning in this field remains controversial and debatable. 
Objective: We aimed to assess the value of applying machine learning in predicting the time of stroke onset. Methods: PubMed, Web of Science, Embase, and Cochrane were comprehensively searched. The C index and sensitivity with 95% CI were used as effect sizes. The risk of bias was evaluated using PROBAST (Prediction Model Risk of Bias Assessment Tool), and meta-analysis was conducted using R (version 4.2.0; R Core Team). Results: Thirteen eligible studies were included in the meta-analysis, involving 55 machine learning models: 41 in the training set and 14 in the validation set. The overall C index was 0.800 (95% CI 0.773-0.826) in the training set and 0.781 (95% CI 0.709-0.852) in the validation set. The sensitivity and specificity were 0.76 (95% CI 0.73-0.80) and 0.79 (95% CI 0.74-0.82) in the training set and 0.81 (95% CI 0.68-0.90) and 0.83 (95% CI 0.73-0.89) in the validation set, respectively. Subgroup analysis revealed that the accuracy of machine learning in predicting the time of stroke onset within 4.5 hours was optimal (training: 0.80, 95% CI 0.77-0.83; validation: 0.79, 95% CI 0.71-0.86). Conclusions: Machine learning shows strong performance in identifying the time of stroke onset. More reasonable image segmentation and texture extraction methods in radiomics should be used to promote the value of applying machine learning in diverse ethnic backgrounds. 
Trial Registration: PROSPERO CRD42022358898; https://www.crd.york.ac.uk/Prospero/display_record.php?RecordID=358898 %M 37824198 %R 10.2196/44895 %U https://www.jmir.org/2023/1/e44895 %U https://doi.org/10.2196/44895 %U http://www.ncbi.nlm.nih.gov/pubmed/37824198 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 6 %N %P e48413 %T Congenital Telangiectatic Erythema: Scoping Review %A Wojtara,Magda Sara %A Kang,Jayne %A Zaman,Mohammed %+ Department of Human Genetics, University of Michigan Medical School, 1301 Catherine St, Ann Arbor, MI, 48109, United States, 1 248 962 3672, wojtaram@umich.edu %K rare diseases %K rare disease %K artificial intelligence %K AI %K dermatology %K dermatologist %K DNA repair %K teledermatology %K systematic review %K erythema %K deoxyribonucleic acid %K bloom syndrome %K postnatal growth deficiency %K immune abnormality %K cancer %K oncology %K DNA mutation %K heredity %D 2023 %7 5.10.2023 %9 Review %J JMIR Dermatol %G English %X Background: Congenital telangiectatic erythema (CTE), also known as Bloom syndrome, is a rare autosomal recessive disorder characterized by below-average height, a narrow face, a red skin rash occurring on sun-exposed areas of the body, and an increased risk of cancer. CTE is one of many genodermatoses and photodermatoses associated with defects in DNA repair. CTE is caused by a mutation occurring in the BLM gene, which causes abnormal breaks in chromosomes. Objective: We aimed to analyze the existing literature on CTE to provide additional insight into its heredity, the spectrum of clinical presentations, and the management of this disorder. In addition, the gaps in current research and the use of artificial intelligence to streamline clinical diagnosis and the management of CTE are outlined. 
Methods: A literature search was conducted on PubMed, DOAJ, and Scopus using search terms such as “congenital telangiectatic erythema,” “bloom syndrome,” and “bloom-torre-machacek.” Due to limited current literature, studies published from January 2000 to January 2023 were considered for this review. A total of 49 sources from the literature were analyzed. Results: Through this scoping review, the researchers were able to identify several publications focusing on Bloom syndrome. Some common subject areas included the heredity of CTE, clinical presentations of CTE, and management of CTE. In addition, the literature on rare diseases shows the potential advancements in understanding and treatment with artificial intelligence. Future studies should address the causes of heterogeneity in presentation and examine potential therapeutic candidates for CTE and similarly presenting syndromes. Conclusions: This review illuminated current advances in potential molecular targets or causative pathways in the development of CTE as well as clinical features including erythema, increased cancer risk, and growth abnormalities. Future studies should continue to explore innovations in this space, especially in regard to the use of artificial intelligence, including machine learning and deep learning, for the diagnosis and clinical management of rare diseases such as CTE. 
%M 37796556 %R 10.2196/48413 %U https://derma.jmir.org/2023/1/e48413 %U https://doi.org/10.2196/48413 %U http://www.ncbi.nlm.nih.gov/pubmed/37796556 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 12 %N %P e48381 %T Three-Dimensional Virtual Reconstructions of Shoulder Movements Using Computed Tomography Images: Model Development %A Kim,Yu-Hee %A Park,In %A Cho,Soo Buem %A Yang,Seoyon %A Kim,Il %A Lee,Kyong-Ha %A Choi,Kwangnam %A Han,Seung-Ho %+ Department of Anatomy, Ewha Womans University College of Medicine, 25 Magokdong-ro 2-gil, Gangseo-gu, Seoul, 07804, Republic of Korea, 82 2 6986 2601, sanford@ewha.ac.kr %K human digital twin %K musculoskeletal twin %K shoulder movement %K visualization application %K digital twin %K musculoskeletal %K visualization %K movement %K joint %K shoulder %K tomography %K development %K animation %K animated %K anatomy %K anatomical %K digital health %K representation %K simulation %K virtual %D 2023 %7 5.10.2023 %9 Research Letter %J Interact J Med Res %G English %X %M 37796554 %R 10.2196/48381 %U https://www.i-jmr.org/2023/1/e48381 %U https://doi.org/10.2196/48381 %U http://www.ncbi.nlm.nih.gov/pubmed/37796554 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e50638 %T Prompt Engineering as an Important Emerging Skill for Medical Professionals: Tutorial %A Meskó,Bertalan %+ The Medical Futurist Institute, Povl Bang-Jensen u. 2/B1. 
4/1., Budapest, 1118, Hungary, 36 703807260, berci@medicalfuturist.com %K artificial intelligence %K AI %K digital health %K future %K technology %K ChatGPT %K GPT-4 %K large language models %K language model %K LLM %K prompt %K prompts %K prompt engineering %K AI tool %K engineering %K healthcare professional %K decision-making %K LLMs %K chatbot %K chatbots %K conversational agent %K conversational agents %K NLP %K natural language processing %D 2023 %7 4.10.2023 %9 Tutorial %J J Med Internet Res %G English %X Prompt engineering is a relatively new field of research that refers to the practice of designing, refining, and implementing prompts or instructions that guide the output of large language models (LLMs) to help in various tasks. With the emergence of LLMs, the most popular one being ChatGPT, which has attracted the attention of over 100 million users in only 2 months, artificial intelligence (AI), especially generative AI, has become accessible for the masses. This is an unprecedented paradigm shift not only because of the use of AI becoming more widespread but also due to the possible implications of LLMs in health care. As more patients and medical professionals use AI-based tools, LLMs being the most popular representatives of that group, it seems inevitable to address the challenge of improving this skill. This paper summarizes the current state of research about prompt engineering and, at the same time, aims at providing practical recommendations for the wide range of health care professionals to improve their interactions with LLMs. 
%M 37792434 %R 10.2196/50638 %U https://www.jmir.org/2023/1/e50638 %U https://doi.org/10.2196/50638 %U http://www.ncbi.nlm.nih.gov/pubmed/37792434 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e49944 %T A Natural Language Processing Model for COVID-19 Detection Based on Dutch General Practice Electronic Health Records by Using Bidirectional Encoder Representations From Transformers: Development and Validation Study %A Homburg,Maarten %A Meijer,Eline %A Berends,Matthijs %A Kupers,Thijmen %A Olde Hartman,Tim %A Muris,Jean %A de Schepper,Evelien %A Velek,Premysl %A Kuiper,Jeroen %A Berger,Marjolein %A Peters,Lilian %+ Department of Primary- and Long-Term Care, University Medical Center Groningen, Home Post Code FA21, PO Box 196, Groningen, 9700 RB, Netherlands, 31 050 3616161, t.m.homburg@umcg.nl %K natural language processing %K primary care %K COVID-19 %K EHR %K electronic health records %K public health %K multidisciplinary %K NLP %K disease identification %K BERT model %K model development %K prediction %D 2023 %7 4.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Natural language processing (NLP) models such as bidirectional encoder representations from transformers (BERT) hold promise in revolutionizing disease identification from electronic health records (EHRs) by potentially enhancing efficiency and accuracy. However, their practical application in practice settings demands a comprehensive and multidisciplinary approach to development and validation. The COVID-19 pandemic highlighted challenges in disease identification due to limited testing availability and challenges in handling unstructured data. In the Netherlands, where general practitioners (GPs) serve as the first point of contact for health care, EHRs generated by these primary care providers contain a wealth of potentially valuable information. 
Nonetheless, the unstructured nature of free-text entries in EHRs poses challenges in identifying trends, detecting disease outbreaks, or accurately pinpointing COVID-19 cases. Objective: This study aims to develop and validate a BERT model for detecting COVID-19 consultations in general practice EHRs in the Netherlands. Methods: The BERT model was initially pretrained on Dutch language data and fine-tuned using a comprehensive EHR data set comprising confirmed COVID-19 GP consultations and non–COVID-19–related consultations. The data set was partitioned into a training and development set, and the model’s performance was evaluated on an independent test set that served as the primary measure of its effectiveness in COVID-19 detection. To validate the final model, its performance was assessed through 3 approaches. First, external validation was applied on an EHR data set from a different geographic region in the Netherlands. Second, validation was conducted using results of polymerase chain reaction (PCR) test data obtained from municipal health services. Lastly, correlation between predicted outcomes and COVID-19–related hospitalizations in the Netherlands was assessed, encompassing the period around the outbreak of the pandemic in the Netherlands, that is, the period before widespread testing. Results: The model development used 300,359 GP consultations. We developed a highly accurate model for COVID-19 consultations (accuracy 0.97, F1-score 0.90, precision 0.85, recall 0.85, specificity 0.99). External validations showed comparable high performance. Validation on PCR test data showed high recall but low precision and specificity. Validation using hospital data showed significant correlation between COVID-19 predictions of the model and COVID-19–related hospitalizations (F1-score 96.8; P<.001; R2=0.69). Most importantly, the model was able to predict COVID-19 cases weeks before the first confirmed case in the Netherlands. 
Conclusions: The developed BERT model was able to accurately identify COVID-19 cases among GP consultations even preceding confirmed cases. The validated efficacy of our BERT model highlights the potential of NLP models to identify disease outbreaks early, exemplifying the power of multidisciplinary efforts in harnessing technology for disease identification. Moreover, the implications of this study extend beyond COVID-19 and offer a blueprint for the early recognition of various illnesses, revealing that such models could revolutionize disease surveillance. %M 37792444 %R 10.2196/49944 %U https://www.jmir.org/2023/1/e49944 %U https://doi.org/10.2196/49944 %U http://www.ncbi.nlm.nih.gov/pubmed/37792444 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 9 %N %P e44332 %T An Actionable Expert-System Algorithm to Support Nurse-Led Cancer Survivorship Care: Algorithm Development Study %A Pfisterer,Kaylen J %A Lohani,Raima %A Janes,Elizabeth %A Ng,Denise %A Wang,Dan %A Bryant-Lukosius,Denise %A Rendon,Ricardo %A Berlin,Alejandro %A Bender,Jacqueline %A Brown,Ian %A Feifer,Andrew %A Gotto,Geoffrey %A Saha,Shumit %A Cafazzo,Joseph A %A Pham,Quynh %+ Centre for Digital Therapeutics, University Health Network, Techna Institute, Toronto General Hospital/ R. Fraser Elliot Building, 4th Floor, 190 Elizabeth Street, Toronto, ON, M5G 2C4, Canada, 1 416 340 4800 ext 4765, q.pham@uhn.ca %K prostate cancer %K patient-reported outcomes %K nurse-led model of care %K expert system %K artificial intelligence–powered decision support %K digital health %K nursing %K algorithm development %K cancer treatment %K AI %K survivorship %K cancer %D 2023 %7 4.10.2023 %9 Original Paper %J JMIR Cancer %G English %X Background: Comprehensive models of survivorship care are necessary to improve access to and coordination of care. 
New models of care provide the opportunity to address the complexity of physical and psychosocial problems and long-term health needs experienced by patients following cancer treatment. Objective: This paper presents our expert-informed, rules-based survivorship algorithm to build a nurse-led model of survivorship care to support men living with prostate cancer (PCa). The algorithm is called No Evidence of Disease (Ned) and supports timelier decision-making, enhanced safety, and continuity of care. Methods: An initial rule set was developed and refined through working groups with clinical experts across Canada (eg, nurse experts, physician experts, and scientists; n=20), and patient partners (n=3). Algorithm priorities were defined through a multidisciplinary consensus meeting with clinical nurse specialists, nurse scientists, nurse practitioners, urologic oncologists, urologists, and radiation oncologists (n=17). The system was refined and validated using the nominal group technique. Results: Four levels of alert classification were established, initiated by responses on the Expanded Prostate Cancer Index Composite for Clinical Practice survey, and mediated by changes in minimal clinically important difference alert thresholds, alert history, and clinical urgency with patient autonomy influencing clinical acuity. Patient autonomy was supported through tailored education as a first line of response, and alert escalation depending on a patient-initiated request for a nurse consultation. Conclusions: The Ned algorithm is positioned to facilitate PCa nurse-led care models with a high nurse-to-patient ratio. This novel expert-informed PCa survivorship care algorithm contains a defined escalation pathway for clinically urgent symptoms while honoring patient preference. 
Though further validation is required through a pragmatic trial, we anticipate the Ned algorithm will support timelier decision-making and enhance continuity of care through the automation of more frequent automated checkpoints, while empowering patients to self-manage their symptoms more effectively than standard care. International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2020-045806 %M 37792435 %R 10.2196/44332 %U https://cancer.jmir.org/2023/1/e44332 %U https://doi.org/10.2196/44332 %U http://www.ncbi.nlm.nih.gov/pubmed/37792435 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42758 %T Conversational AI and Vaccine Communication: Systematic Review of the Evidence %A Passanante,Aly %A Pertwee,Ed %A Lin,Leesa %A Lee,Kristi Yoonsup %A Wu,Joseph T %A Larson,Heidi J %+ Department of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, United Kingdom, 44 2076368636, aly.passanante@lshtm.ac.uk %K chatbots %K artificial intelligence %K conversational AI %K vaccine communication %K vaccine hesitancy %K conversational agent %K COVID-19 %K vaccine information %K health information %D 2023 %7 3.10.2023 %9 Review %J J Med Internet Res %G English %X Background: Since the mid-2010s, use of conversational artificial intelligence (AI; chatbots) in health care has expanded significantly, especially in the context of increased burdens on health systems and restrictions on in-person consultations with health care providers during the COVID-19 pandemic. One emerging use for conversational AI is to capture evolving questions and communicate information about vaccines and vaccination. Objective: The objective of this systematic review was to examine documented uses and evidence on the effectiveness of conversational AI for vaccine communication. Methods: This systematic review was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. 
PubMed, Web of Science, PsycINFO, MEDLINE, Scopus, CINAHL Complete, Cochrane Library, Embase, Epistemonikos, Global Health, Global Index Medicus, Academic Search Complete, and the University of London library database were searched for papers on the use of conversational AI for vaccine communication. The inclusion criteria were studies that included (1) documented instances of conversational AI being used for the purpose of vaccine communication and (2) evaluation data on the impact and effectiveness of the intervention. Results: After duplicates were removed, the review identified 496 unique records, which were then screened by title and abstract, of which 38 were identified for full-text review. Seven fit the inclusion criteria and were assessed and summarized in the findings of this review. Overall, vaccine chatbots deployed to date have been relatively simple in their design and have mainly been used to provide factual information to users in response to their questions about vaccines. Additionally, chatbots have been used for vaccination scheduling, appointment reminders, debunking misinformation, and, in some cases, for vaccine counseling and persuasion. Available evidence suggests that chatbots can have a positive effect on vaccine attitudes; however, studies were typically exploratory in nature, and some lacked a control group or had very small sample sizes. Conclusions: The review found evidence of potential benefits from conversational AI for vaccine communication. Factors that may contribute to the effectiveness of vaccine chatbots include their ability to provide credible and personalized information in real time, the familiarity and accessibility of the chatbot platform, and the extent to which interactions with the chatbot feel “natural” to users. However, evaluations have focused on the short-term, direct effects of chatbots on their users. The potential longer-term and societal impacts of conversational AI have yet to be analyzed. 
In addition, existing studies do not adequately address how ethics apply in the field of conversational AI around vaccines. In a context where further digitalization of vaccine communication can be anticipated, additional high-quality research will be required across all these areas. %M 37788057 %R 10.2196/42758 %U https://www.jmir.org/2023/1/e42758 %U https://doi.org/10.2196/42758 %U http://www.ncbi.nlm.nih.gov/pubmed/37788057 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e44187 %T Assessing Serious Spinal Pathology Using Bayesian Network Decision Support: Development and Validation Study %A Hill,Adele %A Joyner,Christopher H %A Keith-Jopp,Chloe %A Yet,Barbaros %A Tuncer Sakar,Ceren %A Marsh,William %A Morrissey,Dylan %+ Sport and Exercise Medicine, Queen Mary University of London, Mile End Hospital, Bancroft Road, London, E1 4DG, United Kingdom, 44 2078825010, d.morrissey@qmul.ac.uk %K artificial intelligence %K back pain %K Bayesian network %K expert consensus %D 2023 %7 3.10.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Identifying and managing serious spinal pathology (SSP) such as cauda equina syndrome or spinal infection in patients presenting with low back pain is challenging. Traditional red flag questioning is increasingly criticized, and previous studies show that many clinicians lack confidence in managing patients presenting with red flags. Improving decision-making and reducing the variability of care for these patients is a key priority for clinicians and researchers. Objective: We aimed to improve SSP identification by constructing and validating a decision support tool using a Bayesian network (BN), which is an artificial intelligence technique that combines current evidence and expert knowledge. Methods: A modified RAND appropriateness procedure was undertaken with 16 experts over 3 rounds, designed to elicit the variables, structure, and conditional probabilities necessary to build a causal BN. 
The BN predicts the likelihood of a patient with a particular presentation having an SSP. The second part of this study used an established framework to direct a 4-part validation that included comparison of the BN with consensus statements, practice guidelines, and recent research. Clinical cases were entered into the model and the results were compared with clinical judgment from spinal experts who were not involved in the elicitation. Receiver operating characteristic curves were plotted and areas under the curve were calculated for accuracy statistics. Results: The RAND appropriateness procedure elicited a model including 38 variables in 3 domains: risk factors (10 variables), signs and symptoms (17 variables), and judgment factors (11 variables). Clear consensus was found in the risk factors and signs and symptoms for SSP conditions. The 4-part BN validation demonstrated good performance overall and identified areas for further development. Comparison with available clinical literature showed good overall agreement but suggested certain improvements were required to, for example, 2 of the 11 judgment factors. Case analysis showed that cauda equina syndrome, space-occupying lesion/cancer, and inflammatory condition identification performed well across the validation domains. Fracture identification performed less well, but the reasons for the erroneous results are well understood. A review of the content by independent spinal experts backed up the issues with the fracture node, but the BN was otherwise deemed acceptable. Conclusions: The RAND appropriateness procedure and validation framework were successfully implemented to develop the BN for SSP. In comparison with other expert-elicited BN studies, this work goes a step further in validating the output before attempting implementation. Using a framework for model validation, the BN showed encouraging validity and has provided avenues for further developing the outputs that demonstrated poor accuracy. 
This study provides the vital first step of improving our ability to predict outcomes in low back pain by first considering the problem of SSP. International Registered Report Identifier (IRRID): RR2-10.2196/21804 %M 37788068 %R 10.2196/44187 %U https://formative.jmir.org/2023/1/e44187 %U https://doi.org/10.2196/44187 %U http://www.ncbi.nlm.nih.gov/pubmed/37788068 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e49898 %T Parkinson Disease Recognition Using a Gamified Website: Machine Learning Development and Usability Study %A Parab,Shubham %A Boster,Jerry %A Washington,Peter %+ Department of Information & Computer Sciences, University of Hawaii at Manoa, 2500 Campus Rd, Honolulu, HI, 96822, United States, 1 1 512 680 0926, pyw@hawaii.edu %K Parkinson disease %K digital health %K machine learning %K remote screening %K accessible screening %D 2023 %7 29.9.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Parkinson disease (PD) affects millions globally, causing motor function impairments. Early detection is vital, and diverse data sources aid diagnosis. We focus on lower arm movements during keyboard and trackpad or touchscreen interactions, which serve as reliable indicators of PD. Previous works explore keyboard tapping and unstructured device monitoring; we attempt to further these works with structured tests taking into account 2D hand movement in addition to finger tapping. Our feasibility study uses keystroke and mouse movement data from a remotely conducted, structured, web-based test combined with self-reported PD status to create a predictive model for detecting the presence of PD. Objective: Analysis of finger tapping speed and accuracy through keyboard input and analysis of 2D hand movement through mouse input allowed differentiation between participants with and without PD. 
This comparative analysis enables us to establish clear distinctions between the two groups and explore the feasibility of using motor behavior to predict the presence of the disease. Methods: Participants were recruited via email by the Hawaii Parkinson Association (HPA) and directed to a web application for the tests. The 2023 HPA symposium was also used as a forum to recruit participants and spread information about our study. The application recorded participant demographics, including age, gender, and race, as well as PD status. We conducted a series of tests to assess finger tapping, using on-screen prompts to request key presses of constant and random keys. Response times, accuracy, and unintended movements resulting in accidental presses were recorded. Participants performed a hand movement test consisting of tracing straight and curved on-screen ribbons using a trackpad or mouse, allowing us to evaluate stability and precision of 2D hand movement. From this tracing, the test collected and stored insights concerning lower arm motor movement. Results: Our formative study included 31 participants, 18 without PD and 13 with PD, and analyzed their lower limb movement data collected from keyboards and computer mice. From the data set, we extracted 28 features and evaluated their significances using an extra tree classifier predictor. A random forest model was trained using the 6 most important features identified by the predictor. These selected features provided insights into precision and movement speed derived from keyboard tapping and mouse tracing tests. This final model achieved an average F1-score of 0.7311 (SD 0.1663) and an average accuracy of 0.7429 (SD 0.1400) over 20 runs for predicting the presence of PD. 
Conclusions: This preliminary feasibility study suggests the possibility of using technology-based limb movement data to predict the presence of PD, demonstrating the practicality of implementing this approach in a cost-effective and accessible manner. In addition, this study demonstrates that structured mouse movement tests can be used in combination with finger tapping to detect PD. %M 37773607 %R 10.2196/49898 %U https://formative.jmir.org/2023/1/e49898 %U https://doi.org/10.2196/49898 %U http://www.ncbi.nlm.nih.gov/pubmed/37773607 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e49303 %T Centering Public Perceptions on Translating AI Into Clinical Practice: Patient and Public Involvement and Engagement Consultation Focus Group Study %A Lammons,William %A Silkens,Milou %A Hunter,Jamie %A Shah,Sudhir %A Stavropoulou,Charitini %+ National Institute of Health and Care Research, Applied Research Collaboration North Thames, Department of Applied Health Research, University College London, 1-19 Torrington Place, London, WC1E 7HB, United Kingdom, 44 (0)20 8059 0939, william.lammons@ucl.ac.uk %K acceptance %K AI in health care %K AI %K artificial intelligence %K health care research %K health care %K patient and public engagement and involvement %K patient engagement %K public engagement %K transition %D 2023 %7 26.9.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) is widely considered to be the new technical advancement capable of a large-scale modernization of health care. Considering AI’s potential impact on the clinician-patient relationship, health care provision, and health care systems more widely, patients and the wider public should be a part of the development, implementation, and embedding of AI applications in health care. Failing to establish patient and public engagement and involvement (PPIE) can limit AI’s impact. 
Objective: This study aims to (1) understand patients’ and the public’s perceived benefits and challenges for AI and (2) clarify how to best conduct PPIE in projects on translating AI into clinical practice, given public perceptions of AI. Methods: We conducted this qualitative PPIE focus-group consultation in the United Kingdom. A total of 17 public collaborators representing 7 National Institute of Health and Care Research Applied Research Collaborations across England participated in 1 of 3 web-based semistructured focus group discussions. We explored public collaborators’ understandings, experiences, and perceptions of AI applications in health care. Transcripts were coanalyzed iteratively with 2 public coauthors using thematic analysis. Results: We identified 3 primary deductive themes with 7 corresponding inductive subthemes. Primary theme 1, advantages of implementing AI in health care, had 2 subthemes: system improvements and improve quality of patient care and shared decision-making. Primary theme 2, challenges of implementing AI in health care, had 3 subthemes: challenges with security, bias, and access; public misunderstanding of AI; and lack of human touch in care and decision-making. Primary theme 3, recommendations on PPIE for AI in health care, had 2 subthemes: experience, empowerment, and raising awareness; and acknowledging and supporting diversity in PPIE. Conclusions: Patients and the public can bring unique perspectives on the development, implementation, and embedding of AI in health care. Early PPIE is therefore crucial not only to safeguard patients but also to increase the chances of acceptance of AI by the public and the impact AI can make in terms of outcomes. 
%M 37751234 %R 10.2196/49303 %U https://www.jmir.org/2023/1/e49303 %U https://doi.org/10.2196/49303 %U http://www.ncbi.nlm.nih.gov/pubmed/37751234 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46548 %T Factors Influencing the Acceptability, Acceptance, and Adoption of Conversational Agents in Health Care: Integrative Review %A Wutz,Maximilian %A Hermes,Marius %A Winter,Vera %A Köberlein-Neu,Juliane %+ Center for Health Economics and Health Services Research, Schumpeter School of Business and Economics, University of Wuppertal, Rainer-Gruenter-Str 21, Wuppertal, 42119, Germany, 49 202 439 1381, maximilian.wutz@uni-wuppertal.de %K conversational agent %K chatbot %K acceptability %K acceptance %K adoption %K health care %K digital health %K artificial intelligence %K AI %K natural language %K mobile phone %D 2023 %7 26.9.2023 %9 Review %J J Med Internet Res %G English %X Background: Conversational agents (CAs), also known as chatbots, are digital dialog systems that enable people to have a text-based, speech-based, or nonverbal conversation with a computer or another machine based on natural language via an interface. The use of CAs offers new opportunities and various benefits for health care. However, they are not yet ubiquitous in daily practice. Nevertheless, research regarding the implementation of CAs in health care has grown tremendously in recent years. Objective: This review aims to present a synthesis of the factors that facilitate or hinder the implementation of CAs from the perspectives of patients and health care professionals. Specifically, it focuses on the early implementation outcomes of acceptability, acceptance, and adoption as cornerstones of later implementation success. Methods: We performed an integrative review. To identify relevant literature, a broad literature search was conducted in June 2021 with no date limits and using all fields in PubMed, Cochrane Library, Web of Science, LIVIVO, and PsycINFO. 
To keep the review current, another search was conducted in March 2022. To identify as many eligible primary sources as possible, we used a snowballing approach by searching reference lists and conducted a hand search. Factors influencing the acceptability, acceptance, and adoption of CAs in health care were coded through parallel deductive and inductive approaches, which were informed by current technology acceptance and adoption models. Finally, the factors were synthesized in a thematic map. Results: Overall, 76 studies were included in this review. We identified influencing factors related to 4 core Unified Theory of Acceptance and Use of Technology (UTAUT) and Unified Theory of Acceptance and Use of Technology 2 (UTAUT2) factors (performance expectancy, effort expectancy, facilitating conditions, and hedonic motivation), with most studies underlining the relevance of performance and effort expectancy. To meet the particularities of the health care context, we redefined the UTAUT2 factors social influence, habit, and price value. We identified 6 other influencing factors: perceived risk, trust, anthropomorphism, health issue, working alliance, and user characteristics. Overall, we identified 10 factors influencing acceptability, acceptance, and adoption among health care professionals (performance expectancy, effort expectancy, facilitating conditions, social influence, price value, perceived risk, trust, anthropomorphism, working alliance, and user characteristics) and 13 factors influencing acceptability, acceptance, and adoption among patients (additionally hedonic motivation, habit, and health issue). Conclusions: This review shows manifold factors influencing the acceptability, acceptance, and adoption of CAs in health care. Knowledge of these factors is fundamental for implementation planning. Therefore, the findings of this review can serve as a basis for future studies to develop appropriate implementation strategies. 
Furthermore, this review provides an empirical test of current technology acceptance and adoption models and identifies areas where additional research is necessary. Trial Registration: PROSPERO CRD42022343690; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=343690 %M 37751279 %R 10.2196/46548 %U https://www.jmir.org/2023/1/e46548 %U https://doi.org/10.2196/46548 %U http://www.ncbi.nlm.nih.gov/pubmed/37751279 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e49963 %T A Future of Smarter Digital Health Empowered by Generative Pretrained Transformer %A Miao,Hongyu %A Li,Chengdong %A Wang,Jing %+ College of Nursing, Florida State University, 98 Varsity Way, Tallahassee, FL, 32306, United States, 1 8506443299, JingWang@nursing.fsu.edu %K generative pretrained model %K artificial intelligence %K digital health %K generative pretrained transformer %K ChatGPT %K precision medicine %K AI %K privacy %K ethics %D 2023 %7 26.9.2023 %9 Viewpoint %J J Med Internet Res %G English %X Generative pretrained transformer (GPT) tools have been thriving, as ignited by the remarkable success of OpenAI’s recent chatbot product. GPT technology offers countless opportunities to significantly improve or renovate current health care research and practice paradigms, especially digital health interventions and digital health–enabled clinical care, and a future of smarter digital health can thus be expected. In particular, GPT technology can be incorporated through various digital health platforms in homes and hospitals embedded with numerous sensors, wearables, and remote monitoring devices. In this viewpoint paper, we highlight recent research progress that depicts the future picture of a smarter digital health ecosystem through GPT-facilitated centralized communications, automated analytics, personalized health care, and instant decision-making. 
%M 37751243 %R 10.2196/49963 %U https://www.jmir.org/2023/1/e49963 %U https://doi.org/10.2196/49963 %U http://www.ncbi.nlm.nih.gov/pubmed/37751243 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46520 %T Predicting the Risk of Sleep Disorders Using a Machine Learning–Based Simple Questionnaire: Development and Validation Study %A Ha,Seokmin %A Choi,Su Jung %A Lee,Sujin %A Wijaya,Reinatt Hansel %A Kim,Jee Hyun %A Joo,Eun Yeon %A Kim,Jae Kyoung %+ Biomedical Mathematics Group, Institute for Basic Science, 55 Expo-ro Yuseong-gu, Daejeon, 34126, Republic of Korea, 82 42 350 2736, jaekkim@kaist.ac.kr %K obstructive sleep apnea %K insomnia %K comorbid insomnia and sleep apnea %K polysomnography %K questionnaires %K risk prediction %K XGBoost %K machine learning %K risk %K sleep %D 2023 %7 21.9.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Sleep disorders, such as obstructive sleep apnea (OSA), comorbid insomnia and sleep apnea (COMISA), and insomnia are common and can have serious health consequences. However, accurately diagnosing these conditions can be challenging as a result of the underrecognition of these diseases, the time-intensive nature of sleep monitoring necessary for a proper diagnosis, and patients’ hesitancy to undergo demanding and costly overnight polysomnography tests. Objective: We aim to develop a machine learning algorithm that can accurately predict the risk of OSA, COMISA, and insomnia with a simple set of questions, without the need for a polysomnography test. Methods: We applied extreme gradient boosting to the data from 2 medical centers (n=4257 from Samsung Medical Center and n=365 from Ewha Womans University Medical Center Seoul Hospital). Features were selected based on feature importance calculated by the Shapley additive explanations (SHAP) method. We applied extreme gradient boosting using selected features to develop a simple questionnaire predicting sleep disorders (SLEEPS). 
The accuracy of the algorithm was evaluated using the area under the receiver operating characteristics curve. Results: In total, 9 features were selected to construct SLEEPS. SLEEPS showed high accuracy, with an area under the receiver operating characteristics curve of greater than 0.897 for all 3 sleep disorders, and consistent performance across both sets of data. We found that the distinction between COMISA and OSA was critical for accurate prediction. A publicly accessible website was created based on the algorithm that provides predictions for the risk of the 3 sleep disorders and shows how the risk changes with changes in weight or age. Conclusions: SLEEPS has the potential to improve the diagnosis and treatment of sleep disorders by providing more accessibility and convenience. The creation of a publicly accessible website based on the algorithm provides a user-friendly tool for assessing the risk of OSA, COMISA, and insomnia. %M 37733411 %R 10.2196/46520 %U https://www.jmir.org/2023/1/e46520 %U https://doi.org/10.2196/46520 %U http://www.ncbi.nlm.nih.gov/pubmed/37733411 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e43963 %T Development and Integration of Machine Learning Algorithm to Identify Peripheral Arterial Disease: Multistakeholder Qualitative Study %A Wang,Sabrina M %A Hogg,H D Jeffry %A Sangvai,Devdutta %A Patel,Manesh R %A Weissler,E Hope %A Kellogg,Katherine C %A Ratliff,William %A Balu,Suresh %A Sendak,Mark %+ Duke Institute for Health Innovation, 200 Morris St, Durham, NC, 27701, United States, 1 (919) 684 4389, mark.sendak@duke.edu %K machine learning %K implementation %K integration %K support %K quality %K peripheral arterial disease %K algorithm %K efficacy %K structure %K barrier %K clinical %K engagement %K development %K translation %K detection %D 2023 %7 21.9.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Machine learning (ML)–driven clinical decision support (CDS) continues to draw wide interest 
and investment as a means of improving care quality and value, despite mixed real-world implementation outcomes. Objective: This study aimed to explore the factors that influence the integration of a peripheral arterial disease (PAD) identification algorithm to implement timely guideline-based care. Methods: A total of 12 semistructured interviews were conducted with individuals from 3 stakeholder groups during the first 4 weeks of integration of an ML-driven CDS. The stakeholder groups included technical, administrative, and clinical members of the team interacting with the ML-driven CDS. The ML-driven CDS identified patients with a high probability of having PAD, and these patients were then reviewed by an interdisciplinary team that developed a recommended action plan and sent recommendations to the patient’s primary care provider. Pseudonymized transcripts were coded, and thematic analysis was conducted by a multidisciplinary research team. Results: Three themes were identified: positive factors translating in silico performance to real-world efficacy, organizational factors and data structure factors affecting clinical impact, and potential challenges to advancing equity. Our study found that the factors that led to successful translation of in silico algorithm performance to real-world impact were largely nontechnical, given adequate efficacy in retrospective validation, including strong clinical leadership, trustworthy workflows, early consideration of end-user needs, and ensuring that the CDS addresses an actionable problem. Negative factors of integration included failure to incorporate the on-the-ground context, the lack of feedback loops, and data silos limiting the ML-driven CDS. The success criteria for each stakeholder group were also characterized to better understand how teams work together to integrate ML-driven CDS and to understand the varying needs across stakeholder groups. 
Conclusions: Longitudinal and multidisciplinary stakeholder engagement in the development and integration of ML-driven CDS underpins its effective translation into real-world care. Although previous studies have focused on the technical elements of ML-driven CDS, our study demonstrates the importance of including administrative and operational leaders as well as an early consideration of clinicians’ needs. Seeing how different stakeholder groups have this more holistic perspective also permits more effective detection of context-driven health care inequities, which are uncovered or exacerbated via ML-driven CDS integration through structural and organizational challenges. Many of the solutions to these inequities lie outside the scope of ML and require coordinated systematic solutions for mitigation to help reduce disparities in the care of patients with PAD. %M 37733427 %R 10.2196/43963 %U https://formative.jmir.org/2023/1/e43963 %U https://doi.org/10.2196/43963 %U http://www.ncbi.nlm.nih.gov/pubmed/37733427 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48115 %T Large-Scale Biomedical Relation Extraction Across Diverse Relation Types: Model Development and Usability Study on COVID-19 %A Zhang,Zeyu %A Fang,Meng %A Wu,Rebecca %A Zong,Hui %A Huang,Honglian %A Tong,Yuantao %A Xie,Yujia %A Cheng,Shiyang %A Wei,Ziyi %A Crabbe,M James C %A Zhang,Xiaoyan %A Wang,Ying %+ Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, 1239 Siping Road, Shanghai, 200092, China, 86 21 65980233, nadger_wang@139.com %K biomedical text mining %K biomedical relation extraction %K pretrained language model %K task-adaptive pretraining %K knowledge graph %K knowledge discovery %K clinical drug path %K COVID-19 %D 2023 %7 20.9.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Biomedical relation extraction (RE) is of great importance for researchers to conduct systematic 
biomedical studies. It not only helps knowledge mining, such as knowledge graphs and novel knowledge discovery, but also promotes translational applications, such as clinical diagnosis, decision-making, and precision medicine. However, the relations between biomedical entities are complex and diverse, and comprehensive biomedical RE is not yet well established. Objective: We aimed to investigate and improve large-scale RE with diverse relation types and conduct usability studies with application scenarios to optimize biomedical text mining. Methods: Data sets containing 125 relation types with different entity semantic levels were constructed to evaluate the impact of entity semantic information on RE, and performance analysis was conducted on different model architectures and domain models. This study also proposed a continued pretraining strategy and integrated models with scripts into a tool. Furthermore, this study applied RE to the COVID-19 corpus with article topics and application scenarios of clinical interest to assess and demonstrate its biological interpretability and usability. Results: The performance analysis revealed that RE achieves the best performance when the detailed semantic type is provided. For a single model, PubMedBERT with continued pretraining performed the best, with an F1-score of 0.8998. Usability studies on COVID-19 demonstrated the interpretability and usability of RE, and a relation graph database was constructed, which was used to reveal existing and novel drug paths with edge explanations. The models (including pretrained and fine-tuned models), integrated tool (Docker), and generated data (including the COVID-19 relation graph database and drug paths) have been made publicly available to the biomedical text mining community and clinical researchers. Conclusions: This study provided a comprehensive analysis of RE with diverse relation types. 
Optimized RE models and tools for diverse relation types were developed, which can be widely used in biomedical text mining. Our usability studies provided a proof-of-concept demonstration of how large-scale RE can be leveraged to facilitate novel research. %M 37632414 %R 10.2196/48115 %U https://www.jmir.org/2023/1/e48115 %U https://doi.org/10.2196/48115 %U http://www.ncbi.nlm.nih.gov/pubmed/37632414 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e48780 %T Anki Tagger: A Generative AI Tool for Aligning Third-Party Resources to Preclinical Curriculum %A Pendergrast,Tricia %A Chalmers,Zachary %+ Northwestern University Feinberg School of Medicine, 303 E Chicago Ave, Morton 1-670, Chicago, IL, 60611, United States, 1 3125038194, zachary.chalmers@northwestern.edu %K ChatGPT %K undergraduate medical education %K large language models %K Anki %K flashcards %K artificial intelligence %K AI %D 2023 %7 20.9.2023 %9 Research Letter %J JMIR Med Educ %G English %X Using large language models, we developed a method to efficiently query existing flashcard libraries and select those most relevant to an individual's medical school curricula. 
%M 37728965 %R 10.2196/48780 %U https://mededu.jmir.org/2023/1/e48780 %U https://doi.org/10.2196/48780 %U http://www.ncbi.nlm.nih.gov/pubmed/37728965 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45767 %T Using Social Media to Help Understand Patient-Reported Health Outcomes of Post–COVID-19 Condition: Natural Language Processing Approach %A Dolatabadi,Elham %A Moyano,Diana %A Bales,Michael %A Spasojevic,Sofija %A Bhambhoria,Rohan %A Bhatti,Junaid %A Debnath,Shyamolima %A Hoell,Nicholas %A Li,Xin %A Leng,Celine %A Nanda,Sasha %A Saab,Jad %A Sahak,Esmat %A Sie,Fanny %A Uppal,Sara %A Vadlamudi,Nirma Khatri %A Vladimirova,Antoaneta %A Yakimovich,Artur %A Yang,Xiaoxue %A Kocak,Sedef Akinli %A Cheung,Angela M %+ Faculty of Health, School of Health Policy and Management, York University, 4700 Keele Street, North York, Toronto, ON, M3J 1P3, Canada, 1 6477069756, edolatab@yorku.ca %K long COVID %K post–COVID-19 condition %K PCC %K social media %K natural language processing %K transformer models %K bidirectional encoder representations from transformers %K machine learning %K Twitter %K Reddit %K PRO %K patient-reported outcome %K patient-reported symptom %K health outcome %K symptom %K entity extraction %K entity normalization %D 2023 %7 19.9.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: While scientific knowledge of post–COVID-19 condition (PCC) is growing, there remains significant uncertainty in the definition of the disease, its expected clinical course, and its impact on daily functioning. Social media platforms can generate valuable insights into patient-reported health outcomes as the content is produced at high resolution by patients and caregivers, representing experiences that may be unavailable to most clinicians. 
Objective: In this study, we aimed to determine the validity and effectiveness of advanced natural language processing approaches built to derive insight into PCC-related patient-reported health outcomes from social media platforms Twitter and Reddit. We extracted PCC-related terms, including symptoms and conditions, and measured their occurrence frequency. We compared the outputs with human annotations and clinical outcomes and tracked symptom and condition term occurrences over time and locations to explore the pipeline’s potential as a surveillance tool. Methods: We used bidirectional encoder representations from transformers (BERT) models to extract and normalize PCC symptom and condition terms from English posts on Twitter and Reddit. We compared 2 named entity recognition models and implemented a 2-step normalization task to map extracted terms to unique concepts in standardized terminology. The normalization steps were done using a semantic search approach with BERT biencoders. We evaluated the effectiveness of BERT models in extracting the terms using a human-annotated corpus and a proximity-based score. We also compared the validity and reliability of the extracted and normalized terms to a web-based survey with more than 3000 participants from several countries. Results: UmlsBERT-Clinical had the highest accuracy in predicting entities closest to those extracted by human annotators. Based on our findings, the top 3 most commonly occurring groups of PCC symptom and condition terms were systemic (such as fatigue), neuropsychiatric (such as anxiety and brain fog), and respiratory (such as shortness of breath). In addition, we also found novel symptom and condition terms that had not been categorized in previous studies, such as infection and pain. Regarding the co-occurring symptoms, the pair of fatigue and headaches was among the most co-occurring term pairs across both platforms. 
Based on the temporal analysis, the neuropsychiatric terms were the most prevalent, followed by the systemic category, on both social media platforms. Our spatial analysis concluded that 42% (10,938/26,247) of the analyzed terms included location information, with the majority coming from the United States, United Kingdom, and Canada. Conclusions: The outcome of our social media–derived pipeline is comparable with the results of peer-reviewed articles relevant to PCC symptoms. Overall, this study provides unique insights into patient-reported health outcomes of PCC and valuable information about the patient’s journey that can help health care providers anticipate future needs. International Registered Report Identifier (IRRID): RR2-10.1101/2022.12.14.22283419 %M 37725432 %R 10.2196/45767 %U https://www.jmir.org/2023/1/e45767 %U https://doi.org/10.2196/45767 %U http://www.ncbi.nlm.nih.gov/pubmed/37725432 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e50514 %T Assessment of Resident and AI Chatbot Performance on the University of Toronto Family Medicine Residency Progress Test: Comparative Study %A Huang,Ryan ST %A Lu,Kevin Jia Qi %A Meaney,Christopher %A Kemppainen,Joel %A Punnett,Angela %A Leung,Fok-Han %+ Temerty Faculty of Medicine, University of Toronto, 1 King’s College Cir, Toronto, ON, M5S 1A8, Canada, 1 416 978 6585, ry.huang@mail.utoronto.ca %K medical education %K medical knowledge exam %K artificial intelligence %K AI %K natural language processing %K NLP %K large language model %K LLM %K machine learning %K ChatGPT %K GPT-3.5 %K GPT-4 %K education %K language model %K education examination %K testing %K utility %K family medicine %K medical residents %K test %K community %D 2023 %7 19.9.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: Large language model (LLM)–based chatbots are evolving at an unprecedented pace with the release of ChatGPT, specifically GPT-3.5, and its successor, GPT-4. 
Their capabilities in general-purpose tasks and language generation have advanced to the point of performing excellently on various educational examination benchmarks, including medical knowledge tests. Comparing the performance of these 2 LLMs to that of Family Medicine residents on a multiple-choice medical knowledge test can provide insights into their potential as medical education tools. Objective: This study aimed to quantitatively and qualitatively compare the performance of GPT-3.5, GPT-4, and Family Medicine residents in a multiple-choice medical knowledge test appropriate for the level of a Family Medicine resident. Methods: An official University of Toronto Department of Family and Community Medicine Progress Test consisting of multiple-choice questions was inputted into GPT-3.5 and GPT-4. The artificial intelligence chatbot’s responses were manually reviewed to determine the selected answer, response length, response time, provision of a rationale for the outputted response, and the root cause of all incorrect responses (classified into arithmetic, logical, and information errors). The performance of the artificial intelligence chatbots was compared against a cohort of Family Medicine residents who concurrently attempted the test. Results: GPT-4 performed significantly better compared to GPT-3.5 (difference 25.0%, 95% CI 16.3%-32.8%; McNemar test: P<.001); it correctly answered 89/108 (82.4%) questions, while GPT-3.5 answered 62/108 (57.4%) questions correctly. Further, GPT-4 scored higher across all 11 categories of Family Medicine knowledge. In 86.1% (n=93) of the responses, GPT-4 provided a rationale for why other multiple-choice options were not chosen compared to the 16.7% (n=18) achieved by GPT-3.5. Qualitatively, for both GPT-3.5 and GPT-4 responses, logical errors were the most common, while arithmetic errors were the least common. The average performance of Family Medicine residents was 56.9% (95% CI 56.2%-57.6%). 
The performance of GPT-3.5 was similar to that of the average Family Medicine resident (P=.16), while the performance of GPT-4 exceeded that of the top-performing Family Medicine resident (P<.001). Conclusions: GPT-4 significantly outperforms both GPT-3.5 and Family Medicine residents on a multiple-choice medical knowledge test designed for Family Medicine residents. GPT-4 provides a logical rationale for its response choice, ruling out other answer choices efficiently and with concise justification. Its high degree of accuracy and advanced reasoning capabilities facilitate its potential applications in medical education, including the creation of exam questions and scenarios as well as serving as a resource for medical knowledge or information on community services. %M 37725411 %R 10.2196/50514 %U https://mededu.jmir.org/2023/1/e50514 %U https://doi.org/10.2196/50514 %U http://www.ncbi.nlm.nih.gov/pubmed/37725411 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43632 %T Patients’ Views on AI for Risk Prediction in Shared Decision-Making for Knee Replacement Surgery: Qualitative Interview Study %A Gould,Daniel J %A Dowsey,Michelle M %A Glanville-Hearst,Marion %A Spelman,Tim %A Bailey,James A %A Choong,Peter F M %A Bunzli,Samantha %+ St Vincent's Hospital, Department of Surgery, University of Melbourne, 29 Regent Street, Melbourne, 3065, Australia, 61 9231 3955, daniel.gould@unimelb.edu.au %K artificial intelligence %K qualitative research %K semistructured interviews %K knee replacement %K risk prediction %K patient perception %K patient understanding %K patient preference %K patient perspective %D 2023 %7 18.9.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: The use of artificial intelligence (AI) in decision-making around knee replacement surgery is increasing, and this technology holds promise to improve the prediction of patient outcomes. 
Ambiguity surrounds the definition of AI, and there are mixed views on its application in clinical settings. Objective: In this study, we aimed to explore the understanding and attitudes of patients who underwent knee replacement surgery regarding AI in the context of risk prediction for shared clinical decision-making. Methods: This qualitative study involved patients who underwent knee replacement surgery at a tertiary referral center for joint replacement surgery. The participants were selected based on their age and sex. Semistructured interviews explored the participants’ understanding of AI and their opinions on its use in shared clinical decision-making. Data collection and reflexive thematic analyses were conducted concurrently. Recruitment continued until thematic saturation was achieved. Results: Thematic saturation was achieved with 19 interviews and confirmed with 1 additional interview, resulting in 20 participants being interviewed (female participants: n=11, 55%; male participants: n=9, 45%; median age: 66 years). A total of 11 (55%) participants had a substantial postoperative complication. Three themes captured the participants’ understanding of AI and their perceptions of its use in shared clinical decision-making. The theme Expectations captured the participants’ views of themselves as individuals with the right to self-determination as they sought therapeutic solutions tailored to their circumstances, needs, and desires, including whether to use AI at all. The theme Empowerment highlighted the potential of AI to enable patients to develop realistic expectations and equip them with personalized risk information to discuss in shared decision-making conversations with the surgeon. The theme Partnership captured the importance of symbiosis between AI and clinicians because AI has varied levels of interpretability and understanding of human emotions and empathy. 
Conclusions: Patients who underwent knee replacement surgery in this study had varied levels of familiarity with AI and diverse conceptualizations of its definitions and capabilities. Educating patients about AI through nontechnical explanations and illustrative scenarios could help inform their decision to use it for risk prediction in the shared decision-making process with their surgeon. These findings could be used in the process of developing a questionnaire to ascertain the views of patients undergoing knee replacement surgery on the acceptability of AI in shared clinical decision-making. Future work could investigate the accuracy of this patient group’s understanding of AI, beyond their familiarity with it, and how this influences their acceptance of its use. Surgeons may play a key role in finding a place for AI in the clinical setting as the uptake of this technology in health care continues to grow. %M 37721797 %R 10.2196/43632 %U https://www.jmir.org/2023/1/e43632 %U https://doi.org/10.2196/43632 %U http://www.ncbi.nlm.nih.gov/pubmed/37721797 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47621 %T The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study %A Kuroiwa,Tomoyuki %A Sarcon,Aida %A Ibara,Takuya %A Yamada,Eriku %A Yamamoto,Akiko %A Tsukamoto,Kazuya %A Fujita,Koji %+ Division of Medical Design Innovations, Open Innovation Center, Institute of Research Innovation, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo, 1138519, Japan, 81 358035279, fujiorth@tmd.ac.jp %K ChatGPT %K generative pretrained transformer %K natural language processing %K artificial intelligence %K chatbot %K diagnosis %K self-diagnosis %K accuracy %K precision %K language model %K orthopedic disease %K AI model %K health information %D 2023 %7 15.9.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) has gained tremendous popularity recently, 
especially the use of natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of creating natural conversations using NLP. The use of AI in medicine can have a tremendous impact on health care delivery. Although some studies have evaluated ChatGPT’s accuracy in self-diagnosis, there is no research regarding its precision and the degree to which it recommends medical consultations. Objective: The aim of this study was to evaluate ChatGPT’s ability to accurately and precisely self-diagnose common orthopedic diseases, as well as the degree of recommendation it provides for medical consultations. Methods: Over a 5-day course, each of the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as either correct, partially correct, incorrect, or a differential diagnosis. The percentage of correct answers and reproducibility were calculated. The reproducibility between days and raters was calculated using the Fleiss κ coefficient. Answers that recommended that the patient seek medical attention were recategorized according to the strength of the recommendation as defined by the study. Results: The ratios of correct answers were 25/25, 1/25, 24/25, 16/25, and 17/25 for CTS, CM, LSS, KOA, and HOA, respectively. The ratios of incorrect answers were 23/25 for CM and 0/25 for all other conditions. The reproducibility between days was 1.0, 0.15, 0.7, 0.6, and 0.6 for CTS, CM, LSS, KOA, and HOA, respectively. The reproducibility between raters was 1.0, 0.1, 0.64, –0.12, and 0.04 for CTS, CM, LSS, KOA, and HOA, respectively. Among the answers recommending medical attention, the phrases “essential,” “recommended,” “best,” and “important” were used. 
Specifically, “essential” occurred in 4 out of 125, “recommended” in 12 out of 125, “best” in 6 out of 125, and “important” in 94 out of 125 answers. Additionally, 7 out of the 125 answers did not include a recommendation to seek medical attention. Conclusions: The accuracy and reproducibility of ChatGPT to self-diagnose five common orthopedic conditions were inconsistent. The accuracy could potentially be improved by adding symptoms that could easily identify a specific location. Only a few answers were accompanied by a strong recommendation to seek medical attention according to our study standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in accurate self-diagnosis. Given the risk of harm with self-diagnosis without medical follow-up, it would be prudent for an NLP to include clear language alerting patients to seek expert medical opinions. We hope to shed further light on the use of AI in a future clinical study. %M 37713254 %R 10.2196/47621 %U https://www.jmir.org/2023/1/e47621 %U https://doi.org/10.2196/47621 %U http://www.ncbi.nlm.nih.gov/pubmed/37713254 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46523 %T Bot or Not? 
Detecting and Managing Participant Deception When Conducting Digital Research Remotely: Case Study of a Randomized Controlled Trial %A Loebenberg,Gemma %A Oldham,Melissa %A Brown,Jamie %A Dinu,Larisa %A Michie,Susan %A Field,Matt %A Greaves,Felix %A Garnett,Claire %+ UCL Tobacco and Alcohol Research Group, University College London, 1-19 Torrington Place, London, WC1E 7HB, United Kingdom, 44 20 7679 8781, gemma.loebenberg@ucl.ac.uk %K artificial intelligence %K false information %K mHealth applications %K participant deception %K participant %K recruit %K research subject %K web-based studies %D 2023 %7 14.9.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Evaluating digital interventions using remote methods enables the recruitment of large numbers of participants relatively conveniently and cheaply compared with in-person methods. However, conducting research remotely based on participant self-report with little verification is open to automated “bots” and participant deception. Objective: This paper uses a case study of a remotely conducted trial of an alcohol reduction app to highlight and discuss (1) the issues with participant deception affecting remote research trials with financial compensation; and (2) the importance of rigorous data management to detect and address these issues. Methods: We recruited participants on the internet from July 2020 to March 2022 for a randomized controlled trial (n=5602) evaluating the effectiveness of an alcohol reduction app, Drink Less. Follow-up occurred at 3 time points, with financial compensation offered (up to £36 [US $39.23]). Address authentication and telephone verification were used to detect 2 kinds of deception: “bots,” that is, automated responses generated in clusters; and manual participant deception, that is, participants providing false information. 
Results: Of the 1142 participants who enrolled in the first 2 months of recruitment, 75.6% (n=863) were identified as bots during data screening. As a result, a CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) was added, and after this, no more bots were identified. Manual participant deception occurred throughout the study. Of the 5956 participants (excluding bots) who enrolled in the study, 298 (5%) were identified as false participants. The extent of this decreased from 110 in November 2020 to a negligible level by February 2022, including a number of months with 0. The decline occurred after we added further screening questions such as attention checks, removed the prominence of financial compensation from social media advertising, and added an additional requirement to provide a mobile phone number for identity verification. Conclusions: Data management protocols are necessary to detect automated bots and manual participant deception in remotely conducted trials. Bots and manual deception can be minimized by adding a CAPTCHA, attention checks, a requirement to provide a phone number for identity verification, and not prominently advertising financial compensation on social media. 
Trial Registration: ISRCTN Number ISRCTN64052601; https://doi.org/10.1186/ISRCTN64052601 %M 37707943 %R 10.2196/46523 %U https://www.jmir.org/2023/1/e46523 %U https://doi.org/10.2196/46523 %U http://www.ncbi.nlm.nih.gov/pubmed/37707943 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e47049 %T The Potential and Concerns of Using AI in Scientific Research: ChatGPT Performance Evaluation %A Khlaif,Zuheir N %A Mousa,Allam %A Hattab,Muayad Kamal %A Itmazi,Jamil %A Hassan,Amjad A %A Sanmugam,Mageswaran %A Ayyoub,Abedalkarim %+ Faculty of Humanities and Educational Sciences, An-Najah National University, PO Box 7, Nablus, Occupied Palestinian Territory, 970 592754908, zkhlaif@najah.edu %K artificial intelligence %K AI %K ChatGPT %K scientific research %K research ethics %D 2023 %7 14.9.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: Artificial intelligence (AI) has many applications in various aspects of our daily life, including health, criminal, education, civil, business, and liability law. One aspect of AI that has gained significant attention is natural language processing (NLP), which refers to the ability of computers to understand and generate human language. Objective: This study aims to examine the potential for, and concerns of, using AI in scientific research. For this purpose, high-impact research articles were generated by analyzing the quality of reports generated by ChatGPT and assessing the application’s impact on the research framework, data analysis, and the literature review. The study also explored concerns around ownership and the integrity of research when using AI-generated text. Methods: A total of 4 articles were generated using ChatGPT, and thereafter evaluated by 23 reviewers. The researchers developed an evaluation form to assess the quality of the articles generated. Additionally, 50 abstracts were generated using ChatGPT and their quality was evaluated. 
The data were subjected to ANOVA and thematic analysis to analyze the qualitative data provided by the reviewers. Results: When using detailed prompts and providing the context of the study, ChatGPT would generate high-quality research that could be published in high-impact journals. However, ChatGPT had a minor impact on developing the research framework and data analysis. The primary area needing improvement was the development of the literature review. Moreover, reviewers expressed concerns around ownership and the integrity of the research when using AI-generated text. Nonetheless, ChatGPT has a strong potential to increase human productivity in research and can be used in academic writing. Conclusions: AI-generated text has the potential to improve the quality of high-impact research articles. The findings of this study suggest that decision makers and researchers should focus more on the methodology part of the research, which includes research design, developing research tools, and analyzing data in depth, to draw strong theoretical and practical implications, thereby establishing a revolution in scientific research in the era of AI. The practical implications of this study can be used in different fields such as medical education to deliver materials to develop the basic competencies for both medicine students and faculty members. 
%M 37707884 %R 10.2196/47049 %U https://mededu.jmir.org/2023/1/e47049 %U https://doi.org/10.2196/47049 %U http://www.ncbi.nlm.nih.gov/pubmed/37707884 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e48534 %T Estimating Patient Satisfaction Through a Language Processing Model: Model Development and Evaluation %A Matsuda,Shinichi %A Ohtomo,Takumi %A Okuyama,Masaru %A Miyake,Hiraku %A Aoki,Kotonari %+ Drug Safety Division, Chugai Pharmaceutical Co Ltd, 2-1-1 Nihonbashi-Muromachi, Chuo-ku, Tokyo, 103-8324, Japan, 81 8080105061, matsudasni@chugai-pharm.co.jp %K breast cancer %K internet %K machine learning %K natural language processing %K natural language-processing model %K neural network %K NLP %K patient satisfaction %K textual data %D 2023 %7 14.9.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Measuring patient satisfaction is a crucial aspect of medical care. Advanced natural language processing (NLP) techniques enable the extraction and analysis of high-level insights from textual data; nonetheless, data obtained from patients are often limited. Objective: This study aimed to create a model that quantifies patient satisfaction based on diverse patient-written textual data. Methods: We constructed a neural network–based NLP model for this cross-sectional study using the textual content from disease blogs written in Japanese on the Internet between 1994 and 2020. We extracted approximately 20 million sentences from 56,357 patient-authored disease blogs and constructed a model to predict the patient satisfaction index (PSI) using a regression approach. After evaluating the model’s effectiveness, PSI was predicted before and after cancer notification to examine the emotional impact of cancer diagnoses on 48 patients with breast cancer. Results: We assessed the correlation between the predicted and actual PSI values, labeled by humans, using the test set of 169 sentences. 
The model successfully quantified patient satisfaction by detecting nuances in sentences with excellent effectiveness (Spearman correlation coefficient [ρ]=0.832; root-mean-squared error [RMSE]=0.166; P<.001). Furthermore, the PSI was significantly lower in the cancer notification period than in the preceding control period (−0.057 and −0.012, respectively; 2-tailed t47=5.392, P<.001), indicating that the model quantifies the psychological and emotional changes associated with the cancer diagnosis notification. Conclusions: Our model demonstrates the ability to quantify patient dissatisfaction and identify significant emotional changes during the disease course. This approach may also help detect issues in routine medical practice. %M 37707946 %R 10.2196/48534 %U https://formative.jmir.org/2023/1/e48534 %U https://doi.org/10.2196/48534 %U http://www.ncbi.nlm.nih.gov/pubmed/37707946 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 6 %N %P e51776 %T Shaping the Future of Older Adult Care: ChatGPT, Advanced AI, and the Transformation of Clinical Practice %A Fear,Kathleen %A Gleber,Conrad %+ UR Health Lab, University of Rochester Medical Center, 30 Corporate Woods, Suite 180, Rochester, NY, 14623, United States, 1 585 341 4954, kathleen_fear@urmc.rochester.edu %K generative AI %K artificial intelligence %K large language models %K ChatGPT %K Generative Pre-trained Transformer %D 2023 %7 13.9.2023 %9 Guest Editorial %J JMIR Aging %G English %X As the older adult population in the United States grows, new approaches to managing and streamlining clinical work are needed to accommodate their increased demand for health care. Deep learning and generative artificial intelligence (AI) have the potential to transform how care is delivered and how clinicians practice in geriatrics. In this editorial, we explore the opportunities and limitations of these technologies. 
%M 37703085 %R 10.2196/51776 %U https://aging.jmir.org/2023/1/e51776 %U https://doi.org/10.2196/51776 %U http://www.ncbi.nlm.nih.gov/pubmed/37703085 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e48628 %T Determinants of Intravenous Infusion Longevity and Infusion Failure via a Nonlinear Model Analysis of Smart Pump Event Logs: Retrospective Study %A Kia,Arash %A Waterson,James %A Bargary,Norma %A Rolt,Stuart %A Burke,Kevin %A Robertson,Jeremy %A Garcia,Samuel %A Benavoli,Alessio %A Bergström,David %+ Medical Affairs, Medication Management Solutions, Becton Dickinson, 11F Blue Bay Tower, Business Bay, Dubai, 25229, United Arab Emirates, 971 566035154, james.waterson@bd.com %K intravenous infusion %K vascular access device %K alarm fatigue %K intensive care units %K intensive care %K neonatal %K predictive model %K smart pump %K smart device %K health device %K infusion %K intravenous %K nonlinear model %K medical device %K therapy %K prediction model %K artificial intelligence %K AI %K machine learning %K predict %K predictive %K prediction %K log data %K event log %D 2023 %7 13.9.2023 %9 Original Paper %J JMIR AI %G English %X Background: Infusion failure may have severe consequences for patients receiving critical, short–half-life infusions. Continued interruptions to infusions can lead to subtherapeutic therapy. Objective: This study aims to identify and rank determinants of the longevity of continuous infusions administered through syringe drivers, using nonlinear predictive models. Additionally, this study aims to evaluate key factors influencing infusion longevity and develop and test a model for predicting the likelihood of achieving successful infusion longevity. Methods: Data were extracted from the event logs of smart pumps containing information on care profiles, medication types and concentrations, occlusion alarm settings, and the final infusion cessation cause. 
These data were then used to fit 5 nonlinear models and evaluate the best explanatory model. Results: Random forest was the best-fit predictor, with an F1-score of 80.42, compared to 5 other models (mean F1-score 75.06; range 67.48-79.63). When applied to infusion data in an individual syringe driver data set, the predictor model found that the final medication concentration and medication type were of less significance to infusion longevity compared to the rate and care unit. For low-rate infusions, rates ranging from 2 to 2.8 mL/hr performed best for achieving a balance between infusion longevity and fluid load per infusion, with an occlusion versus no-occlusion ratio of 0.553. Rates between 0.8 and 1.2 mL/hr exhibited the poorest performance with a ratio of 1.604. Higher rates, up to 4 mL/hr, performed better in terms of occlusion versus no-occlusion ratios. Conclusions: This study provides clinicians with insights into the specific types of infusion that warrant more intense observation or proactive management of intravenous access; additionally, it can offer valuable information regarding the average duration of uninterrupted infusions that can be expected in these care areas. Optimizing rate settings to improve infusion longevity for continuous infusions, achieved through compounding to create customized concentrations for individual patients, may be possible in light of the study’s outcomes. The study also highlights the potential of machine learning nonlinear models in predicting outcomes and life spans of specific therapies delivered via medical devices. 
%M 38875535 %R 10.2196/48628 %U https://ai.jmir.org/2023/1/e48628 %U https://doi.org/10.2196/48628 %U http://www.ncbi.nlm.nih.gov/pubmed/38875535 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46891 %T Predicting the 5-Year Risk of Nonalcoholic Fatty Liver Disease Using Machine Learning Models: Prospective Cohort Study %A Huang,Guoqing %A Jin,Qiankai %A Mao,Yushan %+ Department of Endocrinology, The First Affiliated Hospital of Ningbo University, 247 Renmin Road, Ningbo, 315000, China, 86 13867878937, maoyushan@nbu.edu.cn %K nonalcoholic fatty liver disease %K machine learning %K independent risk factors %K prediction model %K model %K fatty liver %K prevention %K liver %K prognostic %K China %K development %K validation %K risk model %K clinical applicability %D 2023 %7 12.9.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Nonalcoholic fatty liver disease (NAFLD) has emerged as a worldwide public health issue. Identifying and targeting populations at a heightened risk of developing NAFLD over a 5-year period can help reduce and delay adverse hepatic prognostic events. Objective: This study aimed to investigate the 5-year incidence of NAFLD in the Chinese population. It also aimed to establish and validate a machine learning model for predicting the 5-year NAFLD risk. Methods: The study population was derived from a 5-year prospective cohort study. A total of 6196 individuals without NAFLD who underwent health checkups in 2010 at Zhenhai Lianhua Hospital in Ningbo, China, were enrolled in this study. Extreme gradient boosting (XGBoost)–recursive feature elimination, combined with the least absolute shrinkage and selection operator (LASSO), was used to screen for characteristic predictors. A total of 6 machine learning models, namely logistic regression, decision tree, support vector machine, random forest, categorical boosting, and XGBoost, were utilized in the construction of a 5-year risk model for NAFLD. 
Hyperparameter optimization of the predictive model was performed in the training set, and a further evaluation of the model performance was carried out in the internal and external validation sets. Results: The 5-year incidence of NAFLD was 18.64% (n=1155) in the study population. We screened 11 predictors for risk prediction model construction. After the hyperparameter optimization, CatBoost demonstrated the best prediction performance in the training set, with an area under the receiver operating characteristic (AUROC) curve of 0.810 (95% CI 0.768-0.852). Logistic regression showed the best prediction performance in the internal and external validation sets, with AUROC curves of 0.778 (95% CI 0.759-0.794) and 0.806 (95% CI 0.788-0.821), respectively. The development of web-based calculators has enhanced the clinical feasibility of the risk prediction model. Conclusions: Developing and validating machine learning models can aid in predicting which populations are at the highest risk of developing NAFLD over a 5-year period, thereby helping delay and reduce the occurrence of adverse liver prognostic events. 
%M 37698911 %R 10.2196/46891 %U https://www.jmir.org/2023/1/e46891 %U https://doi.org/10.2196/46891 %U http://www.ncbi.nlm.nih.gov/pubmed/37698911 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44897 %T Construction of an Emotional Lexicon of Patients With Breast Cancer: Development and Sentiment Analysis %A Li,Chaixiu %A Fu,Jiaqi %A Lai,Jie %A Sun,Lijun %A Zhou,Chunlan %A Li,Wenji %A Jian,Biao %A Deng,Shisi %A Zhang,Yujie %A Guo,Zihan %A Liu,Yusheng %A Zhou,Yanni %A Xie,Shihui %A Hou,Mingyue %A Wang,Ru %A Chen,Qinjie %A Wu,Yanni %+ Nanfang Hospital, Southern Medical University, No 1838 Guangzhou Avenue North, Baiyun District, Guangdong Province, Guangzhou, 510515, China, 86 020 61641192, yanniwuSMU@126.com %K breast cancer %K lexicon construction %K domain emotional lexicon %K sentiment analysis %K natural language processing %D 2023 %7 12.9.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: The innovative method of sentiment analysis based on an emotional lexicon shows prominent advantages in capturing emotional information, such as individual attitudes, experiences, and needs, which provides a new perspective and method for emotion recognition and management for patients with breast cancer (BC). However, at present, sentiment analysis in the field of BC is limited, and there is no emotional lexicon for this field. Therefore, it is necessary to construct an emotional lexicon that conforms to the characteristics of patients with BC so as to provide a new tool for accurate identification and analysis of the patients’ emotions and a new method for their personalized emotion management. Objective: This study aimed to construct an emotional lexicon of patients with BC. 
Methods: Emotional words were obtained by merging the words in 2 general sentiment lexicons, the Chinese Linguistic Inquiry and Word Count (C-LIWC) and HowNet, and the words in text corpora acquired from patients with BC via Weibo, semistructured interviews, and expressive writing. The lexicon was constructed using manual annotation and classification under the guidance of Russell’s valence-arousal space. Ekman’s basic emotional categories, Lazarus’ cognitive appraisal theory of emotion, and a qualitative text analysis based on the text corpora of patients with BC were combined to determine the fine-grained emotional categories of the lexicon we constructed. Precision, recall, and the F1-score were used to evaluate the lexicon’s performance. Results: The text corpora collected from patients in different stages of BC included 150 written materials, 17 interviews, and 6689 original posts and comments from Weibo, with a total of 1,923,593 Chinese characters. The emotional lexicon of patients with BC contained 9357 words and covered 8 fine-grained emotional categories: joy, anger, sadness, fear, disgust, surprise, somatic symptoms, and BC terminology. Experimental results showed that precision, recall, and the F1-score of positive emotional words were 98.42%, 99.73%, and 99.07%, respectively, and those of negative emotional words were 99.73%, 98.38%, and 99.05%, respectively, which all significantly outperformed the C-LIWC and HowNet. Conclusions: The emotional lexicon with fine-grained emotional categories conforms to the characteristics of patients with BC. Its performance related to identifying and classifying domain-specific emotional words in BC is better compared to the C-LIWC and HowNet. This lexicon not only provides a new tool for sentiment analysis in the field of BC but also provides a new perspective for recognizing the specific emotional state and needs of patients with BC and formulating tailored emotional management plans. 
%M 37698914 %R 10.2196/44897 %U https://www.jmir.org/2023/1/e44897 %U https://doi.org/10.2196/44897 %U http://www.ncbi.nlm.nih.gov/pubmed/37698914 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e49989 %T Navigating the AI Revolution: The Case for Precise Regulation in Health Care %A Reddy,Sandeep %+ Deakin School of Medicine, 75 Pigdons Road, Waurn Ponds, Geelong, VIC-3216, Australia, 61 487194924, sandeep.reddy@deakin.edu.au %K artificial intelligence %K AI %K health care %K regulation %K precise regulation %K patient safety %K AI ethics %D 2023 %7 11.9.2023 %9 Viewpoint %J J Med Internet Res %G English %X Health care is undergoing a profound transformation through the integration of artificial intelligence (AI). However, the rapid integration and expansive growth of AI within health care systems present ethical and legal challenges that warrant careful consideration. In this viewpoint, the author argues that the health care domain, due to its complexity, requires specialized approaches to regulating AI. Precise regulation can provide clear guidelines for addressing these challenges, thereby ensuring ethical and legal AI implementations. 
%M 37695650 %R 10.2196/49989 %U https://www.jmir.org/2023/1/e49989 %U https://doi.org/10.2196/49989 %U http://www.ncbi.nlm.nih.gov/pubmed/37695650 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e49240 %T Clinical Accuracy of Large Language Models and Google Search Responses to Postpartum Depression Questions: Cross-Sectional Study %A Sezgin,Emre %A Chekeni,Faraaz %A Lee,Jennifer %A Keim,Sarah %+ Nationwide Children's Hospital, 700 Children's Dr, Columbus, OH, 43205, United States, 1 614 722 3179, emre.sezgin@nationwidechildrens.org %K mental health %K postpartum depression %K health information seeking %K large language model %K GPT %K LaMDA %K Google %K ChatGPT %K artificial intelligence %K natural language processing %K generative AI %K depression %K cross-sectional study %K clinical accuracy %D 2023 %7 11.9.2023 %9 Research Letter %J J Med Internet Res %G English %X %M 37695668 %R 10.2196/49240 %U https://www.jmir.org/2023/1/e49240 %U https://doi.org/10.2196/49240 %U http://www.ncbi.nlm.nih.gov/pubmed/37695668 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e44909 %T Machine Learning for the Prediction of Procedural Case Durations Developed Using a Large Multicenter Database: Algorithm Development and Validation Study %A Kendale,Samir %A Bishara,Andrew %A Burns,Michael %A Solomon,Stuart %A Corriere,Matthew %A Mathis,Michael %+ Department of Anesthesia, Critical Care & Pain Medicine, Beth Israel Deaconess Medical Center, 1 Deaconess Road, Boston, MA, 02215, United States, 1 6177545400, skendale@bidmc.harvard.edu %K medical informatics %K artificial intelligence %K AI %K machine learning %K operating room %K OR management %K perioperative %K algorithm development %K validation %K patient communication %K surgical procedure %K prediction model %D 2023 %7 8.9.2023 %9 Original Paper %J JMIR AI %G English %X Background: Accurate projections of procedural case durations are complex but critical to the planning of perioperative 
staffing, operating room resources, and patient communication. Nonlinear prediction models using machine learning methods may provide opportunities for hospitals to improve upon current estimates of procedure duration. Objective: The aim of this study was to determine whether a machine learning algorithm scalable across multiple centers could estimate case duration within a tolerance limit, given the substantial operating room resources whose allocation depends on case duration. Methods: Deep learning, gradient boosting, and ensemble machine learning models were generated using perioperative data available at 3 distinct time points: the time of scheduling, the time of patient arrival to the operating or procedure room (primary model), and the time of surgical incision or procedure start. The primary outcome was procedure duration, defined by the time between the arrival and the departure of the patient from the procedure room. Model performance was assessed by mean absolute error (MAE), the proportion of predictions falling within 20% of the actual duration, and other standard metrics. Performance was compared with a baseline method of historical means within a linear regression model. Model features driving predictions were assessed using Shapley additive explanations values and permutation feature importance. Results: A total of 1,177,893 procedures from 13 academic and private hospitals between 2016 and 2019 were used. Across all procedures, the median procedure duration was 94 (IQR 50-167) minutes. In estimating the procedure duration, the gradient boosting machine was the best-performing model, demonstrating an MAE of 34 (SD 47) minutes, with 46% of the predictions falling within 20% of the actual duration in the test data set. 
This represented a statistically and clinically significant improvement in predictions compared with a baseline linear regression model (MAE 43 min; P<.001; 39% of the predictions falling within 20% of the actual duration). The most important features in model training were historical procedure duration by surgeon, the word “free” within the procedure text, and the time of day. Conclusions: Nonlinear models using machine learning techniques may be used to generate high-performing, automatable, explainable, and scalable prediction models for procedure duration. %M 38875567 %R 10.2196/44909 %U https://ai.jmir.org/2023/1/e44909 %U https://doi.org/10.2196/44909 %U http://www.ncbi.nlm.nih.gov/pubmed/38875567 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 9 %N %P e47095 %T Combinatorial Use of Machine Learning and Logistic Regression for Predicting Carotid Plaque Risk Among 5.4 Million Adults With Fatty Liver Disease Receiving Health Check-Ups: Population-Based Cross-Sectional Study %A Deng,Yuhan %A Ma,Yuan %A Fu,Jingzhu %A Wang,Xiaona %A Yu,Canqing %A Lv,Jun %A Man,Sailimai %A Wang,Bo %A Li,Liming %+ Meinian Institute of Health, 13 Floor, Health Work, Huayuan Road, Haidian District, Beijing, 100083, China, 86 010 82097560, paul@meinianresearch.com %K machine learning %K carotid plaque %K health check-up %K prediction %K fatty liver %K risk assessment %K risk stratification %K cardiovascular %K logistic regression %D 2023 %7 7.9.2023 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Carotid plaque can progress into stroke, myocardial infarction, etc, which are major global causes of death. Evidence shows a significant increase in carotid plaque incidence among patients with fatty liver disease. 
However, unlike the high detection rate of fatty liver disease, screening for carotid plaque in the asymptomatic population is not yet prevalent due to cost-effectiveness reasons, resulting in a large number of patients with undetected carotid plaques, especially among those with fatty liver disease. Objective: This study aimed to combine the advantages of machine learning (ML) and logistic regression to develop a straightforward prediction model among the population with fatty liver disease to identify individuals at risk of carotid plaque. Methods: Our study included 5,420,640 participants with fatty liver from Meinian Health Care Center. We used random forest, elastic net (EN), and extreme gradient boosting ML algorithms to select important features from potential predictors. Features acknowledged by all 3 models were enrolled in logistic regression analysis to develop a carotid plaque prediction model. Model performance was evaluated based on the area under the receiver operating characteristic curve, calibration curve, Brier score, and decision curve analysis both in a randomly split internal validation data set, and an external validation data set comprising 32,682 participants from MJ Health Check-up Center. Risk cutoff points for carotid plaque were determined based on the Youden index, predicted probability distribution, and prevalence rate of the internal validation data set to classify participants into high-, intermediate-, and low-risk groups. This risk classification was further validated in the external validation data set. Results: Among the participants, 26.23% (1,421,970/5,420,640) were diagnosed with carotid plaque in the development data set, and 21.64% (7074/32,682) were diagnosed in the external validation data set. 
A total of 6 features, namely age, systolic blood pressure, low-density lipoprotein cholesterol (LDL-C), total cholesterol, fasting blood glucose, and hepatic steatosis index (HSI), were collectively selected by all 3 ML models out of 27 predictors. After eliminating the issue of collinearity between features, the logistic regression model established with the 5 independent predictors reached an area under the curve of 0.831 in the internal validation data set and 0.801 in the external validation data set, and showed good calibration capability graphically. Its predictive performance was comprehensively competitive compared with the single use of either logistic regression or ML algorithms. Optimal predicted probability cutoff points of 25% and 65% were determined for classifying individuals into low-, intermediate-, and high-risk categories for carotid plaque. Conclusions: The combination of ML and logistic regression yielded a practical carotid plaque prediction model with important public health implications for the early identification and risk assessment of carotid plaque among individuals with fatty liver. 
%M 37676713 %R 10.2196/47095 %U https://publichealth.jmir.org/2023/1/e47095 %U https://doi.org/10.2196/47095 %U http://www.ncbi.nlm.nih.gov/pubmed/37676713 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42047 %T An Explainable Artificial Intelligence Software Tool for Weight Management Experts (PRIMO): Mixed Methods Study %A Fernandes,Glenn J %A Choi,Arthur %A Schauer,Jacob Michael %A Pfammatter,Angela F %A Spring,Bonnie J %A Darwiche,Adnan %A Alshurafa,Nabil I %+ Department of Computer Science, Northwestern University, 633 Clark St, Evanston, IL, 60208, United States, 1 847 491 3500, glennfer@u.northwestern.edu %K explainable artificial intelligence %K explainable AI %K machine learning %K ML %K interpretable ML %K random forest %K decision-making %K weight loss prediction %K mobile phone %D 2023 %7 6.9.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Predicting the likelihood of success of weight loss interventions using machine learning (ML) models may enhance intervention effectiveness by enabling timely and dynamic modification of intervention components for nonresponders to treatment. However, a lack of understanding and trust in these ML models impacts adoption among weight management experts. Recent advances in the field of explainable artificial intelligence enable the interpretation of ML models, yet it is unknown whether they enhance model understanding, trust, and adoption among weight management experts. 
Objective: This study aimed to build and evaluate an ML model that can predict 6-month weight loss success (ie, ≥7% weight loss) from 5 engagement and diet-related features collected over the initial 2 weeks of an intervention, to assess whether providing ML-based explanations increases weight management experts’ agreement with ML model predictions, and to inform factors that influence the understanding and trust of ML models to advance explainability in early prediction of weight loss among weight management experts. Methods: We trained an ML model using the random forest (RF) algorithm and data from a 6-month weight loss intervention (N=419). We leveraged findings from existing explainability metrics to develop Prime Implicant Maintenance of Outcome (PRIMO), an interactive tool to understand predictions made by the RF model. We asked 14 weight management experts to predict hypothetical participants’ weight loss success before and after using PRIMO. We compared PRIMO with 2 other explainability methods, one based on feature ranking and the other based on conditional probability. We used generalized linear mixed-effects models to evaluate participants’ agreement with ML predictions and conducted likelihood ratio tests to examine the relationship between explainability methods and outcomes for nested models. We conducted guided interviews and thematic analysis to study the impact of our tool on experts’ understanding and trust in the model. Results: Our RF model had 81% accuracy in the early prediction of weight loss success. Weight management experts were significantly more likely to agree with the model when using PRIMO (χ2=7.9; P=.02) compared with the other 2 methods with odds ratios of 2.52 (95% CI 0.91-7.69) and 3.95 (95% CI 1.50-11.76). From our study, we inferred that our software not only influenced experts’ understanding and trust but also impacted decision-making. 
Several themes were identified through interviews: preference for multiple explanation types, need to visualize uncertainty in explanations provided by PRIMO, and need for model performance metrics on similar participant test instances. Conclusions: Our results show the potential for weight management experts to agree with the ML-based early prediction of success in weight loss treatment programs, enabling timely and dynamic modification of intervention components to enhance intervention effectiveness. Our findings provide methods for advancing the understandability and trust of ML models among weight management experts. %M 37672333 %R 10.2196/42047 %U https://www.jmir.org/2023/1/e42047 %U https://doi.org/10.2196/42047 %U http://www.ncbi.nlm.nih.gov/pubmed/37672333 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e48693 %T Applications of Natural Language Processing for the Management of Stroke Disorders: Scoping Review %A De Rosario,Helios %A Pitarch-Corresa,Salvador %A Pedrosa,Ignacio %A Vidal-Pedrós,Marina %A de Otto-López,Beatriz %A García-Mieres,Helena %A Álvarez-Rodríguez,Lydia %+ Instituto de Biomecánica de Valencia, Universitat Politècnica de València, Camino de Vera s/n, Ed. 9C, Valencia, 46022, Spain, 34 961111170, helios.derosario@ibv.org %K stroke %K natural language processing %K artificial intelligence %K scoping review %K scoping %K review methods %K review methodology %K NLP %K cardiovascular %K machine learning %K deep learning %D 2023 %7 6.9.2023 %9 Review %J JMIR Med Inform %G English %X Background: Recent advances in natural language processing (NLP) have heightened the interest of the medical community in its application to health care in general, in particular to stroke, a medical emergency of great impact. In this rapidly evolving context, it is necessary to learn and understand the experience already accumulated by the medical and scientific community. 
Objective: The aim of this scoping review was to explore the studies conducted in the last 10 years using NLP to assist the management of stroke emergencies so as to gain insight on the state of the art, its main contexts of application, and the software tools that are used. Methods: Data were extracted from Scopus and Medline through PubMed, using the keywords “natural language processing” and “stroke.” Primary research questions were related to the phases, contexts, and types of textual data used in the studies. Secondary research questions were related to the numerical and statistical methods and the software used to process the data. The extracted data were structured in tables and their relative frequencies were calculated. The relationships between categories were analyzed through multiple correspondence analysis. Results: Twenty-nine papers were included in the review, with the majority being cohort studies of ischemic stroke published in the last 2 years. The majority of papers focused on the use of NLP to assist in the diagnostic phase, followed by the outcome prognosis, using text data from diagnostic reports and in many cases annotations on medical images. The most frequent approach was based on general machine learning techniques applied to the results of relatively simple NLP methods with the support of ontologies and standard vocabularies. Although smaller in number, there has been an increasing body of studies using deep learning techniques on numerical and vectorized representations of the texts obtained with more sophisticated NLP tools. Conclusions: Studies focused on NLP applied to stroke show specific trends that can be compared to the more general application of artificial intelligence to stroke. The purpose of using NLP is often to improve processes in a clinical context rather than to assist in the rehabilitation process. 
The state of the art in NLP is represented by deep learning architectures, among which Bidirectional Encoder Representations from Transformers has been found to be especially widely used in the medical field in general, and for stroke in particular, with an increasing focus on the processing of annotations on medical images. %M 37672328 %R 10.2196/48693 %U https://medinform.jmir.org/2023/1/e48693 %U https://doi.org/10.2196/48693 %U http://www.ncbi.nlm.nih.gov/pubmed/37672328 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e48254 %T Assessing Health Students' Attitudes and Usage of ChatGPT in Jordan: Validation Study %A Sallam,Malik %A Salim,Nesreen A %A Barakat,Muna %A Al-Mahzoum,Kholoud %A Al-Tammemi,Ala'a B %A Malaeb,Diana %A Hallit,Rabih %A Hallit,Souheil %+ Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Queen Rania Al-Abdullah Street-Aljubeiha, Amman, 11942, Jordan, 962 0791845186, malik.sallam@ju.edu.jo %K artificial intelligence %K machine learning %K education %K technology %K healthcare %K survey %K opinion %K knowledge %K practices %K KAP %D 2023 %7 5.9.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: ChatGPT is a conversational large language model that has the potential to revolutionize knowledge acquisition. However, the impact of this technology on the quality of education is still unknown considering the risks and concerns surrounding ChatGPT use. Therefore, it is necessary to assess the usability and acceptability of this promising tool. As an innovative technology, the intention to use ChatGPT can be studied in the context of the technology acceptance model (TAM). Objective: This study aimed to develop and validate a TAM-based survey instrument called TAME-ChatGPT (Technology Acceptance Model Edited to Assess ChatGPT Adoption) that could be employed to examine the successful integration and use of ChatGPT in health care education. 
Methods: The survey tool was created based on the TAM framework. It comprised 13 items for participants who had heard of ChatGPT but did not use it and 23 items for participants who used ChatGPT. Using a convenience sampling approach, the survey link was circulated electronically among university students between February and March 2023. Exploratory factor analysis (EFA) was used to assess the construct validity of the survey instrument. Results: The final sample comprised 458 respondents, most of them undergraduate students (n=442, 96.5%). Only 109 (23.8%) respondents had heard of ChatGPT prior to participation and only 55 (11.3%) self-reported ChatGPT use before the study. EFA of the attitude and usage scales showed significant Bartlett tests of sphericity scores (P<.001) and adequate Kaiser-Meyer-Olkin measures (0.823 for the attitude scale and 0.702 for the usage scale), confirming the factorability of the correlation matrices. The EFA showed that 3 constructs explained a cumulative total of 69.3% variance in the attitude scale, and these subscales represented perceived risks, attitude to technology/social influence, and anxiety. For the ChatGPT usage scale, EFA showed that 4 constructs explained a cumulative total of 72% variance in the data and comprised the perceived usefulness, perceived risks, perceived ease of use, and behavior/cognitive factors. All the ChatGPT attitude and usage subscales showed good reliability, with Cronbach α values >.78. Conclusions: The TAME-ChatGPT demonstrated good reliability, validity, and usefulness in assessing health care students’ attitudes toward ChatGPT. The findings highlighted the importance of considering risk perceptions, usefulness, ease of use, attitudes toward technology, and behavioral factors when adopting ChatGPT as a tool in health care education. 
This information can aid the stakeholders in creating strategies to support the optimal and ethical use of ChatGPT and to identify the potential challenges hindering its successful implementation. Future research is recommended to guide the effective adoption of ChatGPT in health care education. %M 37578934 %R 10.2196/48254 %U https://mededu.jmir.org/2023/1/e48254 %U https://doi.org/10.2196/48254 %U http://www.ncbi.nlm.nih.gov/pubmed/37578934 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 9 %N %P e45547 %T Data-Efficient Computational Pathology Platform for Faster and Cheaper Breast Cancer Subtype Identifications: Development of a Deep Learning Model %A Bae,Kideog %A Jeon,Young Seok %A Hwangbo,Yul %A Yoo,Chong Woo %A Han,Nayoung %A Feng,Mengling %+ Saw Swee Hock School of Public Health, National University of Singapore, 12 Science Drive 2, Tahir foundation MD1 #09-01, Singapore, 117549, Singapore, 65 65164984, ephfm@nus.edu.sg %K deep learning %K self-supervised learning %K immunohistochemical staining %K machine learning %K histology %K pathology %K computation %K predict %K diagnosis %K diagnose %K carcinoma %K cancer %K oncology %K breast cancer %D 2023 %7 5.9.2023 %9 Original Paper %J JMIR Cancer %G English %X Background: Breast cancer subtyping is a crucial step in determining therapeutic options, but the molecular examination based on immunohistochemical staining is expensive and time-consuming. Deep learning opens up the possibility to predict the subtypes based on the morphological information from hematoxylin and eosin staining, a much cheaper and faster alternative. However, training the predictive model conventionally requires a large number of histology images, which is challenging to collect by a single institute. Objective: We aimed to develop a data-efficient computational pathology platform, 3DHistoNet, which is capable of learning from z-stacked histology images to accurately predict breast cancer subtypes with a small sample size. 
Methods: We retrospectively examined 401 cases of patients with primary breast carcinoma diagnosed between 2018 and 2020 at the Department of Pathology, National Cancer Center, South Korea. Pathology slides of the patients with breast carcinoma were prepared according to the standard protocols. Age, gender, histologic grade, hormone receptor (estrogen receptor [ER], progesterone receptor [PR], and androgen receptor [AR]) status, erb-B2 receptor tyrosine kinase 2 (HER2) status, and Ki-67 index were evaluated by reviewing medical charts and pathological records. Results: The area under the receiver operating characteristic curve and decision curve were analyzed to evaluate the performance of our 3DHistoNet platform for predicting the ER, PR, AR, HER2, and Ki67 subtype biomarkers with 5-fold cross-validation. We demonstrated that 3DHistoNet can predict all clinically important biomarkers (ER, PR, AR, HER2, and Ki67) with performance exceeding the conventional multiple instance learning models by a considerable margin (area under the receiver operating characteristic curve: 0.75-0.91 vs 0.67-0.8). We further showed that our z-stack histology scanning method can make up for insufficient training data sets without any additional cost incurred. Finally, 3DHistoNet offered an additional capability to generate attention maps that reveal correlations between Ki67 and histomorphological features, which renders the hematoxylin and eosin image in higher fidelity to the pathologist. Conclusions: Our stand-alone, data-efficient pathology platform that can both generate z-stacked images and predict key biomarkers is an appealing tool for breast cancer diagnosis. Its development would encourage morphology-based diagnosis, which is faster, cheaper, and less error-prone compared to the protein quantification method based on immunohistochemical staining. 
%M 37669090 %R 10.2196/45547 %U https://cancer.jmir.org/2023/1/e45547 %U https://doi.org/10.2196/45547 %U http://www.ncbi.nlm.nih.gov/pubmed/37669090 %0 Journal Article %@ 1947-2579 %I JMIR Publications %V 15 %N %P e50934 %T Framework for Classifying Explainable Artificial Intelligence (XAI) Algorithms in Clinical Medicine %A Gniadek,Thomas %A Kang,Jason %A Theparee,Talent %A Krive,Jacob %+ Department of Biomedical and Health Information Sciences, University of Illinois at Chicago, 1919 W Taylor St 233 AHSB, MC-530, Chicago, IL, 60612, United States, 1 312 996 1445, krive@uic.edu %K explainable artificial intelligence %K XAI %K artificial intelligence %K AI %K AI medicine %K pathology informatics %K radiology informatics %D 2023 %7 1.9.2023 %9 Viewpoint %J Online J Public Health Inform %G English %X Artificial intelligence (AI) applied to medicine offers immense promise, in addition to safety and regulatory concerns. Traditional AI produces a core algorithm result, typically without a measure of statistical confidence or an explanation of its biological-theoretical basis. Efforts are underway to develop explainable AI (XAI) algorithms that not only produce a result but also an explanation to support that result. Here we present a framework for classifying XAI algorithms applied to clinical medicine: An algorithm’s clinical scope is defined by whether the core algorithm output leads to observations (eg, tests, imaging, clinical evaluation), interventions (eg, procedures, medications), diagnoses, and prognostication. Explanations are classified by whether they provide empiric statistical information, association with a historical population or populations, or association with an established disease mechanism or mechanisms. 
XAI implementations can be classified based on whether algorithm training and validation took into account the actions of health care providers in response to the insights and explanations provided or whether training was performed using only the core algorithm output as the end point. Finally, communication modalities used to convey an XAI explanation can be used to classify algorithms and may affect clinical outcomes. This framework can be used when designing, evaluating, and comparing XAI algorithms applied to medicine. %M 38046562 %R 10.2196/50934 %U https://ojphi.jmir.org/2023/1/e50934 %U https://doi.org/10.2196/50934 %U http://www.ncbi.nlm.nih.gov/pubmed/38046562 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e50844 %T AI Is Changing the Landscape of Academic Writing: What Can Be Done? Authors’ Reply to: AI Increases the Pressure to Overhaul the Scientific Peer Review Process. Comment on “Artificial Intelligence Can Generate Fraudulent but Authentic-Looking Scientific Medical Articles: Pandora’s Box Has Been Opened” %A Májovský,Martin %A Mikolov,Tomas %A Netuka,David %+ Department of Neurosurgery and Neurooncology, First Faculty of Medicine, Charles University, U Vojenské nemocnice 1200, Prague, 16000, Czech Republic, 420 973202963, majovmar@uvn.cz %K artificial intelligence %K AI %K publications %K ethics %K neurosurgery %K ChatGPT %K Chat Generative Pre-trained Transformer %K language models %K fraudulent medical articles %D 2023 %7 31.8.2023 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 37651175 %R 10.2196/50844 %U https://www.jmir.org/2023/1/e50844 %U https://doi.org/10.2196/50844 %U http://www.ncbi.nlm.nih.gov/pubmed/37651175 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e50591 %T AI Increases the Pressure to Overhaul the Scientific Peer Review Process. 
Comment on “Artificial Intelligence Can Generate Fraudulent but Authentic-Looking Scientific Medical Articles: Pandora’s Box Has Been Opened” %A Liu,Nicholas %A Brown,Amy %+ John A Burns School of Medicine, University of Hawai'i at Mānoa, 651 Ilalo St, Honolulu, HI, 96813, United States, 1 808 692 1000, nliu6@hawaii.edu %K artificial intelligence %K AI %K publications %K ethics %K neurosurgery %K ChatGPT %K Chat Generative Pre-trained Transformer %K language models %K fraudulent medical articles %D 2023 %7 31.8.2023 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 37651167 %R 10.2196/50591 %U https://www.jmir.org/2023/1/e50591 %U https://doi.org/10.2196/50591 %U http://www.ncbi.nlm.nih.gov/pubmed/37651167 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e51584 %T Best Practices for Using AI Tools as an Author, Peer Reviewer, or Editor %A Leung,Tiffany I %A de Azevedo Cardoso,Taiane %A Mavragani,Amaryllis %A Eysenbach,Gunther %+ JMIR Publications, Inc, 130 Queens Quay East, Unit 1100, Toronto, ON, M5A 0P6, Canada, 1 416 583 2040, tiffany.leung@jmir.org %K publishing %K open access publishing %K open science %K publication policy %K science editing %K scholarly publishing %K scientific publishing %K research %K scientific research %K editorial %K artificial intelligence %K AI %D 2023 %7 31.8.2023 %9 Editorial %J J Med Internet Res %G English %X The ethics of generative artificial intelligence (AI) use in scientific manuscript content creation has become a serious matter of concern in the scientific publishing community. Generative AI has computationally become capable of elaborating research questions; refining programming code; generating text in scientific language; and generating images, graphics, or figures. However, this technology should be used with caution. 
In this editorial, we outline the current state of editorial policies on generative AI or chatbot use in authorship, peer review, and editorial processing of scientific and scholarly manuscripts. Additionally, we provide JMIR Publications’ editorial policies on these issues. We further detail JMIR Publications’ approach to the applications of AI in the editorial process for manuscripts in review in a JMIR Publications journal. %M 37651164 %R 10.2196/51584 %U https://www.jmir.org/2023/1/e51584 %U https://doi.org/10.2196/51584 %U http://www.ncbi.nlm.nih.gov/pubmed/37651164 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48763 %T Consolidated Reporting Guidelines for Prognostic and Diagnostic Machine Learning Modeling Studies: Development and Validation %A Klement,William %A El Emam,Khaled %+ University of Ottawa, 401 Smyth Road, Ottawa, ON, K1H 8L1, Canada, 1 6137377600, kelemam@ehealthinformation.ca %K machine learning %K prognostic models %K prediction models %K reporting guidelines %K reproducibility guidelines %K diagnostic %K prognostic %K model evaluation %K model training %D 2023 %7 31.8.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: The reporting of machine learning (ML) prognostic and diagnostic modeling studies is often inadequate, making it difficult to understand and replicate such studies. To address this issue, multiple consensus and expert reporting guidelines for ML studies have been published. However, these guidelines cover different parts of the analytics lifecycle, and individually, none of them provide a complete set of reporting requirements. Objective: We aimed to consolidate the ML reporting guidelines and checklists in the literature to provide reporting items for prognostic and diagnostic ML in in-silico and shadow mode studies. Methods: We conducted a literature search that identified 192 unique peer-reviewed English articles that provide guidance and checklists for reporting ML studies. 
The articles were screened by title and abstract against a set of 9 inclusion and exclusion criteria. Articles that passed this screening had their quality evaluated by 2 raters using a 9-point checklist constructed from good practices for guideline development. The average κ was 0.71 across all quality criteria. The 17 papers with a quality score equal to or higher than the median were retained as high-quality source papers. The reporting items in these 17 articles were consolidated and screened against a set of 6 inclusion and exclusion criteria. The resulting reporting items were sent to an external group of 11 ML experts for review and updated accordingly. The updated checklist was used to assess the reporting in 6 recent modeling papers in JMIR AI. Feedback from the external review and initial validation efforts was used to improve the reporting items. Results: In total, 37 reporting items were identified and grouped into 5 categories based on the stage of the ML project: defining the study details, defining and collecting the data, modeling methodology, model evaluation, and explainability. None of the 17 source articles covered all the reporting items. The study details and data description reporting items were the most common in the source literature, with explainability and methodology guidance (ie, data preparation and model training) having the least coverage. For instance, a median of 75% of the data description reporting items appeared in each of the 17 high-quality source guidelines, but only a median of 33% of the explainability reporting items appeared. The highest-quality source articles tended to have more items on reporting study details. Coverage of the other categories of reporting items was not related to source article quality. We converted the reporting items into a checklist to support more complete reporting. 
Conclusions: Our findings supported the need for a set of consolidated reporting items, given that existing high-quality guidelines and checklists do not individually provide complete coverage. The consolidated set of reporting items is expected to improve the quality and reproducibility of ML modeling studies. %M 37651179 %R 10.2196/48763 %U https://www.jmir.org/2023/1/e48763 %U https://doi.org/10.2196/48763 %U http://www.ncbi.nlm.nih.gov/pubmed/37651179 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e48123 %T Effect of Benign Biopsy Findings on an Artificial Intelligence–Based Cancer Detector in Screening Mammography: Retrospective Case-Control Study %A Zouzos,Athanasios %A Milovanovic,Aleksandra %A Dembrower,Karin %A Strand,Fredrik %+ Department of Oncology and Pathology, Karolinska Institute, Solnavagen 1, Stockholm, 171 77, Sweden, 46 729142636, athanasios.zouzos@ki.se %K artificial intelligence %K AI %K mammography %K breast cancer %K benign biopsy %K screening %K cancer screening %K diagnostic %K radiology %K detection system %D 2023 %7 31.8.2023 %9 Original Paper %J JMIR AI %G English %X Background: Artificial intelligence (AI)–based cancer detectors (CAD) for mammography are starting to be used for breast cancer screening in radiology departments. It is important to understand how AI CAD systems react to benign lesions, especially those that have been subjected to biopsy. Objective: Our goal was to corroborate the hypothesis that women with previous benign biopsy and cytology assessments would subsequently present increased AI CAD abnormality scores even though they remained healthy. Methods: This is a retrospective study applying a commercial AI CAD system (Insight MMG, version 1.1.4.3; Lunit Inc) to a cancer-enriched mammography screening data set of 10,889 women (median age 56, range 40-74 years). 
The AI CAD generated a continuous prediction score for tumor suspicion between 0.00 and 1.00, where 1.00 represented the highest level of suspicion. A binary read (flagged or not flagged) was defined on the basis of a predetermined cutoff threshold (0.40). The median AI score and the proportion of flagged examinations were calculated for women who were healthy, those who had a benign biopsy finding, and those who were diagnosed with breast cancer. For women with a benign biopsy finding, the interval between mammography and the biopsy was used for stratification of AI scores. The effect of increasing age was examined using subgroup analysis and regression modeling. Results: Of a total of 10,889 women, 234 had a benign biopsy finding before or after screening. The proportions of flagged women were 3.5%, 11%, and 84% for healthy women without a benign biopsy finding, those with a benign biopsy finding, and women with breast cancer, respectively (P<.001). For the 8307 women with complete information, radiologist 1, radiologist 2, and the AI CAD system flagged 8.5%, 6.8%, and 8.5%, respectively, of examinations of women who had a prior benign biopsy finding. The AI score correlated with increasing age only in the cancer group (P=.01). Conclusions: Compared to healthy women without a biopsy, the examined AI CAD system flagged a much larger proportion of women who had or would have a benign biopsy finding based on a radiologist’s decision. However, the flagging rate was not higher than that of the radiologists. Further research should focus on training the AI CAD system with prior biopsy information taken into account. 
%M 38875554 %R 10.2196/48123 %U https://ai.jmir.org/2023/1/e48123 %U https://doi.org/10.2196/48123 %U http://www.ncbi.nlm.nih.gov/pubmed/38875554 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47260 %T Artificial Intelligence–Based Consumer Health Informatics Application: Scoping Review %A Asan,Onur %A Choi,Euiji %A Wang,Xiaomei %+ School of Systems and Enterprises, Stevens Institute of Technology, 1 Castle Point Terrace, Hoboken, NJ, 07030, United States, 1 4145264330, oasan@stevens.edu %K consumer informatics %K artificial intelligence %K mobile health %K mHealth %K patient outcomes %K personalized health care %K machine learning %K digital health %K mobile phone %D 2023 %7 30.8.2023 %9 Review %J J Med Internet Res %G English %X Background: There is no doubt that the recent surge in artificial intelligence (AI) research will change the trajectory of next-generation health care, making it more approachable and accessible to patients. Therefore, it is critical to research patient perceptions and outcomes because this trend will allow patients to be the primary consumers of health technology and decision makers for their own health. Objective: This study aimed to review and analyze papers on AI-based consumer health informatics (CHI) for successful future patient-centered care. Methods: We searched for all peer-reviewed papers in PubMed published in English before July 2022. Research on an AI-based CHI tool or system that reports patient outcomes or perceptions was identified for the scoping review. Results: We identified 20 papers that met our inclusion criteria. The eligible studies were summarized and discussed with respect to the role of the AI-based CHI system, patient outcomes, and patient perceptions. The AI-based CHI systems identified included systems in mobile health (13/20, 65%), robotics (5/20, 25%), and telemedicine (2/20, 10%). All the systems aimed to provide patients with personalized health care. 
Patient outcomes and perceptions across various clinical disciplines were discussed, demonstrating the potential of an AI-based CHI system to benefit patients. Conclusions: This scoping review showed the trend in AI-based CHI systems and their impact on patient outcomes as well as patients’ perceptions of these systems. Future studies should also explore how clinicians and health care professionals perceive these consumer-based systems and integrate them into the overall workflow. %M 37647122 %R 10.2196/47260 %U https://www.jmir.org/2023/1/e47260 %U https://doi.org/10.2196/47260 %U http://www.ncbi.nlm.nih.gov/pubmed/37647122 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 7 %N %P e44983 %T Digital Transformation in the Diagnostics and Therapy of Cardiovascular Diseases: Comprehensive Literature Review %A Stremmel,Christopher %A Breitschwerdt,Rüdiger %+ Department of Cardiology, LMU University Hospital, Marchioninistr. 15, Munich, 81377, Germany, 49 894400712622, christopher.stremmel@med.uni-muenchen.de %K cardiovascular %K digital medicine %K telehealth %K artificial intelligence %K telemedicine %K mobile phone %K review %D 2023 %7 30.8.2023 %9 Review %J JMIR Cardio %G English %X Background: The digital transformation of our health care system has experienced a clear shift in the last few years due to political, medical, and technical innovations and reorganization. In particular, the cardiovascular field has undergone a significant change, with new broad perspectives in terms of optimized treatment strategies for patients nowadays. Objective: After a short historical introduction, this comprehensive literature review aimed to provide a detailed overview of the scientific evidence regarding digitalization in the diagnostics and therapy of cardiovascular diseases (CVDs). Methods: We performed an extensive literature search of the PubMed database and included all related articles that were published as of March 2022. 
Of the 3021 studies identified, 1639 (54.25%) were selected for structured analysis and presentation (original articles: n=1273, 77.67%; reviews or comments: n=366, 22.33%). In addition to studies on CVDs in general, 829 studies could be assigned to a specific CVD with a diagnostic and therapeutic approach. For data presentation, all 829 publications were grouped into 6 categories of CVDs. Results: Evidence-based innovations in the cardiovascular field cover a wide medical spectrum, ranging from the diagnosis of congenital heart diseases or arrhythmias, through optimized workflows in the emergency care setting of acute myocardial infarction, to telemedical care for patients with chronic diseases such as heart failure, coronary artery disease, or hypertension. The use of smartphones and wearables, as well as the integration of artificial intelligence, provides important tools for location-independent medical care and the prevention of adverse events. Conclusions: Digital transformation has opened up multiple new perspectives in the cardiovascular field, with rapidly expanding scientific evidence. Beyond important improvements in patient care, these innovations are also capable of reducing costs for our health care system. In the next few years, digital transformation will continue to revolutionize the field of cardiovascular medicine and broaden our medical and scientific horizons. 
%M 37647103 %R 10.2196/44983 %U https://cardio.jmir.org/2023/1/e44983 %U https://doi.org/10.2196/44983 %U http://www.ncbi.nlm.nih.gov/pubmed/37647103 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e49283 %T An Artificial Intelligence Model for Predicting Trauma Mortality Among Emergency Department Patients in South Korea: Retrospective Cohort Study %A Lee,Seungseok %A Kang,Wu Seong %A Kim,Do Wan %A Seo,Sang Hyun %A Kim,Joongsuck %A Jeong,Soon Tak %A Yon,Dong Keon %A Lee,Jinseok %+ Department of Biomedical Engineering, Kyung Hee University, 1732 Deogyeong-daero, Giheung-gu, Yongin, 17104, Republic of Korea, 82 312012570, gonasago@khu.ac.kr %K artificial intelligence %K trauma %K mortality prediction %K international classification of disease %K emergency department %K ICD %K model %K models %K mortality %K predict %K prediction %K predictive %K emergency %K death %K traumatic %K nationwide %K national %K cohort %K retrospective %D 2023 %7 29.8.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Within the trauma system, the emergency department (ED) is the hospital’s first contact and is vital for allocating medical resources. However, there is generally limited information about patients that die in the ED. Objective: The aim of this study was to develop an artificial intelligence (AI) model to predict trauma mortality and analyze pertinent mortality factors for all patients visiting the ED. Methods: We used the Korean National Emergency Department Information System (NEDIS) data set (N=6,536,306), incorporating over 400 hospitals between 2016 and 2019. We included the International Classification of Disease 10th Revision (ICD-10) codes and chose the following input features to predict ED patient mortality: age, sex, intentionality, injury, emergent symptom, Alert/Verbal/Painful/Unresponsive (AVPU) scale, Korean Triage and Acuity Scale (KTAS), and vital signs. 
We compared the performance of three different feature sets for AI input: all features (n=921), ICD-10 features (n=878), and features excluding ICD-10 codes (n=43). We devised various machine learning models with an ensemble approach via 5-fold cross-validation and compared the performance of each model with that of traditional prediction models. Lastly, we investigated explainable AI feature effects and deployed our final AI model on a public website, providing access to our mortality prediction results among patients visiting the ED. Results: Our proposed AI model with the all-feature set achieved the highest area under the receiver operating characteristic curve (AUROC) of 0.9974 (adaptive boosting [AdaBoost], AdaBoost + light gradient boosting machine [LightGBM]: Ensemble), outperforming other state-of-the-art machine learning and traditional prediction models, including extreme gradient boosting (AUROC=0.9972), LightGBM (AUROC=0.9973), ICD-based injury severity scores (AUROC=0.9328 for the inclusive model and 0.9567 for the exclusive model), and KTAS (AUROC=0.9405). In addition, our proposed AI model outperformed a cutting-edge AI model designed for in-hospital mortality prediction (AUROC=0.7675) for all ED visitors. From the AI model, we also discovered that age and unresponsiveness (coma) were the top two mortality predictors among patients visiting the ED, followed by oxygen saturation, multiple rib fractures (ICD-10 code S224), painful response (stupor, semicoma), and lumbar vertebra fracture (ICD-10 code S320). Conclusions: Our proposed AI model exhibits remarkable accuracy in predicting ED mortality. Although external validation remains necessary, a large nationwide data set would provide a more accurate model and minimize overfitting. We anticipate that our AI-based risk calculator tool will substantially aid health care providers, particularly regarding triage and early diagnosis for trauma patients. 
%M 37642984 %R 10.2196/49283 %U https://www.jmir.org/2023/1/e49283 %U https://doi.org/10.2196/49283 %U http://www.ncbi.nlm.nih.gov/pubmed/37642984 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47540 %T Sharing Data With Shared Benefits: Artificial Intelligence Perspective %A Tajabadi,Mohammad %A Grabenhenrich,Linus %A Ribeiro,Adèle %A Leyer,Michael %A Heider,Dominik %+ Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str. 6, Marburg, 35043, Germany, 49 6421 2821579, dominik.heider@uni-marburg.de %K federated learning %K machine learning %K medical data %K fairness %K data sharing %K artificial intelligence %K development %K artificial intelligence model %K applications %K data analysis %K diagnostic tool %K tool %D 2023 %7 29.8.2023 %9 Viewpoint %J J Med Internet Res %G English %X Artificial intelligence (AI) and data sharing go hand in hand. In order to develop powerful AI models for medical and health applications, data need to be collected and brought together over multiple centers. However, due to various reasons, including data privacy, not all data can be made publicly available or shared with other parties. Federated and swarm learning can help in these scenarios. However, in the private sector, such as between companies, the incentive is limited, as the resulting AI models would be available for all partners irrespective of their individual contribution, including the amount of data provided by each party. Here, we explore a potential solution to this challenge as a viewpoint, aiming to establish a fairer approach that encourages companies to engage in collaborative data analysis and AI modeling. Within the proposed approach, each individual participant could gain a model commensurate with their respective data contribution, ultimately leading to better diagnostic tools for all participants in a fair manner. 
%M 37642995 %R 10.2196/47540 %U https://www.jmir.org/2023/1/e47540 %U https://doi.org/10.2196/47540 %U http://www.ncbi.nlm.nih.gov/pubmed/37642995 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48496 %T A Comprehensive, Valid, and Reliable Tool to Assess the Degree of Responsibility of Digital Health Solutions That Operate With or Without Artificial Intelligence: 3-Phase Mixed Methods Study %A Lehoux,Pascale %A Rocha de Oliveira,Robson %A Rivard,Lysanne %A Silva,Hudson Pacifico %A Alami,Hassane %A Mörch,Carl Maria %A Malas,Kathy %+ Department of Health Management, Evaluation and Policy, Université de Montréal; Center for Public Health Research, 7101, Avenue du Parc, Montréal, QC, H3N 1X9, Canada, 1 5143437978, pascale.lehoux@umontreal.ca %K Responsible Innovation in Health %K digital health policy %K artificial intelligence ethics %K responsible research and innovation %K mixed methods %K e-Delphi %K interrater agreement %K mobile phone %D 2023 %7 28.8.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Clinicians’ scope of responsibilities is being steadily transformed by digital health solutions that operate with or without artificial intelligence (DAI solutions). Most tools developed to foster ethical practices lack rigor and do not concurrently capture the health, social, economic, and environmental issues that such solutions raise. Objective: To support clinical leadership in this field, we aimed to develop a comprehensive, valid, and reliable tool that measures the responsibility of DAI solutions by adapting the multidimensional and already validated Responsible Innovation in Health Tool. Methods: We conducted a 3-phase mixed methods study. Relying on a scoping review of available tools, phase 1 (concept mapping) led to a preliminary version of the Responsible DAI solutions Assessment Tool. 
In phase 2, an international 2-round e-Delphi expert panel rated on a 5-level scale the importance, clarity, and appropriateness of the tool’s components. In phase 3, a total of 2 raters independently applied the revised tool to a sample of DAI solutions (n=25), interrater reliability was measured, and final minor changes were made to the tool. Results: The mapping process identified a comprehensive set of responsibility premises, screening criteria, and assessment attributes specific to DAI solutions. e-Delphi experts critically assessed these new components and provided comments to increase content validity (n=293), and after round 2, consensus was reached on 85% (22/26) of the items surveyed. Interrater agreement was substantial for a subcriterion and almost perfect for all other criteria and assessment attributes. Conclusions: The Responsible DAI solutions Assessment Tool offers a comprehensive, valid, and reliable means of assessing the degree of responsibility of DAI solutions in health. As regulation remains limited, this forward-looking tool has the potential to change practice toward more equitable as well as economically and environmentally sustainable digital health care. 
%M 37639297 %R 10.2196/48496 %U https://www.jmir.org/2023/1/e48496 %U https://doi.org/10.2196/48496 %U http://www.ncbi.nlm.nih.gov/pubmed/37639297 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e47335 %T Barriers and Enablers for Implementation of an Artificial Intelligence–Based Decision Support Tool to Reduce the Risk of Readmission of Patients With Heart Failure: Stakeholder Interviews %A Nair,Monika %A Andersson,Jonas %A Nygren,Jens M %A Lundgren,Lina E %+ School of Business, Innovation and Sustainability, Halmstad University, Kristian IV:s väg 3, Halmstad, 30118, Sweden, 46 707227544, lina.lundgren@hh.se %K implementation %K AI systems %K health care %K interviews %K artificial Intelligence %K AI %K decision support tool %K readmission %K prediction %K heart failure %K digital tool %D 2023 %7 23.8.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Artificial intelligence (AI) applications in health care are expected to provide value for health care organizations, professionals, and patients. However, the implementation of such systems should be carefully planned and organized in order to ensure quality, safety, and acceptance. The gathered view of different stakeholders is a great source of information to understand the barriers and enablers for implementation in a specific context. Objective: This study aimed to understand the context and stakeholder perspectives related to the future implementation of a clinical decision support system for predicting readmissions of patients with heart failure. The study was part of a larger project involving model development, interface design, and implementation planning of the system. Methods: Interviews were held with 12 stakeholders from the regional and municipal health care organizations to gather their views on the potential effects implementation of such a decision support system could have as well as barriers and enablers for implementation. 
Data were analyzed based on the categories defined in the nonadoption, abandonment, scale-up, spread, sustainability (NASSS) framework. Results: Stakeholders had in general a positive attitude and curiosity toward AI-based decision support systems, and mentioned several barriers and enablers based on the experiences of previous implementations of information technology systems. Central aspects to consider for the proposed clinical decision support system were design aspects, access to information throughout the care process, and integration into the clinical workflow. The implementation of such a system could lead to a number of effects related to both clinical outcomes and resource allocation, which are all important to address in the planning of implementation. Stakeholders saw, however, value in several aspects of implementing such a system, emphasizing the increased quality of life for those patients who can avoid being hospitalized. Conclusions: Several ideas were put forward on how the proposed AI system would potentially affect and provide value for patients, professionals, and the organization, and implementation aspects were important parts of that. A successful system can help clinicians to prioritize the need for different types of treatments but also be used for planning purposes within the hospital. However, the system needs not only technological and clinical precision but also a carefully planned implementation process. Such a process should take into consideration the aspects related to all the categories in the NASSS framework. This study further highlighted the importance of studying stakeholder needs early in the process of development, design, and implementation of decision support systems, as the data revealed new information on the potential use of the system and the placement of the application in the care process. 
%M 37610799 %R 10.2196/47335 %U https://formative.jmir.org/2023/1/e47335 %U https://doi.org/10.2196/47335 %U http://www.ncbi.nlm.nih.gov/pubmed/37610799 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48659 %T Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study %A Rao,Arya %A Pang,Michael %A Kim,John %A Kamineni,Meghana %A Lie,Winston %A Prasad,Anoop K %A Landman,Adam %A Dreyer,Keith %A Succi,Marc D %+ Department of Radiology, Massachusetts General Hospital, 55 Fruit Street, Boston, MA, 02114, United States, 1 617 935 9144, msucci@partners.org %K large language models %K LLMs %K artificial intelligence %K AI %K clinical decision support %K clinical vignettes %K ChatGPT %K Generative Pre-trained Transformer %K GPT %K utility %K development %K usability %K chatbot %K accuracy %K decision-making %D 2023 %7 22.8.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Large language model (LLM)–based artificial intelligence chatbots direct the power of large training data sets toward successive, related tasks as opposed to single-ask tasks, for which artificial intelligence already achieves impressive performance. The capacity of LLMs to assist in the full scope of iterative clinical reasoning via successive prompting, in effect acting as artificial physicians, has not yet been evaluated. Objective: This study aimed to evaluate ChatGPT’s capacity for ongoing clinical decision support via its performance on standardized clinical vignettes. Methods: We inputted all 36 published clinical vignettes from the Merck Sharpe & Dohme (MSD) Clinical Manual into ChatGPT and compared its accuracy on differential diagnoses, diagnostic testing, final diagnosis, and management based on patient age, gender, and case acuity. Accuracy was measured by the proportion of correct responses to the questions posed within the clinical vignettes tested, as calculated by human scorers. 
We further conducted linear regression to assess the contributing factors toward ChatGPT’s performance on clinical tasks. Results: ChatGPT achieved an overall accuracy of 71.7% (95% CI 69.3%-74.1%) across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis with an accuracy of 76.9% (95% CI 67.8%-86.1%) and the lowest performance in generating an initial differential diagnosis with an accuracy of 60.3% (95% CI 54.2%-66.6%). Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis (β=–15.8%; P<.001) and clinical management (β=–7.4%; P=.02) question types. Conclusions: ChatGPT achieves impressive accuracy in clinical decision-making, with increasing strength as it gains more clinical information at its disposal. In particular, ChatGPT demonstrates the greatest accuracy in tasks of final diagnosis as compared to initial diagnosis. Limitations include possible model hallucinations and the unclear composition of ChatGPT’s training data set. 
%M 37606976 %R 10.2196/48659 %U https://www.jmir.org/2023/1/e48659 %U https://doi.org/10.2196/48659 %U http://www.ncbi.nlm.nih.gov/pubmed/37606976 %0 Journal Article %@ 2564-1891 %I JMIR Publications %V 3 %N %P e47317 %T Using Machine Learning Technology (Early Artificial Intelligence–Supported Response With Social Listening Platform) to Enhance Digital Social Understanding for the COVID-19 Infodemic: Development and Implementation Study %A White,Becky K %A Gombert,Arnault %A Nguyen,Tim %A Yau,Brian %A Ishizumi,Atsuyoshi %A Kirchner,Laura %A León,Alicia %A Wilson,Harry %A Jaramillo-Gutierrez,Giovanna %A Cerquides,Jesus %A D’Agostino,Marcelo %A Salvi,Cristiana %A Sreenath,Ravi Shankar %A Rambaud,Kimberly %A Samhouri,Dalia %A Briand,Sylvie %A Purnat,Tina D %+ Department of Epidemic and Pandemic Preparedness and Prevention, World Health Organization, Ave Appia 21, Geneva, 1202, Switzerland, 41 227912111, purnatt@who.int %K infodemic %K sentiment %K narrative analysis %K social listening %K natural language processing %K social media %K public health %K pandemic preparedness %K pandemic response %K artificial intelligence %K AI text analytics %K COVID-19 %K information voids %K machine learning %D 2023 %7 21.8.2023 %9 Original Paper %J JMIR Infodemiology %G English %X Background: Amid the COVID-19 pandemic, there has been a need for rapid social understanding to inform infodemic management and response. Although social media analysis platforms have traditionally been designed for commercial brands for marketing and sales purposes, they have been underused and adapted for a comprehensive understanding of social dynamics in areas such as public health. Traditional systems have challenges for public health use, and new tools and innovative methods are required. The World Health Organization Early Artificial Intelligence–Supported Response with Social Listening (EARS) platform was developed to overcome some of these challenges. 
Objective: This paper describes the development of the EARS platform, including data sourcing, development, and validation of a machine learning categorization approach, as well as the results from the pilot study. Methods: Data for EARS are collected daily from web-based conversations in publicly available sources in 9 languages. Public health and social media experts developed a taxonomy to categorize COVID-19 narratives into 5 relevant main categories and 41 subcategories. We developed a semisupervised machine learning algorithm to categorize social media posts into categories and various filters. To validate the results obtained by the machine learning–based approach, we compared it to a search-filter approach, applying Boolean queries with the same amount of information and measured the recall and precision. Hotelling T2 was used to determine the effect of the classification method on the combined variables. Results: The EARS platform was developed, validated, and applied to characterize conversations regarding COVID-19 since December 2020. A total of 215,469,045 social posts were collected for processing from December 2020 to February 2022. The machine learning algorithm outperformed the Boolean search filters method for precision and recall in both English and Spanish languages (P<.001). Demographic and other filters provided useful insights on data, and the gender split of users in the platform was largely consistent with population-level data on social media use. Conclusions: The EARS platform was developed to address the changing needs of public health analysts during the COVID-19 pandemic. The application of public health taxonomy and artificial intelligence technology to a user-friendly social listening platform, accessible directly by analysts, is a significant step in better enabling understanding of global narratives. The platform was designed for scalability; iterations and new countries and languages have been added. 
This research has shown that a machine learning approach is more accurate than using only keywords and has the benefit of categorizing and understanding large amounts of digital social data during an infodemic. Further technical developments are needed and planned for continuous improvements, to meet the challenges in the generation of infodemic insights from social media for infodemic managers and public health professionals. %M 37422854 %R 10.2196/47317 %U https://infodemiology.jmir.org/2023/1/e47317 %U https://doi.org/10.2196/47317 %U http://www.ncbi.nlm.nih.gov/pubmed/37422854 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47366 %T Evaluating the Potential of Machine Learning and Wearable Devices in End-of-Life Care in Predicting 7-Day Death Events Among Patients With Terminal Cancer: Cohort Study %A Liu,Jen-Hsuan %A Shih,Chih-Yuan %A Huang,Hsien-Liang %A Peng,Jen-Kuei %A Cheng,Shao-Yi %A Tsai,Jaw-Shiun %A Lai,Feipei %+ Department of Family Medicine, National Taiwan University Hospital, National Taiwan University, 7 Chung-Shan South Road, Taipei, 100225, Taiwan, 886 2 2312 3456 ext 62147, jawshiun@ntu.edu.tw %K artificial intelligence %K end-of-life care %K machine learning %K palliative care %K survival prediction %K terminal cancer %K wearable device %D 2023 %7 18.8.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: An accurate prediction of mortality in end-of-life care is crucial but presents challenges. Existing prognostic tools demonstrate moderate performance in predicting survival across various time frames, primarily in in-hospital settings and single-time evaluations. However, these tools may fail to capture the individualized and diverse trajectories of patients. Limited evidence exists regarding the use of artificial intelligence (AI) and wearable devices, specifically among patients with cancer at the end of life. 
Objective: This study aimed to investigate the potential of using wearable devices and AI to predict death events among patients with cancer at the end of life. Our hypothesis was that continuous monitoring through smartwatches can offer valuable insights into the progression of patients at the end of life and enable the prediction of changes in their condition, which could ultimately enhance personalized care, particularly in outpatient or home care settings. Methods: This prospective study was conducted at the National Taiwan University Hospital. Patients diagnosed with cancer and receiving end-of-life care were invited to enroll in wards, outpatient clinics, and home-based care settings. Each participant was given a smartwatch to collect physiological data, including steps taken, heart rate, sleep time, and blood oxygen saturation. Clinical assessments were conducted weekly. The participants were followed until the end of life or up to 52 weeks. With these input features, we evaluated the prediction performance of several machine learning–based classifiers and a deep neural network in 7-day death events. We used area under the receiver operating characteristic curve (AUROC), F1-score, accuracy, and specificity as evaluation metrics. A Shapley additive explanations value analysis was performed to further explore the models with good performance. Results: From September 2021 to August 2022, overall, 1657 data points were collected from 40 patients with a median survival time of 34 days, with the detection of 28 death events. Among the proposed models, extreme gradient boost (XGBoost) yielded the best result, with an AUROC of 96%, F1-score of 78.5%, accuracy of 93%, and specificity of 97% on the testing set. The Shapley additive explanations value analysis identified the average heart rate as the most important feature. Other important features included steps taken, appetite, urination status, and clinical care phase. 
Conclusions: We demonstrated the successful prediction of patient deaths within the next 7 days using a combination of wearable devices and AI. Our findings highlight the potential of integrating AI and wearable technology into clinical end-of-life care, offering valuable insights and supporting clinical decision-making for personalized patient care. It is important to acknowledge that our study was conducted in a relatively small cohort; thus, further research is needed to validate our approach and assess its impact on clinical care. Trial Registration: ClinicalTrials.gov NCT05054907; https://classic.clinicaltrials.gov/ct2/show/NCT05054907 %M 37594793 %R 10.2196/47366 %U https://www.jmir.org/2023/1/e47366 %U https://doi.org/10.2196/47366 %U http://www.ncbi.nlm.nih.gov/pubmed/37594793 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46854 %T Prediction of Medical Disputes Between Health Care Workers and Patients in Terms of Hospital Legal Construction Using Machine Learning Techniques: Externally Validated Cross-Sectional Study %A Yi,Min %A Cao,Yuebin %A Wang,Lin %A Gu,Yaowen %A Zheng,Xueqian %A Wang,Jiangjun %A Chen,Wei %A Wei,Liangyu %A Zhou,Yujin %A Shi,Chenyi %A Cao,Yanlin %+ Institute of Medical Information and Library, Chinese Academy of Medical Sciences and Peking Union Medical College, No 3 Yabao Road Chaoyang District, Beijing, 100020, China, 86 13370136475, cao.yanlin@imicams.ac.cn %K medical workers %K medical disputes %K hospital legal construction %K machine learning %K multicenter analysis %D 2023 %7 17.8.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Medical disputes are a global public health issue that is receiving increasing attention. However, studies investigating the relationship between hospital legal construction and medical disputes are scarce. 
The development of a multicenter model incorporating machine learning (ML) techniques for the individualized prediction of medical disputes would be beneficial for medical workers. Objective: This study aimed to identify predictors related to medical disputes from the perspective of hospital legal construction and the use of ML techniques to build models for predicting the risk of medical disputes. Methods: This study enrolled 38,053 medical workers from 130 tertiary hospitals in Hunan province, China. The participants were randomly divided into a training cohort (34,286/38,053, 90.1%) and an internal validation cohort (3767/38,053, 9.9%). Medical workers from 87 tertiary hospitals in Beijing were included in an external validation cohort (26,285/26,285, 100%). This study used logistic regression and 5 ML techniques: decision tree, random forest, support vector machine, gradient boosting decision tree (GBDT), and deep neural network. In total, 12 metrics, including discrimination and calibration, were used for performance evaluation. A scoring system was developed to select the optimal model. Shapley additive explanations was used to generate the importance coefficients for characteristics. To promote the clinical practice of our proposed optimal model, reclassification of patients was performed, and a web-based app for medical dispute prediction was created, which can be easily accessed by the public. Results: Medical disputes occurred among 46.06% (17,527/38,053) of the medical workers in Hunan province, China. Among the 26 clinical characteristics, multivariate analysis demonstrated that 18 characteristics were significantly associated with medical disputes, and these characteristics were used for ML model development. 
Among the ML techniques, GBDT was identified as the optimal model, demonstrating the lowest Brier score (0.205), highest area under the receiver operating characteristic curve (0.738, 95% CI 0.722-0.754), and the largest discrimination slope (0.172) and Youden index (1.355). In addition, it achieved the highest metrics score (63 points), followed by deep neural network (46 points) and random forest (45 points), in the internal validation set. In the external validation set, GBDT still performed comparably, achieving the second highest metrics score (52 points). The high-risk group had more than twice the odds of experiencing medical disputes compared with the low-risk group. Conclusions: We established a prediction model to stratify medical workers into different risk groups for encountering medical disputes. Among the 5 ML models, GBDT demonstrated the optimal comprehensive performance and was used to construct the web-based app. Our proposed model can serve as a useful tool for identifying medical workers at high risk of medical disputes. We believe that preventive strategies should be implemented for the high-risk group. 
%M 37590041 %R 10.2196/46854 %U https://www.jmir.org/2023/1/e46854 %U https://doi.org/10.2196/46854 %U http://www.ncbi.nlm.nih.gov/pubmed/37590041 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e47427 %T Using ChatGPT as a Learning Tool in Acupuncture Education: Comparative Study %A Lee,Hyeonhoon %+ Department of Anesthesiology and Pain Medicine, Seoul National University Hospital, 101 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea, 82 2 2072 4627, hhoon@snu.ac.kr %K ChatGPT %K educational tool %K artificial intelligence %K acupuncture %K AI %K personalized education %K students %D 2023 %7 17.8.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: ChatGPT (Open AI) is a state-of-the-art artificial intelligence model with potential applications in the medical fields of clinical practice, research, and education. Objective: This study aimed to evaluate the potential of ChatGPT as an educational tool in college acupuncture programs, focusing on its ability to support students in learning acupuncture point selection, treatment planning, and decision-making. Methods: We collected case studies published in Acupuncture in Medicine between June 2022 and May 2023. Both ChatGPT-3.5 and ChatGPT-4 were used to generate suggestions for acupuncture points based on case presentations. A Wilcoxon signed-rank test was conducted to compare the number of acupuncture points generated by ChatGPT-3.5 and ChatGPT-4, and the overlapping ratio of acupuncture points was calculated. Results: Among the 21 case studies, 14 studies were included for analysis. ChatGPT-4 generated significantly more acupuncture points (9.0, SD 1.1) compared to ChatGPT-3.5 (5.6, SD 0.6; P<.001). The overlapping ratios of acupuncture points for ChatGPT-3.5 (0.40, SD 0.28) and ChatGPT-4 (0.34, SD 0.27; P=.67) were not significantly different. 
Conclusions: ChatGPT may be a useful educational tool for acupuncture students, providing valuable insights into personalized treatment plans. However, it cannot fully replace traditional diagnostic methods, and further studies are needed to ensure its safe and effective implementation in acupuncture education. %M 37590034 %R 10.2196/47427 %U https://mededu.jmir.org/2023/1/e47427 %U https://doi.org/10.2196/47427 %U http://www.ncbi.nlm.nih.gov/pubmed/37590034 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e50696 %T Ethical Challenges in AI Approaches to Eating Disorders %A Sharp,Gemma %A Torous,John %A West,Madeline L %+ Department of Neuroscience, Monash University, 99 Commercial Road, Melbourne, 3004, Australia, 61 421253188, gemma.sharp@monash.edu %K eating disorders %K body image %K artificial intelligence %K AI %K chatbot %K ethics %D 2023 %7 14.8.2023 %9 Editorial %J J Med Internet Res %G English %X The use of artificial intelligence (AI) to assist with the prevention, identification, and management of eating disorders and body image concerns is exciting, but it is not without risk. Technology is advancing rapidly, and ensuring that responsible standards are in place to mitigate risk and protect users is vital to the success and safety of technologies and users. 
%M 37578836 %R 10.2196/50696 %U https://www.jmir.org/2023/1/e50696 %U https://doi.org/10.2196/50696 %U http://www.ncbi.nlm.nih.gov/pubmed/37578836 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 12 %N %P e46900 %T Appropriateness and Comprehensiveness of Using ChatGPT for Perioperative Patient Education in Thoracic Surgery in Different Language Contexts: Survey Study %A Shao,Chen-ye %A Li,Hui %A Liu,Xiao-long %A Li,Chang %A Yang,Li-qin %A Zhang,Yue-juan %A Luo,Jing %A Zhao,Jun %+ Department of Thoracic Surgery, The First Affiliated Hospital of Soochow University, 899 Pinghai Road, Gusu District, Suzhou, 215006, China, 86 15250965957, zhaojia0327@126.com %K patient education %K ChatGPT %K Generative Pre-trained Transformer %K thoracic surgery %K evaluation %K patient %K education %K surgery %K thoracic %K language %K language model %K clinical workflow %K artificial intelligence %K AI %K workflow %K communication %K feasibility %D 2023 %7 14.8.2023 %9 Short Paper %J Interact J Med Res %G English %X Background: ChatGPT, a dialogue-based artificial intelligence language model, has shown promise in assisting clinical workflows and patient-clinician communication. However, there is a lack of feasibility assessments regarding its use for perioperative patient education in thoracic surgery. Objective: This study aimed to assess the appropriateness and comprehensiveness of using ChatGPT for perioperative patient education in thoracic surgery in both English and Chinese contexts. Methods: This pilot study was conducted in February 2023. A total of 37 questions focused on perioperative patient education in thoracic surgery were created based on guidelines and clinical experience. Two sets of inquiries were made to ChatGPT for each question, one in English and the other in Chinese. 
The responses generated by ChatGPT were evaluated separately by experienced thoracic surgical clinicians for appropriateness and comprehensiveness based on a hypothetical draft response to a patient’s question on the electronic information platform. For a response to be qualified, it required at least 80% of reviewers to deem it appropriate and 50% to deem it comprehensive. Statistical analyses were performed using the unpaired chi-square test or Fisher exact test, with a significance level set at P<.05. Results: The set of 37 commonly asked questions covered topics such as disease information, diagnostic procedures, perioperative complications, treatment measures, disease prevention, and perioperative care considerations. In both the English and Chinese contexts, 34 (92%) out of 37 responses were qualified in terms of both appropriateness and comprehensiveness. The remaining 3 (8%) responses were unqualified in these 2 contexts. The unqualified responses primarily involved the diagnosis of disease symptoms and surgical-related complications symptoms. The reasons for determining the responses as unqualified were similar in both contexts. There was no statistically significant difference (34/37, 92% vs 34/37, 92%; P=.99) in the qualification rate between the 2 language sets. Conclusions: This pilot study demonstrates the potential feasibility of using ChatGPT for perioperative patient education in thoracic surgery in both English and Chinese contexts. ChatGPT is expected to enhance patient satisfaction, reduce anxiety, and improve compliance during the perioperative period. In the future, there will be remarkable potential application for using artificial intelligence, in conjunction with human review, for patient education and health consultation after patients have provided their informed consent. 
%M 37578819 %R 10.2196/46900 %U https://www.i-jmr.org/2023/1/e46900 %U https://doi.org/10.2196/46900 %U http://www.ncbi.nlm.nih.gov/pubmed/37578819 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48009 %T Ethical Considerations of Using ChatGPT in Health Care %A Wang,Changyu %A Liu,Siru %A Yang,Hao %A Guo,Jiulin %A Wu,Yuxuan %A Liu,Jialin %+ Information Center, West China Hospital, Sichuan University, No 37 Guo Xue Xiang, Chengdu, 610041, China, 86 28 85422306, DLJL8@163.com %K ethics %K ChatGPT %K artificial intelligence %K AI %K large language models %K health care %K artificial intelligence development %K development %K algorithm %K patient safety %K patient privacy %K safety %K privacy %D 2023 %7 11.8.2023 %9 Viewpoint %J J Med Internet Res %G English %X ChatGPT has promising applications in health care, but potential ethical issues need to be addressed proactively to prevent harm. ChatGPT presents potential ethical challenges from legal, humanistic, algorithmic, and informational perspectives. Legal ethics concerns arise from the unclear allocation of responsibility when patient harm occurs and from potential breaches of patient privacy due to data collection. Clear rules and legal boundaries are needed to properly allocate liability and protect users. Humanistic ethics concerns arise from the potential disruption of the physician-patient relationship, humanistic care, and issues of integrity. Overreliance on artificial intelligence (AI) can undermine compassion and erode trust. Transparency and disclosure of AI-generated content are critical to maintaining integrity. Algorithmic ethics raise concerns about algorithmic bias, responsibility, transparency and explainability, as well as validation and evaluation. Information ethics include data bias, validity, and effectiveness. Biased training data can lead to biased output, and overreliance on ChatGPT can reduce patient adherence and encourage self-diagnosis. 
Ensuring the accuracy, reliability, and validity of ChatGPT-generated content requires rigorous validation and ongoing updates based on clinical practice. To navigate the evolving ethical landscape of AI, AI in health care must adhere to the strictest ethical standards. Through comprehensive ethical guidelines, health care professionals can ensure the responsible use of ChatGPT, promote accurate and reliable information exchange, protect patient privacy, and empower patients to make informed decisions about their health care. %M 37566454 %R 10.2196/48009 %U https://www.jmir.org/2023/1/e48009 %U https://doi.org/10.2196/48009 %U http://www.ncbi.nlm.nih.gov/pubmed/37566454 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e28848 %T A Fast and Minimal System to Identify Depression Using Smartphones: Explainable Machine Learning–Based Approach %A Ahmed,Md Sabbir %A Ahmed,Nova %+ Design Inclusion and Access Lab, North South University, Plot #15, Block #B, Bashundhara, Dhaka, 1229, Bangladesh, 880 1781920068, msg2sabbir@gmail.com %K smartphone %K depression %K explainable machine learning %K low-resource settings %K real-time system %K students %D 2023 %7 10.8.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Existing robust, pervasive device-based systems developed in recent years to detect depression require data collected over a long period and may not be effective in cases where early detection is crucial. Additionally, due to the requirement of running systems in the background for prolonged periods, existing systems can be resource inefficient. As a result, these systems can be infeasible in low-resource settings. Objective: Our main objective was to develop a minimalistic system to identify depression using data retrieved in the fastest possible time. Another objective was to explain the machine learning (ML) models that were best for identifying depression. 
Methods: We developed a fast tool that retrieves the past 7 days’ app usage data in 1 second (mean 0.31, SD 1.10 seconds). A total of 100 students from Bangladesh participated in our study, and our tool collected their app usage data and responses to the Patient Health Questionnaire-9. To identify depressed and nondepressed students, we developed a diverse set of ML models: linear, tree-based, and neural network–based models. We selected important features using the stable approach, along with 3 main types of feature selection (FS) approaches: filter, wrapper, and embedded methods. We developed and validated the models using the nested cross-validation method. Additionally, we explained the best ML models through the Shapley additive explanations (SHAP) method. Results: Leveraging only the app usage data retrieved in 1 second, our light gradient boosting machine model used the important features selected by the stable FS approach and correctly identified 82.4% (n=42) of depressed students (precision=75%, F1-score=78.5%). Moreover, after comprehensive exploration, we presented a parsimonious stacking model where around 5 features selected by the all-relevant FS approach Boruta were used in each iteration of validation and showed a maximum precision of 77.4% (balanced accuracy=77.9%). Feature importance analysis suggested app usage behavioral markers containing diurnal usage patterns as being more important than aggregated data-based markers. In addition, a SHAP analysis of our best models presented behavioral markers that were related to depression. For instance, students who were not depressed spent more time on education apps on weekdays, whereas those who were depressed used a higher number of photo and video apps and also had a higher deviation in using photo and video apps over the morning, afternoon, evening, and night time periods of the weekend. 
Conclusions: Due to our system’s fast and minimalistic nature, it may make a worthwhile contribution to identifying depression in underdeveloped and developing regions. In addition, our detailed discussion about the implication of our findings can facilitate the development of less resource-intensive systems to better understand students who are depressed and take steps for intervention. %M 37561568 %R 10.2196/28848 %U https://formative.jmir.org/2023/1/e28848 %U https://doi.org/10.2196/28848 %U http://www.ncbi.nlm.nih.gov/pubmed/37561568 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46761 %T Chatbots to Improve Sexual and Reproductive Health: Realist Synthesis %A Mills,Rhiana %A Mangone,Emily Rose %A Lesh,Neal %A Mohan,Diwakar %A Baraitser,Paula %+ SH24, 35A Westminster Bridge Road, London, SE1 7JB, United Kingdom, 44 7742932445, rhiana@sh24.org.uk %K chatbot %K sexual and reproductive health %K realist synthesis %K social networks %K service networks %K disclosure %K artificial intelligence %K sexual %K reproductive %K social media %K counseling %K treatment %K development %K theory %K digital device %K device %D 2023 %7 9.8.2023 %9 Review %J J Med Internet Res %G English %X Background: Digital technologies may improve sexual and reproductive health (SRH) across diverse settings. Chatbots are computer programs designed to simulate human conversation, and there is a growing interest in the potential for chatbots to provide responsive and accurate information, counseling, linkages to products and services, or a companion on an SRH journey. Objective: This review aimed to identify assumptions about the value of chatbots for SRH and collate the evidence to support them. Methods: We used a realist approach that starts with an initial program theory and generates causal explanations in the form of context, mechanism, and outcome configurations to test and develop that theory. 
We generated our program theory, drawing on the expertise of the research team, and then searched the literature to add depth and develop this theory with evidence. Results: The evidence supports our program theory, which suggests that chatbots are a promising intervention for SRH information and service delivery. This is because chatbots offer anonymous and nonjudgmental interactions that encourage disclosure of personal information, provide complex information in a responsive and conversational tone that increases understanding, link to SRH conversations within web-based and offline social networks, provide immediate support or service provision 24/7 by automating some tasks, and provide the potential to develop long-term relationships with users who return over time. However, chatbots may be less valuable where people find any conversation about SRH (even with a chatbot) stigmatizing, for those who lack confidential access to digital devices, where conversations do not feel natural, and where chatbots are developed as stand-alone interventions without reference to service contexts. Conclusions: Chatbots in SRH could be developed further to automate simple tasks and support service delivery. They should prioritize achieving an authentic conversational tone, which could be developed to facilitate content sharing in social networks, should support long-term relationship building with their users, and should be integrated into wider service networks. 
%M 37556194 %R 10.2196/46761 %U https://www.jmir.org/2023/1/e46761 %U https://doi.org/10.2196/46761 %U http://www.ncbi.nlm.nih.gov/pubmed/37556194 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e48978 %T Performance of ChatGPT on the Situational Judgement Test—A Professional Dilemmas–Based Examination for Doctors in the United Kingdom %A Borchert,Robin J %A Hickman,Charlotte R %A Pepys,Jack %A Sadler,Timothy J %+ Department of Radiology, University of Cambridge, Hills Road, Cambridge, CB2 0QQ, United Kingdom, 1 1223 805000, rb729@medschl.cam.ac.uk %K ChatGPT %K language models %K Situational Judgement Test %K medical education %K artificial intelligence %K language model %K exam %K examination %K SJT %K judgement %K reasoning %K communication %K chatbot %D 2023 %7 7.8.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: ChatGPT is a large language model that has performed well on professional examinations in the fields of medicine, law, and business. However, it is unclear how ChatGPT would perform on an examination assessing professionalism and situational judgement for doctors. Objective: We evaluated the performance of ChatGPT on the Situational Judgement Test (SJT): a national examination taken by all final-year medical students in the United Kingdom. This examination is designed to assess attributes such as communication, teamwork, patient safety, prioritization skills, professionalism, and ethics. Methods: All questions from the UK Foundation Programme Office’s (UKFPO’s) 2023 SJT practice examination were inputted into ChatGPT. For each question, ChatGPT’s answers and rationales were recorded and assessed on the basis of the official UK Foundation Programme Office scoring template. Questions were categorized into domains of Good Medical Practice on the basis of the domains referenced in the rationales provided in the scoring sheet. 
Questions without clear domain links were screened by reviewers and assigned one or multiple domains. ChatGPT's overall performance, as well as its performance across the domains of Good Medical Practice, was evaluated. Results: Overall, ChatGPT performed well, scoring 76% on the SJT but scoring full marks on only a few questions (9%), which may reflect possible flaws in ChatGPT’s situational judgement or inconsistencies in the reasoning across questions (or both) in the examination itself. ChatGPT demonstrated consistent performance across the 4 outlined domains in Good Medical Practice for doctors. Conclusions: Further research is needed to understand the potential applications of large language models, such as ChatGPT, in medical education for standardizing questions and providing consistent rationales for examinations assessing professionalism and ethics. %M 37548997 %R 10.2196/48978 %U https://mededu.jmir.org/2023/1/e48978 %U https://doi.org/10.2196/48978 %U http://www.ncbi.nlm.nih.gov/pubmed/37548997 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e49034 %T Effects of Combinational Use of Additional Differential Diagnostic Generators on the Diagnostic Accuracy of the Differential Diagnosis List Developed by an Artificial Intelligence–Driven Automated History–Taking System: Pilot Cross-Sectional Study %A Harada,Yukinori %A Tomiyama,Shusaku %A Sakamoto,Tetsu %A Sugimoto,Shu %A Kawamura,Ren %A Yokose,Masashi %A Hayashi,Arisa %A Shimizu,Taro %+ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Mibu, Shimotsugagun, 321-0293, Japan, 81 282 86 1111, yharada@dokkyomed.ac.jp %K collective intelligence %K differential diagnosis generator %K diagnostic accuracy %K automated medical history taking system %K artificial intelligence %K AI %D 2023 %7 2.8.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Low diagnostic accuracy is a major concern in automated medical history–taking systems with 
differential diagnosis (DDx) generators. Extending the concept of collective intelligence to the field of DDx generators such that the accuracy of judgment becomes higher when accepting an integrated diagnosis list from multiple people than when accepting a diagnosis list from a single person may be a possible solution. Objective: The purpose of this study is to assess whether the combined use of several DDx generators improves the diagnostic accuracy of DDx lists. Methods: We used medical history data and the top 10 DDx lists (index DDx lists) generated by an artificial intelligence (AI)–driven automated medical history–taking system from 103 patients with confirmed diagnoses. Two research physicians independently created the other top 10 DDx lists (second and third DDx lists) per case by imputing key information into the other 2 DDx generators based on the medical history generated by the automated medical history–taking system without reading the index lists generated by the automated medical history–taking system. We used the McNemar test to assess the improvement in diagnostic accuracy from the index DDx lists to the three types of combined DDx lists: (1) simply combining DDx lists from the index, second, and third lists; (2) creating a new top 10 DDx list using a 1/n weighting rule; and (3) creating new lists with only shared diagnoses among DDx lists from the index, second, and third lists. We treated the data generated by 2 research physicians from the same patient as independent cases. Therefore, the number of cases included in analyses in the case using 2 additional lists was 206 (103 cases × 2 physicians’ input). Results: The diagnostic accuracy of the index lists was 46% (47/103). 
Diagnostic accuracy was improved by simply combining the other 2 DDx lists (133/206, 65%, P<.001), whereas the other 2 combined DDx lists did not improve the diagnostic accuracy of the DDx lists (106/206, 52%, P=.05 in the collective list with the 1/n weighting rule and 29/206, 14%, P<.001 in the only shared diagnoses among the 3 DDx lists). Conclusions: Simply adding each of the top 10 DDx lists from additional DDx generators increased the diagnostic accuracy of the DDx list by approximately 20%, suggesting that the combinational use of DDx generators early in the diagnostic process is beneficial. %M 37531164 %R 10.2196/49034 %U https://formative.jmir.org/2023/1/e49034 %U https://doi.org/10.2196/49034 %U http://www.ncbi.nlm.nih.gov/pubmed/37531164 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47748 %T Artificial Intelligence Versus Human-Controlled Doctor in Virtual Reality Simulation for Sepsis Team Training: Randomized Controlled Study %A Liaw,Sok Ying %A Tan,Jian Zhi %A Bin Rusli,Khairul Dzakirin %A Ratan,Rabindra %A Zhou,Wentao %A Lim,Siriwan %A Lau,Tang Ching %A Seah,Betsy %A Chua,Wei Ling %+ Alice Lee Centre for Nursing Studies, National University of Singapore, Block MD11, Level 2, 10 Medical Drive, Singapore, 117597, Singapore, 65 65167451, nurliaw@nus.edu.sg %K artificial intelligence %K interprofessional education %K interprofessional communication %K sepsis care %K team training %K virtual reality %K simulation %K AI %K health care education %K nursing student %K nursing education %K medical education %D 2023 %7 26.7.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Interprofessional communication is needed to enhance the early recognition and management of patients with sepsis. Preparing medical and nursing students using virtual reality simulation has been shown to be an effective learning approach for sepsis team training. 
However, its scalability is constrained by unequal cohort sizes between medical and nursing students. An artificial intelligence (AI) medical team member can be implemented in a virtual reality simulation to engage nursing students in sepsis team training. Objective: This study aimed to evaluate the effectiveness of an AI-powered doctor versus a human-controlled doctor in training nursing students for sepsis care and interprofessional communication. Methods: A randomized controlled trial study was conducted with 64 nursing students who were randomly assigned to undertake sepsis team training with an AI-powered doctor (AI-powered group) or with medical students using virtual reality simulation (human-controlled group). Participants from both groups were tested on their sepsis and communication performance through simulation-based assessments (posttest). Participants’ sepsis knowledge and self-efficacy in interprofessional communication were also evaluated before and after the study interventions. Results: A total of 32 nursing students from each group completed the simulation-based assessment, sepsis and communication knowledge test, and self-efficacy questionnaire. Compared with the baseline scores, both the AI-powered and human-controlled groups demonstrated significant improvements in communication knowledge (P=.001) and self-efficacy in interprofessional communication (P<.001) in posttest scores. For sepsis care knowledge, a significant improvement in sepsis care knowledge from the baseline was observed in the AI-powered group (P<.001) but not in the human-controlled group (P=.16). Although no significant differences were found in sepsis care performance between the groups (AI-powered group: mean 13.63, SD 4.23, vs human-controlled group: mean 12.75, SD 3.85, P=.39), the AI-powered group (mean 9.06, SD 1.78) had statistically significantly higher sepsis posttest knowledge scores (P=.009) than the human-controlled group (mean 7.75, SD 2.08). 
No significant differences were found in interprofessional communication performance between the 2 groups (AI-powered group: mean 29.34, SD 8.37, vs human-controlled group: mean 27.06, SD 5.69, P=.21). However, the human-controlled group (mean 69.6, SD 14.4) reported a significantly higher level of self-efficacy in interprofessional communication (P=.008) than the AI-powered group (mean 60.1, SD 13.3). Conclusions: Our study suggested that AI-powered doctors are not inferior to human-controlled virtual reality simulations with respect to sepsis care and interprofessional communication performance, which supports the viability of implementing AI-powered doctors to achieve scalability in sepsis team training. Our findings also suggested that future innovations should focus on the sociability of AI-powered doctors to enhance users’ interprofessional communication training. Perhaps in the nearer term, future studies should examine how to best blend AI-powered training with human-controlled virtual reality simulation to optimize clinical performance in sepsis care and interprofessional communication. 
Trial Registration: ClinicalTrials.gov NCT05953441; https://clinicaltrials.gov/study/NCT05953441 %M 37494112 %R 10.2196/47748 %U https://www.jmir.org/2023/1/e47748 %U https://doi.org/10.2196/47748 %U http://www.ncbi.nlm.nih.gov/pubmed/37494112 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43068 %T A Medical Ethics Framework for Conversational Artificial Intelligence %A Fournier-Tombs,Eleonore %A McHardy,Juliette %+ United Nations University Centre for Policy Research, 767 Third Avenue, Floor 35, New York, NY, 10017, United States, 1 646 905 5225, fourniertombs@unu.edu %K chatbot %K medicine %K ethics %K AI ethics %K AI policy %K conversational agent %K COVID-19 %K risk %K medical ethics %K privacy %K data governance %K artificial intelligence %D 2023 %7 26.7.2023 %9 Viewpoint %J J Med Internet Res %G English %X The launch of OpenAI’s GPT-3 model in June 2020 began a new era for conversational chatbots. While there are chatbots that do not use artificial intelligence (AI), conversational chatbots integrate AI language models that allow for back-and-forth conversation between an AI system and a human user. GPT-3, since upgraded to GPT-4, harnesses a natural language processing technique called sentence embedding and allows for conversations with users that are more nuanced and realistic than before. The launch of this model came in the first few months of the COVID-19 pandemic, where increases in health care needs globally combined with social distancing measures made virtual medicine more relevant than ever. GPT-3 and other conversational models have been used for a wide variety of medical purposes, from providing basic COVID-19–related guidelines to personalized medical advice and even prescriptions. The line between medical professionals and conversational chatbots is somewhat blurred, notably in hard-to-reach communities where the chatbot replaced face-to-face health care. 
Considering these blurred lines and the circumstances accelerating the adoption of conversational chatbots globally, we analyze the use of these tools from an ethical perspective. Notably, we map out the many types of risks in the use of conversational chatbots in medicine to the principles of medical ethics. In doing so, we propose a framework for better understanding the effects of these chatbots on both patients and the medical field more broadly, with the hope of informing safe and appropriate future developments. %M 37224277 %R 10.2196/43068 %U https://www.jmir.org/2023/1/e43068 %U https://doi.org/10.2196/43068 %U http://www.ncbi.nlm.nih.gov/pubmed/37224277 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e41858 %T Using Hypothesis-Led Machine Learning and Hierarchical Cluster Analysis to Identify Disease Pathways Prior to Dementia: Longitudinal Cohort Study %A Huang,Shih-Tsung %A Hsiao,Fei-Yuan %A Tsai,Tsung-Hsien %A Chen,Pei-Jung %A Peng,Li-Ning %A Chen,Liang-Kung %+ Center for Geriatrics and Gerontology, Taipei Veterans General Hospital, No. 201, Sec 2, Shih-Pai Road, Taipei, 11217, Taiwan, 886 2 28757711, lkchen2@vghtpe.gov.tw %K dementia %K machine learning %K cluster analysis %K disease %K condition %K symptoms %K data %K data set %K cardiovascular %K neuropsychiatric %K infection %K mobility %K mental conditions %K development %D 2023 %7 26.7.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Dementia development is a complex process in which the occurrence and sequential relationships of different diseases or conditions may construct specific patterns leading to incident dementia. Objective: This study aimed to identify patterns of disease or symptom clusters and their sequences prior to incident dementia using a novel approach incorporating machine learning methods. 
Methods: Using Taiwan’s National Health Insurance Research Database, data from 15,700 older people with dementia and 15,700 nondementia controls matched on age, sex, and index year (n=10,466, 67% for the training data set and n=5234, 33% for the testing data set) were retrieved for analysis. Using machine learning methods to capture specific hierarchical disease triplet clusters prior to dementia, we designed a study algorithm with four steps: (1) data preprocessing, (2) disease or symptom pathway selection, (3) model construction and optimization, and (4) data visualization. Results: Among 15,700 identified older people with dementia, 10,466 and 5234 subjects were randomly assigned to the training and testing data sets, and 6215 hierarchical disease triplet clusters with positive correlations with dementia onset were identified. We subsequently generated 19,438 features to construct prediction models, and the model with the best performance was support vector machine (SVM) with the by-group LASSO (least absolute shrinkage and selection operator) regression method (total corresponding features=2513; accuracy=0.615; sensitivity=0.607; specificity=0.622; positive predictive value=0.612; negative predictive value=0.619; area under the curve=0.639). In total, this study captured 49 hierarchical disease triplet clusters related to dementia development, and the most characteristic patterns leading to incident dementia started with cardiovascular conditions (mainly hypertension), cerebrovascular disease, mobility disorders, or infections, followed by neuropsychiatric conditions. Conclusions: Dementia development in the real world is an intricate process involving various diseases or conditions, their co-occurrence, and sequential relationships. 
Using a machine learning approach, we identified 49 hierarchical disease triplet clusters with leading roles (cardio- or cerebrovascular disease) and supporting roles (mental conditions, locomotion difficulties, infections, and nonspecific neurological conditions) in dementia development. Further studies using data from other countries are needed to validate the prediction algorithms for dementia development, allowing the development of comprehensive strategies to prevent or care for dementia in the real world. %M 37494081 %R 10.2196/41858 %U https://www.jmir.org/2023/1/e41858 %U https://doi.org/10.2196/41858 %U http://www.ncbi.nlm.nih.gov/pubmed/37494081 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e46769 %T Self-Supervised Electroencephalogram Representation Learning for Automatic Sleep Staging: Model Development and Evaluation Study %A Yang,Chaoqi %A Xiao,Cao %A Westover,M Brandon %A Sun,Jimeng %+ Computer Science Department, Carle's Illinois College of Medicine, University of Illinois, Urbana Champaign, 201 N Goodwin Ave, Urbana, IL, 61801, United States, 1 9142698058, jimeng.sun@gmail.com %K physiological signals %K electroencephalogram %K EEG %K sleep staging %K sleep %K predict %K wearable devices %K wearable %K self-supervised learning %K digital health %K mHealth %K mobile health %K healthcare %K health care %K machine learning %D 2023 %7 26.7.2023 %9 Original Paper %J JMIR AI %G English %X Background: Deep learning models have shown great success in automating tasks in sleep medicine by learning from carefully annotated electroencephalogram (EEG) data. However, effectively using a large amount of raw EEG data remains a challenge. 
Objective: In this study, we aim to learn robust vector representations from massive unlabeled EEG signals, such that the learned vectorized features (1) are expressive enough to replace the raw signals in the sleep staging task, and (2) provide better predictive performance than supervised models in scenarios involving fewer labels and noisy samples. Methods: We propose a self-supervised model, Contrast with the World Representation (ContraWR), for EEG signal representation learning. Unlike previous models that use a set of negative samples, our model uses global statistics (ie, the average representation) from the data set to distinguish signals associated with different sleep stages. The ContraWR model is evaluated on 3 real-world EEG data sets that include both settings: at-home and in-laboratory EEG recording. Results: ContraWR outperforms 4 recently reported self-supervised learning methods on the sleep staging task across 3 large EEG data sets. ContraWR also supersedes supervised learning when fewer training labels are available (eg, 4% accuracy improvement when less than 2% of data are labeled on the Sleep EDF data set). Moreover, the model provides informative, representative feature structures in 2D projection. Conclusions: We show that ContraWR is robust to noise and can provide high-quality EEG representations for downstream prediction tasks. The proposed model can be generalized to other unsupervised physiological signal learning tasks. Future directions include exploring task-specific data augmentations and combining self-supervised methods with supervised methods, building upon the initial success of self-supervised learning reported in this study. 
%M 38090533 %R 10.2196/46769 %U https://ai.jmir.org/2023/1/e46769 %U https://doi.org/10.2196/46769 %U http://www.ncbi.nlm.nih.gov/pubmed/38090533 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48966 %T ChatGPT vs Google for Queries Related to Dementia and Other Cognitive Decline: Comparison of Results %A Hristidis,Vagelis %A Ruggiano,Nicole %A Brown,Ellen L %A Ganta,Sai Rithesh Reddy %A Stewart,Selena %+ Department of Computer Science and Engineering, University of California, Riverside, Winston Chung Hall, Room 317, Riverside, CA, 92521, United States, 1 9518272478, vagelis@cs.ucr.edu %K chatbots %K large language models %K ChatGPT %K web search %K language model %K Google %K aging %K cognitive %K cognition %K dementia %K gerontology %K geriatric %K geriatrics %K query %K queries %K information seeking %K search %D 2023 %7 25.7.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: People living with dementia or other cognitive decline and their caregivers (PLWD) increasingly rely on the web to find information about their condition and available resources and services. The recent advancements in large language models (LLMs), such as ChatGPT, provide a new alternative to the more traditional web search engines, such as Google. Objective: This study compared the quality of the results of ChatGPT and Google for a collection of PLWD-related queries. Methods: A set of 30 informational and 30 service delivery (transactional) PLWD-related queries were selected and submitted to both Google and ChatGPT. Three domain experts assessed the results for their currency of information, reliability of the source, objectivity, relevance to the query, and similarity of their response. The readability of the results was also analyzed. Interrater reliability coefficients were calculated for all outcomes. Results: Google had superior currency and higher reliability. ChatGPT results were evaluated as more objective. 
ChatGPT had a significantly higher response relevance, while Google often drew upon sources that were referral services for dementia care or service providers themselves. The readability was low for both platforms, especially for ChatGPT (mean grade level 12.17, SD 1.94) compared to Google (mean grade level 9.86, SD 3.47). The similarity between the content of ChatGPT and Google responses was rated as high for 13 (21.7%) responses, medium for 16 (26.7%) responses, and low for 31 (51.6%) responses. Conclusions: Both Google and ChatGPT have strengths and weaknesses. ChatGPT rarely includes the source of a result. Google more often provides a date for and a known reliable source of the response compared to ChatGPT, whereas ChatGPT supplies more relevant responses to queries. The results of ChatGPT may be out of date and often do not specify a validity time stamp. Google sometimes returns results based on commercial entities. The readability scores for both indicate that responses are often not appropriate for persons with low health literacy skills. In the future, the addition of both the source and the date of health-related information and availability in other languages may increase the value of these platforms for both nonmedical and medical professionals. 
%M 37490317 %R 10.2196/48966 %U https://www.jmir.org/2023/1/e48966 %U https://doi.org/10.2196/48966 %U http://www.ncbi.nlm.nih.gov/pubmed/37490317 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e36121 %T Evaluation of 2 Artificial Intelligence Software for Chest X-Ray Screening and Pulmonary Tuberculosis Diagnosis: Protocol for a Retrospective Case-Control Study %A Mohd Hisham,Muhammad Faiz %A Lodz,Noor Aliza %A Muhammad,Eida Nurhadzira %A Asari,Filza Noor %A Mahmood,Mohd Ihsani %A Abu Bakar,Zamzurina %+ Institute for Public Health, National Institute of Health, Ministry of Health Malaysia, No 1, Jalan Setia Murni U13/52, Setia Alam, Shah Alam, 40170, Malaysia, 60 333628888 ext 8722, faizhisham86@gmail.com %K artificial intelligence %K AI %K evaluation %K pulmonary tuberculosis %K PTB %K chest x-ray %K CXR %K screening %D 2023 %7 25.7.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: According to the World Bank, Malaysia reported an estimated 97 tuberculosis cases per 100,000 people in 2021. Chest x-ray (CXR) remains the best conventional method for the early detection of pulmonary tuberculosis (PTB) infection. The intervention of artificial intelligence (AI) in PTB diagnosis could efficiently aid human interpreters and reduce health professionals’ work burden. To date, no AI studies have been evaluated in Malaysia. Objective: This study aims to evaluate the performance of Putralytica and Qure.ai software for CXR screening and PTB diagnosis among the Malaysian population. Methods: We will conduct a retrospective case-control study at the Respiratory Medicine Institute, National Cancer Institute, and Sungai Buloh Health Clinic. A total of 1500 CXR images of patients who completed treatments or check-ups will be selected and categorized into three groups: (1) abnormal PTB cases, (2) abnormal non-PTB cases, and (3) normal cases. These CXR images, along with their clinical findings, will be the reference standard in this study. 
All patient data, including sociodemographic characteristics and clinical history, will be collected prior to screening via Putralytica and Qure.ai software and readers’ interpretation, which are the index tests for this study. Interpretation from all 3 index tests will be compared with the reference standard, and significant statistical analysis will be computed. Results: Data collection is expected to commence in August 2023. It is anticipated that 1 year will be needed to conduct the study. Conclusions: This study will measure the accuracy of Putralytica and Qure.ai software and whether their findings will concur with readers’ interpretation and the reference standard, thus providing evidence toward the effectiveness of implementing AI in the medical setting. International Registered Report Identifier (IRRID): PRR1-10.2196/36121 %M 37490330 %R 10.2196/36121 %U https://www.researchprotocols.org/2023/1/e36121 %U https://doi.org/10.2196/36121 %U http://www.ncbi.nlm.nih.gov/pubmed/37490330 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46340 %T Diagnostic Test Accuracy of Deep Learning Prediction Models on COVID-19 Severity: Systematic Review and Meta-Analysis %A Wang,Changyu %A Liu,Siru %A Tang,Yu %A Yang,Hao %A Liu,Jialin %+ Information Center, West China Hospital, Sichuan University, No. 37 Guoxue Road, Chengdu, 610041, China, 86 28 85422306, DLJL8@163.com %K COVID-19 %K deep learning %K prognostics and health management %K Severity of Illness Index %K accuracy %K AI %K prediction model %K systematic review %K meta-analysis %K disease severity %K prognosis %K digital health intervention %D 2023 %7 21.7.2023 %9 Review %J J Med Internet Res %G English %X Background: Deep learning (DL) prediction models hold great promise in the triage of COVID-19. Objective: We aimed to evaluate the diagnostic test accuracy of DL prediction models for assessing and predicting the severity of COVID-19. 
Methods: We searched PubMed, Scopus, LitCovid, Embase, Ovid, and the Cochrane Library for studies published from December 1, 2019, to April 30, 2022. Studies that used DL prediction models to assess or predict COVID-19 severity were included, while those without diagnostic test accuracy analysis or severity dichotomies were excluded. QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2), PROBAST (Prediction Model Risk of Bias Assessment Tool), and funnel plots were used to estimate the bias and applicability. Results: A total of 12 retrospective studies involving 2006 patients reported the cross-sectionally assessed value of DL on COVID-19 severity. The pooled sensitivity and area under the curve were 0.92 (95% CI 0.89-0.94; I2=0.00%) and 0.95 (95% CI 0.92-0.96), respectively. A total of 13 retrospective studies involving 3951 patients reported the longitudinal predictive value of DL for disease severity. The pooled sensitivity and area under the curve were 0.76 (95% CI 0.74-0.79; I2=0.00%) and 0.80 (95% CI 0.76-0.83), respectively. Conclusions: DL prediction models can help clinicians identify potentially severe cases for early triage. However, high-quality research is lacking. 
Trial Registration: PROSPERO CRD42022329252; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42022329252 %M 37477951 %R 10.2196/46340 %U https://www.jmir.org/2023/1/e46340 %U https://doi.org/10.2196/46340 %U http://www.ncbi.nlm.nih.gov/pubmed/37477951 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 7 %N %P e48795 %T Effective Prediction of Mortality by Heart Disease Among Women in Jordan Using the Chi-Squared Automatic Interaction Detection Model: Retrospective Validation Study %A Bani Hani,Salam %A Ahmad,Muayyad %+ Clinical Nursing Department, School of Nursing, The University of Jordan, Queen Rania St, Amman, 11942, Jordan, 962 785577701, banihani.salam@yahoo.com %K coronary heart disease %K mortality %K artificial intelligence %K machine learning %K algorithms %K algorithm %K women %K death %K predict %K prediction %K predictive %K heart %K cardiology %K coronary %K CHD %K cardiovascular disease %K CVD %K cardiovascular %D 2023 %7 20.7.2023 %9 Original Paper %J JMIR Cardio %G English %X Background: Many current studies have claimed that the actual risk of heart disease among women is equal to that in men. Using a large machine learning algorithm (MLA) data set to predict mortality in women, data mining techniques have been used to identify significant aspects of variables that help in identifying the primary causes of mortality within this target category of the population. Objective: This study aims to predict mortality caused by heart disease among women, using an artificial intelligence technique–based MLA. Methods: A retrospective design was used to retrieve big data from the electronic health records of 2028 women with heart disease. Data were collected for Jordanian women who were admitted to public health hospitals from 2015 to the end of 2021. We checked the extracted data for noise, consistency issues, and missing values. After categorizing, organizing, and cleaning the extracted data, the redundant data were eliminated. 
Results: Out of 9 artificial intelligence models, the Chi-squared Automatic Interaction Detection model had the highest accuracy (93.25%) and area under the curve (0.825) among the build models. The participants were 62.6 (SD 15.4) years old on average. Angina pectoris was the most frequent diagnosis in the women's extracted files (n=1264, 62.3%), followed by congestive heart failure (n=764, 37.7%). Age, systolic blood pressure readings with a cutoff value of >187 mm Hg, medical diagnosis (women diagnosed with congestive heart failure were at a higher risk of death [n=31, 16.58%]), pulse pressure with a cutoff value of 98 mm Hg, and oxygen saturation (measured using pulse oximetry) with a cutoff value of 93% were the main predictors for death among women. Conclusions: To predict the outcomes in this study, we used big data that were extracted from the clinical variables from the electronic health records. The Chi-squared Automatic Interaction Detection model—an MLA—confirmed the precise identification of the key predictors of cardiovascular mortality among women and can be used as a practical tool for clinical prediction. 
%M 37471126 %R 10.2196/48795 %U https://cardio.jmir.org/2023/1/e48795 %U https://doi.org/10.2196/48795 %U http://www.ncbi.nlm.nih.gov/pubmed/37471126 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46105 %T Applied Machine Learning Techniques to Diagnose Voice-Affecting Conditions and Disorders: Systematic Literature Review %A Idrisoglu,Alper %A Dallora,Ana Luiza %A Anderberg,Peter %A Berglund,Johan Sanmartin %+ Department of Health, Blekinge Institute of Technology, Valhallavägen 1, Karslkrona, 37141, Sweden, 46 701462619, alper.idrisoglu@bth.se %K diagnosis %K digital biomarkers %K machine learning %K monitoring %K voice-affecting disorder %K voice features %D 2023 %7 19.7.2023 %9 Review %J J Med Internet Res %G English %X Background: Normal voice production depends on the synchronized cooperation of multiple physiological systems, which makes the voice sensitive to changes. Any systematic, neurological, and aerodigestive distortion is prone to affect voice production through reduced cognitive, pulmonary, and muscular functionality. This sensitivity inspired using voice as a biomarker to examine disorders that affect the voice. Technological improvements and emerging machine learning (ML) technologies have enabled possibilities of extracting digital vocal features from the voice for automated diagnosis and monitoring systems. Objective: This study aims to summarize a comprehensive view of research on voice-affecting disorders that uses ML techniques for diagnosis and monitoring through voice samples where systematic conditions, nonlaryngeal aerodigestive disorders, and neurological disorders are specifically of interest. Methods: This systematic literature review (SLR) investigated the state of the art of voice-based diagnostic and monitoring systems with ML technologies, targeting voice-affecting disorders without direct relation to the voice box from the point of view of applied health technology. 
Through a comprehensive search string, studies published from 2012 to 2022 in the databases Scopus, PubMed, and Web of Science were scanned and collected for assessment. To minimize bias, retrieval of the relevant references in other studies in the field was ensured, and 2 authors assessed the collected studies. Low-quality studies were removed through a quality assessment, and relevant data were extracted through summary tables for analysis. The articles were checked for similarities between author groups to prevent cumulative redundancy bias during the screening process, where only 1 article was included from the same author group. Results: In the analysis of the 145 included studies, support vector machines were the most used ML technique (51/145, 35.2%), and the most studied disease was Parkinson disease (PD; reported in 87/145, 60%, of the studies). After 2017, 16 additional voice-affecting disorders were examined, in contrast to the 3 investigated previously. Furthermore, an upsurge in the use of artificial neural network–based architectures was observed after 2017. Almost half of the included studies were published in the last 2 years (2021 and 2022). A broad interest from many countries was observed. Notably, nearly one-half (n=75) of the studies relied on 10 distinct data sets, and 11/145 (7.6%) used demographic data as an input for ML models. Conclusions: This SLR revealed considerable interest across multiple countries in using ML techniques for diagnosing and monitoring voice-affecting disorders, with PD being the most studied disorder. However, the review identified several gaps, including limited and unbalanced data set usage in studies, and a focus on diagnostic tests rather than disorder-specific monitoring. 
Despite being limited to peer-reviewed publications written in English, the SLR provides valuable insights into the current state of research on ML-based diagnosis and monitoring of voice-affecting disorders and highlights areas to address in future research. %M 37467031 %R 10.2196/46105 %U https://www.jmir.org/2023/1/e46105 %U https://doi.org/10.2196/46105 %U http://www.ncbi.nlm.nih.gov/pubmed/37467031 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45041 %T Predicting Fetal Alcohol Spectrum Disorders Using Machine Learning Techniques: Multisite Retrospective Cohort Study %A Oh,Sarah Soyeon %A Kuang,Irene %A Jeong,Hyewon %A Song,Jin-Yeop %A Ren,Boyu %A Moon,Jong Youn %A Park,Eun-Cheol %A Kawachi,Ichiro %+ Artificial Intelligence and Big-Data Convergence Center, Gil Medical Center, Gachon University College of Medicine, 191 Hambangmoe-ro, Yeonsu-gu, Incheon, 21936, Republic of Korea, 82 1021245754, moonjy@gachon.ac.kr %K fetal alcohol syndrome %K machine learning %K algorithm %K development %K fetal %K fetus %K maternal %K obstetric %K gynecology %K pregnant %K prenatal %K antenatal %K postnatal %K predict %K developmental disability %K prenatal alcohol exposure %K alcohol %K alcohol exposure %K developmental %K disability %K pregnancy %K age %K race %K diagnosis %K diagnostic %K treatment %D 2023 %7 18.7.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Fetal alcohol syndrome (FAS) is a lifelong developmental disability that occurs among individuals with prenatal alcohol exposure (PAE). With improved prediction models, FAS can be diagnosed or treated early, if not completely prevented. Objective: In this study, we sought to compare different machine learning algorithms and their FAS predictive performance among women who consumed alcohol during pregnancy. 
We also aimed to identify which variables (eg, timing of exposure to alcohol during pregnancy and type of alcohol consumed) were most influential in generating an accurate model. Methods: Data from the collaborative initiative on fetal alcohol spectrum disorders from 2007 to 2017 were used to gather information about 595 women who consumed alcohol during pregnancy at 5 hospital sites around the United States. Information about PAE was gathered through questionnaires, in-person interviews, and reviews of medical, legal, or social service records. Four different machine learning algorithms (logistic regression, XGBoost, light gradient-boosting machine, and CatBoost) were trained to predict the prevalence of FAS at birth, and model performance was measured by analyzing the area under the receiver operating characteristic curve (AUROC). Of the total cases, 80% were randomly selected for training, while 20% remained as test data sets for predicting FAS. Feature importance was also analyzed using Shapley values for the best-performing algorithm. Results: Overall, there were 20 cases of FAS within a total population of 595 individuals with PAE. Most of the drinking occurred in the first trimester only (n=491) or throughout all 3 trimesters (n=95); however, there were also reports of drinking in the first and second trimesters only (n=8) and in the third trimester only (n=1). The CatBoost method delivered the best performance in terms of AUROC (0.92) and area under the precision-recall curve (AUPRC 0.51), followed by the logistic regression method (AUROC 0.90; AUPRC 0.59), the light gradient-boosting machine (AUROC 0.89; AUPRC 0.52), and XGBoost (AUROC 0.86; AUPRC 0.45). 
Shapley values in the CatBoost model revealed that 12 variables were considered important in FAS prediction, with drinking throughout all 3 trimesters of pregnancy, maternal age, race, and type of alcoholic beverage consumed (eg, beer, wine, or liquor) scoring highly in overall feature importance. For most predictive measures, the best performance was obtained by the CatBoost algorithm, with an AUROC of 0.92, precision of 0.50, specificity of 0.29, F1 score of 0.29, and accuracy of 0.96. Conclusions: Machine learning algorithms were able to identify FAS risk with a prediction performance higher than that of previous models among pregnant drinkers. For small training sets, which are common with FAS, boosting mechanisms like CatBoost may help alleviate certain problems associated with data imbalances and difficulties in optimization or generalization. %M 37463016 %R 10.2196/45041 %U https://www.jmir.org/2023/1/e45041 %U https://doi.org/10.2196/45041 %U http://www.ncbi.nlm.nih.gov/pubmed/37463016 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e45000 %T Artificial Intelligence–Enabled Software Prototype to Inform Opioid Pharmacovigilance From Electronic Health Records: Development and Usability Study %A Sorbello,Alfred %A Haque,Syed Arefinul %A Hasan,Rashedul %A Jermyn,Richard %A Hussein,Ahmad %A Vega,Alex %A Zembrzuski,Krzysztof %A Ripple,Anna %A Ahadpour,Mitra %+ Center for Drug Evaluation and Research, US Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, United States, 1 (888) 463 6332, mdrashedul.hasan@fda.hhs.gov %K electronic health records %K pharmacovigilance %K artificial intelligence %K real world data %K EHR %K natural language %K software application %K drug %K Food and Drug Administration %K deep learning %D 2023 %7 18.7.2023 %9 Original Paper %J JMIR AI %G English %X Background: The use of patient health and treatment information captured in structured and unstructured formats in computerized electronic 
health record (EHR) repositories could potentially augment the detection of safety signals for drug products regulated by the US Food and Drug Administration (FDA). Natural language processing and other artificial intelligence (AI) techniques provide novel methodologies that could be leveraged to extract clinically useful information from EHR resources. Objective: Our aim is to develop a novel AI-enabled software prototype to identify adverse drug event (ADE) safety signals from free-text discharge summaries in EHRs to enhance opioid drug safety and research activities at the FDA. Methods: We developed a prototype for web-based software that leverages keyword and trigger-phrase searching with rule-based algorithms and deep learning to extract candidate ADEs for specific opioid drugs from discharge summaries in the Medical Information Mart for Intensive Care III (MIMIC III) database. The prototype uses MedSpacy components to identify relevant sections of discharge summaries and a pretrained natural language processing (NLP) model, Spark NLP for Healthcare, for named entity recognition. Fifteen FDA staff members provided feedback on the prototype’s features and functionalities. Results: Using the prototype, we were able to identify known, labeled, opioid-related adverse drug reactions from text in EHRs. The AI-enabled model achieved accuracy, recall, precision, and F1-scores of 0.66, 0.69, 0.64, and 0.67, respectively. FDA participants assessed the prototype as highly desirable in user satisfaction, visualizations, and in the potential to support drug safety signal detection for opioid drugs from EHR data while saving time and manual effort. Actionable design recommendations included (1) enlarging the tabs and visualizations; (2) enabling more flexibility and customizations to fit end users’ individual needs; (3) providing additional instructional resources; (4) adding multiple graph export functionality; and (5) adding project summaries. 
Conclusions: The novel prototype uses innovative AI-based techniques to automate searching for, extracting, and analyzing clinically useful information captured in unstructured text in EHRs. It increases efficiency in harnessing real-world data for opioid drug safety and increases the usability of the data to support regulatory review while decreasing the manual research burden. %M 37771410 %R 10.2196/45000 %U https://ai.jmir.org/2023/1/e45000 %U https://doi.org/10.2196/45000 %U http://www.ncbi.nlm.nih.gov/pubmed/37771410 %0 Journal Article %@ 2371-4379 %I JMIR Publications %V 8 %N %P e47592 %T An “All-Data-on-Hand” Deep Learning Model to Predict Hospitalization for Diabetic Ketoacidosis in Youth With Type 1 Diabetes: Development and Validation Study %A Williams,David D %A Ferro,Diana %A Mullaney,Colin %A Skrabonja,Lydia %A Barnes,Mitchell S %A Patton,Susana R %A Lockee,Brent %A Tallon,Erin M %A Vandervelden,Craig A %A Schweisberger,Cintya %A Mehta,Sanjeev %A McDonough,Ryan %A Lind,Marcus %A D'Avolio,Leonard %A Clements,Mark A %+ Health Services and Outcomes Research, Children's Mercy - Kansas City, 2401 Gillham Road, Kansas City, MO, 64108, United States, 1 816 731 7214, ddwilliams@cmh.edu %K type 1 diabetes %K T1D %K diabetic ketoacidosis %K DKA %K machine learning %K deep learning %K artificial intelligence %K AI %K recurrent neural network %K RNN %K long short-term memory %K LSTM %K natural language processing %K NLP %D 2023 %7 18.7.2023 %9 Original Paper %J JMIR Diabetes %G English %X Background: Although prior research has identified multiple risk factors for diabetic ketoacidosis (DKA), clinicians continue to lack clinic-ready models to predict dangerous and costly episodes of DKA. We asked whether we could apply deep learning, specifically the use of a long short-term memory (LSTM) model, to accurately predict the 180-day risk of DKA-related hospitalization for youth with type 1 diabetes (T1D). 
Objective: We aimed to describe the development of an LSTM model to predict the 180-day risk of DKA-related hospitalization for youth with T1D. Methods: We used 17 consecutive calendar quarters of clinical data (January 10, 2016, to March 18, 2020) for 1745 youths aged 8 to 18 years with T1D from a pediatric diabetes clinic network in the Midwestern United States. The input data included demographics, discrete clinical observations (laboratory results, vital signs, anthropometric measures, diagnosis, and procedure codes), medications, visit counts by type of encounter, number of historic DKA episodes, number of days since last DKA admission, patient-reported outcomes (answers to clinic intake questions), and data features derived from diabetes- and nondiabetes-related clinical notes via natural language processing. We trained the model using input data from quarters 1 to 7 (n=1377), validated it using input from quarters 3 to 9 in a partial out-of-sample (OOS-P; n=1505) cohort, and further validated it in a full out-of-sample (OOS-F; n=354) cohort with input from quarters 10 to 15. Results: DKA admissions occurred at a rate of 5% per 180 days in both out-of-sample cohorts. In the OOS-P and OOS-F cohorts, the median age was 13.7 (IQR 11.3-15.8) years and 13.1 (IQR 10.7-15.5) years; median glycated hemoglobin levels at enrollment were 8.6% (IQR 7.6%-9.8%) and 8.1% (IQR 6.9%-9.5%); recall was 33% (26/80) and 50% (9/18) for the top-ranked 5% of youth with T1D; and 14.15% (213/1505) and 12.7% (45/354) had prior DKA admissions (after the T1D diagnosis), respectively. For lists rank ordered by the probability of hospitalization, precision increased from 33% to 56% to 100% for positions 1 to 80, 1 to 25, and 1 to 10 in the OOS-P cohort and from 50% to 60% to 80% for positions 1 to 18, 1 to 10, and 1 to 5 in the OOS-F cohort, respectively. Conclusions: The proposed LSTM model for predicting 180-day DKA-related hospitalization was valid in this sample. 
Future research should evaluate model validity in multiple populations and settings to account for health inequities that may be present in different segments of the population (eg, racially or socioeconomically diverse cohorts). Rank ordering youth by probability of DKA-related hospitalization will allow clinics to identify the most at-risk youth. The clinical implication of this is that clinics may then create and evaluate novel preventive interventions based on available resources. %M 37224506 %R 10.2196/47592 %U https://diabetes.jmir.org/2023/1/e47592 %U https://doi.org/10.2196/47592 %U http://www.ncbi.nlm.nih.gov/pubmed/37224506 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 12 %N %P e45903 %T The Role of Artificial Intelligence Model Documentation in Translational Science: Scoping Review %A Brereton,Tracey A %A Malik,Momin M %A Lifson,Mark %A Greenwood,Jason D %A Peterson,Kevin J %A Overgaard,Shauna M %+ Center for Digital Health, Mayo Clinic, 200 1st St SW, Rochester, MN, 55905, United States, 1 (507) 284 2511, brereton.tracey@mayo.edu %K health %K informatics %K artificial intelligence %K machine learning %K documentation %K explainability %K ethics %K translational science %K scoping review %K medical modeling software %K clinical decision support %K decision support intervention %D 2023 %7 14.7.2023 %9 Review %J Interact J Med Res %G English %X Background: Despite the touted potential of artificial intelligence (AI) and machine learning (ML) to revolutionize health care, clinical decision support tools, herein referred to as medical modeling software (MMS), have yet to realize the anticipated benefits. One proposed obstacle lies in the acknowledged gaps in AI translation. These gaps stem partly from the fragmentation of processes and resources to support transparent MMS documentation. 
Consequently, the absence of transparent reporting hinders the provision of evidence to support the implementation of MMS in clinical practice, thereby serving as a substantial barrier to the successful translation of software from research settings to clinical practice. Objective: This study aimed to scope the current landscape of AI- and ML-based MMS documentation practices and elucidate the function of documentation in facilitating the translation of ethical and explainable MMS into clinical workflows. Methods: A scoping review was conducted in accordance with PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. PubMed was searched using Medical Subject Headings key concepts of AI, ML, ethical considerations, and explainability to identify publications detailing AI- and ML-based MMS documentation, in addition to snowball sampling of selected reference lists. To include the possibility of implicit documentation practices not explicitly labeled as such, we did not use documentation as a key concept but as an inclusion criterion. A 2-stage screening process (title and abstract screening and full-text review) was conducted by 1 author. A data extraction template was used to record publication-related information; barriers to developing ethical and explainable MMS; available standards, regulations, frameworks, or governance strategies related to documentation; and recommendations for documentation for papers that met the inclusion criteria. Results: Of the 115 papers retrieved, 21 (18.3%) met the requirements for inclusion. Ethics and explainability were investigated in the context of AI- and ML-based MMS documentation and translation. Data detailing the current state and challenges, as well as recommendations for future studies, were synthesized. Notable themes defining the current state and challenges that required thorough review included bias, accountability, governance, and explainability. 
Recommendations identified in the literature to address present barriers call for a proactive evaluation of MMS, multidisciplinary collaboration, adherence to investigation and validation protocols, transparency and traceability requirements, and guiding standards and frameworks that enhance documentation efforts and support the translation of AI- and ML-based MMS. Conclusions: Resolving barriers to translation is critical for MMS to deliver on expectations, including those barriers identified in this scoping review related to bias, accountability, governance, and explainability. Our findings suggest that transparent strategic documentation, aligning translational science and regulatory science, will support the translation of MMS by coordinating communication and reporting and reducing translational barriers, thereby furthering the adoption of MMS. %M 37450330 %R 10.2196/45903 %U https://www.i-jmr.org/2023/1/e45903 %U https://doi.org/10.2196/45903 %U http://www.ncbi.nlm.nih.gov/pubmed/37450330 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e50336 %T Authors’ Reply to: Variability in Large Language Models’ Responses to Medical Licensing and Certification Examinations %A Gilson,Aidan %A Safranek,Conrad W %A Huang,Thomas %A Socrates,Vimig %A Chi,Ling %A Taylor,Richard Andrew %A Chartash,David %+ Section for Biomedical Informatics and Data Science, Yale University School of Medicine, 100 College Street, 9th Fl, New Haven, CT, 06510, United States, 1 203 737 5379, david.chartash@yale.edu %K natural language processing %K NLP %K MedQA %K generative pre-trained transformer %K GPT %K medical education %K chatbot %K artificial intelligence %K AI %K education technology %K ChatGPT %K conversational agent %K machine learning %K large language models %K knowledge assessment %D 2023 %7 13.7.2023 %9 Letter to the Editor %J JMIR Med Educ %G English %X %M 37440299 %R 10.2196/50336 %U https://mededu.jmir.org/2023/1/e50336 %U https://doi.org/10.2196/50336 %U 
http://www.ncbi.nlm.nih.gov/pubmed/37440299 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e48305 %T Variability in Large Language Models’ Responses to Medical Licensing and Certification Examinations. Comment on “How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment” %A Epstein,Richard H %A Dexter,Franklin %+ Department of Anesthesiology, Perioperative Medicine and Pain Management, University of Miami Miller School of Medicine, 1400 NW 12th Ave, Suite 4022F, Miami, FL, 33136, United States, 1 215 896 7850, repstein@med.miami.edu %K natural language processing %K NLP %K MedQA %K generative pre-trained transformer %K GPT %K medical education %K chatbot %K artificial intelligence %K AI %K education technology %K ChatGPT %K Google Bard %K conversational agent %K machine learning %K large language models %K knowledge assessment %D 2023 %7 13.7.2023 %9 Letter to the Editor %J JMIR Med Educ %G English %X %M 37440293 %R 10.2196/48305 %U https://mededu.jmir.org/2023/1/e48305 %U https://doi.org/10.2196/48305 %U http://www.ncbi.nlm.nih.gov/pubmed/37440293 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e45872 %T Creating an Innovative Artificial Intelligence–Based Technology (TCRact) for Designing and Optimizing T Cell Receptors for Use in Cancer Immunotherapies: Protocol for an Observational Trial %A Bujak,Joanna %A Kłęk,Stanisław %A Balawejder,Martyna %A Kociniak,Aleksandra %A Wilkus,Kinga %A Szatanek,Rafał %A Orzeszko,Zofia %A Welanyk,Joanna %A Torbicz,Grzegorz %A Jęckowski,Mateusz %A Kucharczyk,Tomasz %A Wohadlo,Łukasz %A Borys,Maciej %A Stadnik,Honorata %A Wysocki,Michał %A Kayser,Magdalena %A Słomka,Marta Ewa %A Kosmowska,Anna %A Horbacka,Karolina %A Gach,Tomasz %A Markowska,Beata %A Kowalczyk,Tomasz %A Karoń,Jacek %A Karczewski,Marek %A Szura,Mirosław %A Sanecka-Duin,Anna %A Blum,Agnieszka %+ Ardigen SA, ul. 
Podole 76, Cracow, 30-394, Poland, 48 123409494, anna.sanecka-duin@ardigen.com %K AI %K artificial intelligence %K colorectal cancer %K HLA %K human leukocyte antigen %K immunotherapy %K neoantigen %K T cell receptors %K TCR %D 2023 %7 13.7.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Cancer continues to be the leading cause of mortality in high-income countries, necessitating the development of more precise and effective treatment modalities. Immunotherapy, specifically adoptive cell transfer of T cell receptor (TCR)-engineered T cells (TCR-T therapy), has shown promise in engaging the immune system for cancer treatment. One of the biggest challenges in the development of TCR-T therapies is the proper prediction of the pairing between TCRs and peptide-human leukocyte antigen complexes (pHLAs). Modern computational immunology, using artificial intelligence (AI)-based platforms, provides the means to optimize the speed and accuracy of TCR screening and discovery. Objective: This study proposes an observational clinical trial protocol to collect patient samples and generate a database of pHLA:TCR sequences to aid the development of an AI-based platform for efficient selection of specific TCRs. Methods: The multicenter observational study, involving 8 participating hospitals, aims to enroll patients diagnosed with stage II, III, or IV colorectal adenocarcinoma. Results: Patient recruitment has recently been completed, with 100 participants enrolled. Primary tumor tissue and peripheral blood samples have been obtained, and peripheral blood mononuclear cells have been isolated and cryopreserved. Nucleic acid extraction (DNA and RNA) has been performed in 86 cases. Additionally, 57 samples underwent whole exome sequencing to determine the presence of somatic mutations and RNA sequencing for gene expression profiling. Conclusions: The results of this study may have a significant impact on the treatment of patients with colorectal cancer. 
The comprehensive database of pHLA:TCR sequences generated through this observational clinical trial will facilitate the development of the AI-based platform for TCR selection. The results obtained thus far demonstrate successful patient recruitment and sample collection, laying the foundation for further analysis and the development of an innovative tool to expedite and enhance TCR selection for precision cancer treatments. Trial Registration: ClinicalTrials.gov NCT04994093; https://clinicaltrials.gov/ct2/show/NCT04994093 International Registered Report Identifier (IRRID): DERR1-10.2196/45872 %M 37440307 %R 10.2196/45872 %U https://www.researchprotocols.org/2023/1/e45872 %U https://doi.org/10.2196/45872 %U http://www.ncbi.nlm.nih.gov/pubmed/37440307 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 9 %N %P e44467 %T Pediatric Injury Surveillance From Uncoded Emergency Department Admission Records in Italy: Machine Learning–Based Text-Mining Approach %A Azzolina,Danila %A Bressan,Silvia %A Lorenzoni,Giulia %A Baldan,Giulia Andrea %A Bartolotta,Patrizia %A Scognamiglio,Federico %A Francavilla,Andrea %A Lanera,Corrado %A Da Dalt,Liviana %A Gregori,Dario %+ Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences, and Public Health, University of Padova, Via Leonardo Loredan 18, Padua, 35128, Italy, 39 049 8275384, dario.gregori@unipd.it %K machine learning %K pediatrics %K child and adolescent health %K text mining %K injury %K death %K surveillance %K pediatric admission %K hospitalization %K patient record %K unintentional injury %K emergency department %K emergency %K epidemiological surveillance %D 2023 %7 12.7.2023 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Unintentional injury is the leading cause of death in young children. Emergency department (ED) diagnoses are a useful source of information for injury epidemiological surveillance purposes. 
However, ED data collection systems often use free-text fields to report patient diagnoses. Machine learning techniques (MLTs) are powerful tools for automatic text classification, and an MLT-based system can improve injury surveillance by speeding up the manual free-text coding of ED diagnoses. Objective: This research aims to develop a tool for the automatic free-text classification of ED diagnoses to identify injury cases. The automatic classification system also serves epidemiological purposes by identifying the burden of pediatric injuries in Padua, a large province in the Veneto region in Northeast Italy. Methods: The study includes 283,468 pediatric admissions between 2007 and 2018 to the Padova University Hospital ED, a large referral center in Northern Italy. Each record reports a diagnosis as free text, the standard means of reporting patient diagnoses. An expert pediatrician manually classified a randomly extracted sample of approximately 40,000 diagnoses, and this study sample served as the gold standard to train an MLT classifier. After preprocessing, a document-term matrix was created. The machine learning classifiers, including decision tree, random forest, gradient boosting method (GBM), and support vector machine (SVM), were tuned by 4-fold cross-validation. The injury diagnoses were classified into 3 hierarchical classification tasks, as follows: injury versus noninjury (task A), intentional versus unintentional injury (task B), and type of unintentional injury (task C), according to the World Health Organization classification of injuries. Results: The SVM classifier achieved the highest accuracy (94.14%) in classifying injury versus noninjury cases (task A). The GBM method produced the best results (92% accuracy) for the intentional versus unintentional injury classification task (task B). The highest accuracy for the unintentional injury subclassification (task C) was achieved by the SVM classifier. 
The SVM, random forest, and GBM algorithms performed similarly against the gold standard across different tasks. Conclusions: This study shows that MLTs are promising techniques for improving epidemiological surveillance, allowing for the automatic classification of pediatric ED free-text diagnoses. The MLTs revealed a suitable classification performance, especially for general injuries and intentional injury classification. This automatic classification could facilitate the epidemiological surveillance of pediatric injuries by also reducing the health professionals’ efforts in manually classifying diagnoses for research purposes. %M 37436799 %R 10.2196/44467 %U https://publichealth.jmir.org/2023/1/e44467 %U https://doi.org/10.2196/44467 %U http://www.ncbi.nlm.nih.gov/pubmed/37436799 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42621 %T The FeatureCloud Platform for Federated Learning in Biomedicine: Unified Approach %A Matschinske,Julian %A Späth,Julian %A Bakhtiari,Mohammad %A Probul,Niklas %A Kazemi Majdabadi,Mohammad Mahdi %A Nasirigerdeh,Reza %A Torkzadehmahani,Reihaneh %A Hartebrodt,Anne %A Orban,Balazs-Attila %A Fejér,Sándor-József %A Zolotareva,Olga %A Das,Supratim %A Baumbach,Linda %A Pauling,Josch K %A Tomašević,Olivera %A Bihari,Béla %A Bloice,Marcus %A Donner,Nina C %A Fdhila,Walid %A Frisch,Tobias %A Hauschild,Anne-Christin %A Heider,Dominik %A Holzinger,Andreas %A Hötzendorfer,Walter %A Hospes,Jan %A Kacprowski,Tim %A Kastelitz,Markus %A List,Markus %A Mayer,Rudolf %A Moga,Mónika %A Müller,Heimo %A Pustozerova,Anastasia %A Röttger,Richard %A Saak,Christina C %A Saranti,Anna %A Schmidt,Harald H H W %A Tschohl,Christof %A Wenke,Nina K %A Baumbach,Jan %+ University of Hamburg, Notkestrasse 9, Hamburg, 22607, Germany, 49 40 42838 ext 7640, julian.matschinske@uni-hamburg.de %K privacy-preserving machine learning %K federated learning %K interactive platform %K artificial intelligence %K AI store %K privacy-enhancing technologies %K 
additive secret sharing %D 2023 %7 12.7.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Machine learning and artificial intelligence have shown promising results in many areas and are driven by the increasing amount of available data. However, these data are often distributed across different institutions and cannot be easily shared owing to strict privacy regulations. Federated learning (FL) allows the training of distributed machine learning models without sharing sensitive data. However, implementing FL is time-consuming and requires advanced programming skills and complex technical infrastructures. Objective: Various tools and frameworks have been developed to simplify the development of FL algorithms and provide the necessary technical infrastructure. Although there are many high-quality frameworks, most focus only on a single application case or method. To our knowledge, there are no generic frameworks, meaning that the existing solutions are restricted to a particular type of algorithm or application field. Furthermore, most of these frameworks provide an application programming interface that requires programming knowledge. There is no collection of ready-to-use FL algorithms that are extendable and allow users (eg, researchers) without programming knowledge to apply FL. A central FL platform for both FL algorithm developers and users does not exist. This study aimed to address this gap and make FL available to everyone by developing FeatureCloud, an all-in-one platform for FL in biomedicine and beyond. Methods: The FeatureCloud platform consists of 3 main components: a global frontend, a global backend, and a local controller. Our platform uses Docker to separate the locally acting components of the platform from the sensitive data systems. We evaluated our platform using 4 different algorithms on 5 data sets for both accuracy and runtime. 
Results: FeatureCloud removes the complexity of distributed systems for developers and end users by providing a comprehensive platform for executing multi-institutional FL analyses and implementing FL algorithms. Through its integrated artificial intelligence store, federated algorithms can easily be published and reused by the community. To secure sensitive raw data, FeatureCloud supports privacy-enhancing technologies to secure the shared local models and assures high standards in data privacy to comply with the strict General Data Protection Regulation. Our evaluation shows that applications developed in FeatureCloud can produce highly similar results compared with centralized approaches and scale well for an increasing number of participating sites. Conclusions: FeatureCloud provides a ready-to-use platform that integrates the development and execution of FL algorithms while reducing the complexity to a minimum and removing the hurdles of federated infrastructure. Thus, we believe that it has the potential to greatly increase the accessibility of privacy-preserving and distributed data analyses in biomedicine and beyond. 
%M 37436815 %R 10.2196/42621 %U https://www.jmir.org/2023/1/e42621 %U https://doi.org/10.2196/42621 %U http://www.ncbi.nlm.nih.gov/pubmed/37436815 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 10 %N %P e46859 %T Attitudes Toward the Adoption of 2 Artificial Intelligence–Enabled Mental Health Tools Among Prospective Psychotherapists: Cross-sectional Study %A Kleine,Anne-Kathrin %A Kokje,Eesha %A Lermer,Eva %A Gaube,Susanne %+ Department of Psychology, Ludwig Maximilian University of Munich, Geschwister-Scholl-Platz 1, Munich, 80539, Germany, 49 1709076034, Anne-Kathrin.Kleine@psy.lmu.de %K artificial intelligence %K mental health %K clinical decision support systems %K Unified Theory of Acceptance and Use of Technology %K technology acceptance model %D 2023 %7 12.7.2023 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Despite growing efforts to develop user-friendly artificial intelligence (AI) applications for clinical care, their adoption remains limited because of the barriers at individual, organizational, and system levels. There is limited research on the intention to use AI systems in mental health care. Objective: This study aimed to address this gap by examining the predictors of psychology students’ and early practitioners’ intention to use 2 specific AI-enabled mental health tools based on the Unified Theory of Acceptance and Use of Technology. Methods: This cross-sectional study included 206 psychology students and psychotherapists in training to examine the predictors of their intention to use 2 AI-enabled mental health care tools. The first tool provides feedback to the psychotherapist on their adherence to motivational interviewing techniques. The second tool uses patient voice samples to derive mood scores that the therapists may use for treatment decisions. 
Participants were presented with graphic depictions of the tools’ functioning mechanisms before measuring the variables of the extended Unified Theory of Acceptance and Use of Technology. In total, 2 structural equation models (1 for each tool) were specified, which included direct and mediated paths for predicting tool use intentions. Results: Perceived usefulness and social influence had a positive effect on the intention to use the feedback tool (P<.001) and the treatment recommendation tool (perceived usefulness, P=.01 and social influence, P<.001). However, trust was unrelated to use intentions for both the tools. Moreover, perceived ease of use was unrelated (feedback tool) and even negatively related (treatment recommendation tool) to use intentions when considering all predictors (P=.004). In addition, a positive relationship between cognitive technology readiness (P=.02) and the intention to use the feedback tool and a negative relationship between AI anxiety and the intention to use the feedback tool (P=.001) and the treatment recommendation tool (P<.001) were observed. Conclusions: The results shed light on the general and tool-dependent drivers of AI technology adoption in mental health care. Future research may explore the technological and user group characteristics that influence the adoption of AI-enabled tools in mental health care. 
%M 37436801 %R 10.2196/46859 %U https://humanfactors.jmir.org/2023/1/e46859 %U https://doi.org/10.2196/46859 %U http://www.ncbi.nlm.nih.gov/pubmed/37436801 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44165 %T An Automatically Adaptive Digital Health Intervention to Decrease Opioid-Related Risk While Conserving Counselor Time: Quantitative Analysis of Treatment Decisions Based on Artificial Intelligence and Patient-Reported Risk Measures %A Piette,John D %A Thomas,Laura %A Newman,Sean %A Marinec,Nicolle %A Krauss,Joel %A Chen,Jenny %A Wu,Zhenke %A Bohnert,Amy S B %+ Ann Arbor Department of Veterans Affairs Center for Clinical Management Research, 2215 Fuller Road, Mail Stop 152, Ann Arbor, MI, 48105, United States, 1 734 223 0127, jpiette@umich.edu %K artificial intelligence %K opioid safety %K telehealth %K reinforcement learning %K pain management %D 2023 %7 11.7.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Some patients prescribed opioid analgesic (OA) medications for pain experience serious side effects, including dependence, sedation, and overdose. As most patients are at low risk for OA-related harms, risk reduction interventions requiring multiple counseling sessions are impractical on a large scale. Objective: This study evaluates whether an intervention based on reinforcement learning (RL), a field of artificial intelligence, learned through experience to personalize interactions with patients with pain discharged from the emergency department (ED) and decreased self-reported OA misuse behaviors while conserving counselors’ time. Methods: We used data representing 2439 weekly interactions between a digital health intervention (“Prescription Opioid Wellness and Engagement Research in the ED” [PowerED]) and 228 patients with pain discharged from 2 EDs who reported recent opioid misuse. 
During each patient’s 12 weeks of intervention, PowerED used RL to select from 3 treatment options: a brief motivational message delivered via an interactive voice response (IVR) call, a longer motivational IVR call, or a live call from a counselor. The algorithm selected session types for each patient each week, with the goal of minimizing OA risk, defined in terms of a dynamic score reflecting patient reports during IVR monitoring calls. When a live counseling call was predicted to have a similar impact on future risk as an IVR message, the algorithm favored IVR to conserve counselor time. We used logit models to estimate changes in the relative frequency of each session type as PowerED gained experience. Poisson regression was used to examine the changes in self-reported OA risk scores over calendar time, controlling for the ordinal session number (1st to 12th). Results: Participants on average were 40 (SD 12.7) years of age; 66.7% (152/228) were women and 51.3% (117/228) were unemployed. Most participants (175/228, 76.8%) reported chronic pain, and 46.2% (104/225) had moderate to severe depressive symptoms. As PowerED gained experience through interactions over a period of 142 weeks, it delivered fewer live counseling sessions than brief IVR sessions (P=.006) and extended IVR sessions (P<.001). Live counseling sessions were selected 33.5% of the time in the first 5 weeks of interactions (95% CI 27.4%-39.7%) but only for 16.4% of sessions (95% CI 12.7%-20%) after 125 weeks. Controlling for each patient’s changes during the course of treatment, this adaptation of treatment-type allocation led to progressively greater improvements in self-reported OA risk scores (P<.001) over calendar time, as measured by the number of weeks since enrollment began. Improvement in risk behaviors over time was especially pronounced among patients with the highest risk at baseline (P=.02). 
Conclusions: The RL-supported program learned which treatment modalities worked best to improve self-reported OA risk behaviors while conserving counselors’ time. RL-supported interventions represent a scalable solution for patients with pain receiving OA prescriptions. Trial Registration: Clinicaltrials.gov NCT02990377; https://classic.clinicaltrials.gov/ct2/show/NCT02990377 %M 37432726 %R 10.2196/44165 %U https://www.jmir.org/2023/1/e44165 %U https://doi.org/10.2196/44165 %U http://www.ncbi.nlm.nih.gov/pubmed/37432726 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e46344 %T Data Science as a Core Competency in Undergraduate Medical Education in the Age of Artificial Intelligence in Health Care %A Seth,Puneet %A Hueppchen,Nancy %A Miller,Steven D %A Rudzicz,Frank %A Ding,Jerry %A Parakh,Kapil %A Record,Janet D %+ Department of Family Medicine, McMaster University, 100 Main Street West, 6th Floor, Hamilton, ON, L8P 1H6, Canada, 1 4166715114, sethp1@mcmaster.ca %K data science %K medical education %K machine learning %K health data %K artificial intelligence %K AI %K application %K health care delivery %K health care %K develop %K medical educators %K physician %K education %K training %K barriers %K optimize %K integration %K competency %D 2023 %7 11.7.2023 %9 Viewpoint %J JMIR Med Educ %G English %X The increasingly sophisticated and rapidly evolving application of artificial intelligence in medicine is transforming how health care is delivered, highlighting a need for current and future physicians to develop basic competency in the data science that underlies this topic. Medical educators must consider how to incorporate central concepts in data science into their core curricula to train physicians of the future. 
Similar to how the advent of diagnostic imaging required the physician to understand, interpret, and explain the relevant results to patients, physicians of the future should be able to explain to patients the benefits and limitations of management plans guided by artificial intelligence. We outline major content domains and associated learning outcomes in data science applicable to medical student curricula, suggest ways to incorporate these themes into existing curricula, and note potential implementation barriers and solutions to optimize the integration of this content. %M 37432728 %R 10.2196/46344 %U https://mededu.jmir.org/2023/1/e46344 %U https://doi.org/10.2196/46344 %U http://www.ncbi.nlm.nih.gov/pubmed/37432728 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47612 %T Artificial Intelligence–Driven Respiratory Distress Syndrome Prediction for Very Low Birth Weight Infants: Korean Multicenter Prospective Cohort Study %A Jang,Woocheol %A Choi,Yong Sung %A Kim,Ji Yoo %A Yon,Dong Keon %A Lee,Young Joo %A Chung,Sung-Hoon %A Kim,Chae Young %A Yeo,Seung Geun %A Lee,Jinseok %+ Biomedical Engineering, Kyung Hee University, 1732, Deogyeong-daero, Giheung-gu, Yongin-si, 17104, Republic of Korea, 82 312012570, gonasago@khu.ac.kr %K artificial intelligence %K deep neural network %K premature infants %K respiratory distress syndrome %K AI %K AI model %K pediatrics %K neonatal %K maternal health %K machine learning %D 2023 %7 10.7.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Respiratory distress syndrome (RDS) is a disease that commonly affects premature infants whose lungs are not fully developed. RDS results from a lack of surfactant in the lungs. The more premature the infant is, the greater is the likelihood of having RDS. However, even though not all premature infants have RDS, preemptive treatment with artificial pulmonary surfactant is administered in most cases. 
Objective: We aimed to develop an artificial intelligence model to predict RDS in premature infants to avoid unnecessary treatment. Methods: In this study, 13,087 very low birth weight infants (newborns weighing less than 1500 g) were assessed in 76 hospitals of the Korean Neonatal Network. To predict RDS in very low birth weight infants, we used basic infant information, maternity history, pregnancy/birth process, family history, resuscitation procedure, and test results at birth such as blood gas analysis and Apgar score. The prediction performances of 7 different machine learning models were compared, and a 5-layer deep neural network was proposed in order to enhance the prediction performance from the selected features. An ensemble approach combining multiple models from the 5-fold cross-validation was subsequently developed. Results: Our proposed ensemble 5-layer deep neural network consisting of the top 20 features provided high sensitivity (83.03%), specificity (87.50%), accuracy (84.07%), balanced accuracy (85.26%), and area under the curve (0.9187). Based on the model that we developed, a public web application that enables easy access for the prediction of RDS in premature infants was deployed. Conclusions: Our artificial intelligence model may be useful for preparations for neonatal resuscitation, particularly in cases involving the delivery of very low birth weight infants, as it can aid in predicting the likelihood of RDS and inform decisions regarding the administration of surfactant. 
%M 37428525 %R 10.2196/47612 %U https://www.jmir.org/2023/1/e47612 %U https://doi.org/10.2196/47612 %U http://www.ncbi.nlm.nih.gov/pubmed/37428525 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e46939 %T Putting ChatGPT’s Medical Advice to the (Turing) Test: Survey Study %A Nov,Oded %A Singh,Nina %A Mann,Devin %+ Department of Technology Management, Tandon School of Engineering, New York University, 5 Metrotech, Brooklyn, New York, NY, 11201, United States, 1 646 207 7864, onov@nyu.edu %K artificial intelligence %K AI %K ChatGPT %K large language model %K patient-provider interaction %K chatbot %K feasibility %K ethics %K privacy %K language model %K machine learning %D 2023 %7 10.7.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: Chatbots are being piloted to draft responses to patient questions, but patients’ ability to distinguish between provider and chatbot responses and patients’ trust in chatbots’ functions are not well established. Objective: This study aimed to assess the feasibility of using ChatGPT (Chat Generative Pre-trained Transformer) or a similar artificial intelligence–based chatbot for patient-provider communication. Methods: A survey study was conducted in January 2023. Ten representative, nonadministrative patient-provider interactions were extracted from the electronic health record. Patients’ questions were entered into ChatGPT with a request for the chatbot to respond using approximately the same word count as the human provider’s response. In the survey, each patient question was followed by a provider- or ChatGPT-generated response. Participants were informed that 5 responses were provider generated and 5 were chatbot generated. Participants were asked—and incentivized financially—to correctly identify the response source. Participants were also asked about their trust in chatbots’ functions in patient-provider communication, using a Likert scale from 1-5. 
Results: A US-representative sample of 430 study participants aged 18 and older were recruited on Prolific, a crowdsourcing platform for academic studies. In all, 426 participants filled out the full survey. After removing participants who spent less than 3 minutes on the survey, 392 respondents remained. Overall, 53.3% (209/392) of respondents analyzed were women, and the average age was 47.1 (range 18-91) years. The correct classification of responses ranged from 49% (192/392) to 85.7% (336/392) for different questions. On average, chatbot responses were identified correctly in 65.5% (1284/1960) of the cases, and human provider responses were identified correctly in 65.1% (1276/1960) of the cases. On average, patients’ trust in chatbots’ functions was weakly positive (mean Likert score 3.4 out of 5), with lower trust as the health-related complexity of the task in the questions increased. Conclusions: ChatGPT responses to patient questions were weakly distinguishable from provider responses. Laypeople appear to trust the use of chatbots to answer lower-risk health questions. It is important to continue studying patient-chatbot interaction as chatbots move from administrative to more clinical roles in health care. 
%M 37428540 %R 10.2196/46939 %U https://mededu.jmir.org/2023/1/e46939 %U https://doi.org/10.2196/46939 %U http://www.ncbi.nlm.nih.gov/pubmed/37428540 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e42313 %T Application of a Comprehensive Evaluation Framework to COVID-19 Studies: Systematic Review of Translational Aspects of Artificial Intelligence in Health Care %A Casey,Aaron Edward %A Ansari,Saba %A Nakisa,Bahareh %A Kelly,Blair %A Brown,Pieta %A Cooper,Paul %A Muhammad,Imran %A Livingstone,Steven %A Reddy,Sandeep %A Makinen,Ville-Petteri %+ South Australian Health and Medical Research Institute, North Terrace, Adelaide, 5000, Australia, 61 08 8128 4064, aaron.casey@sahmri.com %K artificial intelligence %K health care %K clinical translation %K translational value %K evaluation %K capability %K utility %K adoption %K COVID-19 %K AI application %K health care AI %K model validation %K AI model %K AI tools %D 2023 %7 6.7.2023 %9 Review %J JMIR AI %G English %X Background: Despite immense progress in artificial intelligence (AI) models, there has been limited deployment in health care environments. The gap between potential and actual AI applications is likely due to the lack of translatability between controlled research environments (where these models are developed) and clinical environments for which the AI tools are ultimately intended. Objective: We previously developed the Translational Evaluation of Healthcare AI (TEHAI) framework to assess the translational value of AI models and to support successful transition to health care environments. In this study, we applied the TEHAI framework to the COVID-19 literature in order to assess how well translational topics are covered. Methods: A systematic literature search for COVID-19 AI studies published between December 2019 and December 2020 resulted in 3830 records. A subset of 102 (2.7%) papers that passed the inclusion criteria was sampled for full review. 
The papers were assessed for translational value and descriptive data collected by 9 reviewers (each study was assessed by 2 reviewers). Evaluation scores and extracted data were compared by a third reviewer for resolution of discrepancies. The review process was conducted on the Covidence software platform. Results: We observed a significant trend for studies to attain high scores for technical capability but low scores for the areas essential for clinical translatability. Specific questions regarding external model validation, safety, nonmaleficence, and service adoption received failed scores in most studies. Conclusions: Using TEHAI, we identified notable gaps in how well translational topics of AI models are covered in the COVID-19 clinical sphere. These gaps in areas crucial for clinical translatability could, and should, be considered already at the model development stage to increase translatability into real COVID-19 health care environments. %M 37457747 %R 10.2196/42313 %U https://ai.jmir.org/2023/1/e42313 %U https://doi.org/10.2196/42313 %U http://www.ncbi.nlm.nih.gov/pubmed/37457747 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46427 %T Deep Learning–Assisted Gait Parameter Assessment for Neurodegenerative Diseases: Model Development and Validation %A Jing,Yu %A Qin,Peinuan %A Fan,Xiangmin %A Qiang,Wei %A Wencheng,Zhu %A Sun,Wei %A Tian,Feng %A Wang,Dakuo %+ Institute of Software, Chinese Academy of Sciences, No. 4, South 4th Street, Haidian District, Beijing, 100190, China, 86 18810117223, sanqsunwei@gmail.com %K deep learning %K neurodegenerative disease %K auxiliary medical care %K gait parameter assessment %D 2023 %7 5.7.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Neurodegenerative diseases (NDDs) are prevalent among older adults worldwide. Early diagnosis of NDD is challenging yet crucial. 
Gait status has been identified as an indicator of early-stage NDD changes and can play a significant role in diagnosis, treatment, and rehabilitation. Historically, gait assessment has relied on intricate but imprecise scales administered by trained professionals or required patients to wear additional equipment, causing discomfort. Advancements in artificial intelligence may completely transform this and offer a novel approach to gait evaluation. Objective: This study aimed to use cutting-edge machine learning techniques to offer patients a noninvasive, entirely contactless gait assessment and provide health care professionals with precise gait assessment results covering all common gait-related parameters to assist in diagnosis and rehabilitation planning. Methods: Data collection involved motion data from 41 different participants aged 25 to 85 (mean 57.51, SD 12.93) years captured in motion sequences using the Azure Kinect (Microsoft Corp; a 3D camera with a 30-Hz sampling frequency). Support vector machine (SVM) and bidirectional long short-term memory (Bi-LSTM) classifiers trained using spatiotemporal features extracted from raw data were used to identify gait types in each walking frame. Gait semantics could then be obtained from the frame labels, and all the gait parameters could be calculated accordingly. For optimal generalization performance of the model, the classifiers were trained using a 10-fold cross-validation strategy. The proposed algorithm was also compared with the previous best heuristic method. Qualitative and quantitative feedback from medical staff and patients in actual medical scenarios was extensively collected for usability analysis. Results: The evaluations comprised 3 aspects. Regarding the classification results from the 2 classifiers, Bi-LSTM achieved an average precision, recall, and F1-score of 90.54%, 90.41%, and 90.38%, respectively, whereas these metrics were 86.99%, 86.62%, and 86.67%, respectively, for SVM. 
Moreover, the Bi-LSTM–based method attained 93.2% accuracy in gait segmentation evaluation (tolerance set to 2), whereas that of the SVM-based method achieved only 77.5% accuracy. For the final gait parameter calculation result, the average error rate of the heuristic method, SVM, and Bi-LSTM was 20.91% (SD 24.69%), 5.85% (SD 5.45%), and 3.17% (SD 2.75%), respectively. Conclusions: This study demonstrated that the Bi-LSTM–based approach can effectively support accurate gait parameter assessment, assisting medical professionals in making early diagnoses and reasonable rehabilitation plans for patients with NDD. %M 37405831 %R 10.2196/46427 %U https://www.jmir.org/2023/1/e46427 %U https://doi.org/10.2196/46427 %U http://www.ncbi.nlm.nih.gov/pubmed/37405831 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e45477 %T Internet of Things and New Technologies for Tracking Perioperative Patients With an Innovative Model for Operating Room Scheduling: Protocol for a Development and Feasibility Study %A Bottani,Eleonora %A Bellini,Valentina %A Mordonini,Monica %A Pellegrino,Mattia %A Lombardo,Gianfranco %A Franchi,Beatrice %A Craca,Michelangelo %A Bignami,Elena %+ Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Viale Gramsci 14, Parma, 43126, Italy, 39 0521 033609, elenagiovanna.bignami@unipr.it %K internet of things %K artificial intelligence %K machine learning %K perioperative organization %K operating rooms %D 2023 %7 5.7.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Management of operating rooms is a critical point in health care organizations because surgical departments represent a significant cost in hospital budgets. Therefore, it is increasingly important that there is effective planning of elective, emergency, and day surgery and optimization of both the human and physical resources available, always maintaining a high level of care and health treatment. 
This would lead to a reduction in patient waiting lists and better performance not only of surgical departments but also of the entire hospital. Objective: This study aims to automatically collect data from a real surgical scenario to develop an integrated technological-organizational model that optimizes operating block resources. Methods: Each patient is tracked and located in real time by wearing a bracelet sensor with a unique identifier. Exploiting the indoor location, the software architecture is able to collect the time spent for every step inside the surgical block. This method does not in any way affect the level of assistance that the patient receives and always protects their privacy; in fact, after expressing informed consent, each patient will be associated with an anonymous identification number. Results: The preliminary results are promising, making the study feasible and functional. Times automatically recorded are much more precise than those collected by humans and reported in the organization’s information system. In addition, machine learning can exploit the historical data collection to predict the surgery time required for each patient according to the patient’s specific profile. Simulation can also be applied to reproduce the system’s functioning, evaluate current performance, and identify strategies to improve the efficiency of the operating block. Conclusions: This functional approach improves short- and long-term surgical planning, facilitating interaction between the various professionals involved in the operating block, optimizing the management of available resources, and guaranteeing a high level of patient care in an increasingly efficient health care system. 
Trial Registration: ClinicalTrials.gov NCT05106621; https://clinicaltrials.gov/ct2/show/NCT05106621 International Registered Report Identifier (IRRID): DERR1-10.2196/45477 %M 37405821 %R 10.2196/45477 %U https://www.researchprotocols.org/2023/1/e45477 %U https://doi.org/10.2196/45477 %U http://www.ncbi.nlm.nih.gov/pubmed/37405821 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46484 %T Automatically Identifying Self-Reports of COVID-19 Diagnosis on Twitter: An Annotated Data Set, Deep Neural Network Classifiers, and a Large-Scale Cohort %A Klein,Ari Z %A Kunatharaju,Shriya %A O'Connor,Karen %A Gonzalez-Hernandez,Graciela %+ Department of Computational Biomedicine, Cedars-Sinai Medical Center, Pacific Design Center, Ste G549F, 700 N San Vicente Blvd, West Hollywood, CA, 90069, United States, 1 310 423 3521, Graciela.GonzalezHernandez@csmc.edu %K natural language processing %K data mining %K social media %K COVID-19 %K Twitter %D 2023 %7 3.7.2023 %9 Research Letter %J J Med Internet Res %G English %X %M 37399062 %R 10.2196/46484 %U https://www.jmir.org/2023/1/e46484 %U https://doi.org/10.2196/46484 %U http://www.ncbi.nlm.nih.gov/pubmed/37399062 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43154 %T Machine and Deep Learning for Tuberculosis Detection on Chest X-Rays: Systematic Literature Review %A Hansun,Seng %A Argha,Ahmadreza %A Liaw,Siaw-Teng %A Celler,Branko G %A Marks,Guy B %+ South West Sydney (SWS), School of Clinical Medicine, University of New South Wales, Burnside Drive, Warwick Farm, New South Wales, Sydney, 2170, Australia, 61 456541224, s.hansun@unsw.edu.au %K chest x-rays %K convolutional neural networks %K diagnostic test accuracy %K machine and deep learning %K PRISMA guidelines %K risk of bias %K QUADAS-2 %K sensitivity and specificity %K systematic literature review %K tuberculosis detection %D 2023 %7 3.7.2023 %9 Review %J J Med Internet Res %G English %X Background: Tuberculosis (TB) was the leading 
infectious cause of mortality globally prior to COVID-19, and chest radiography has an important role in the detection, and subsequent diagnosis, of patients with this disease. Conventional expert reading has substantial within- and between-observer variability, indicating poor reliability of human readers. Substantial efforts have been made in utilizing various artificial intelligence–based algorithms to address the limitations of human reading of chest radiographs for diagnosing TB. Objective: This systematic literature review (SLR) aims to assess the performance of machine learning (ML) and deep learning (DL) in the detection of TB using chest radiography (chest x-ray [CXR]). Methods: In conducting and reporting the SLR, we followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. A total of 309 records were identified from Scopus, PubMed, and IEEE (Institute of Electrical and Electronics Engineers) databases. We independently screened, reviewed, and assessed all available records and included 47 studies that met the inclusion criteria in this SLR. We also performed the risk of bias assessment using Quality Assessment of Diagnostic Accuracy Studies version 2 (QUADAS-2) and meta-analysis of 10 included studies that provided confusion matrix results. Results: Various CXR data sets have been used in the included studies, with 2 of the most popular ones being Montgomery County (n=29) and Shenzhen (n=36) data sets. DL (n=34) was more commonly used than ML (n=7) in the included studies. Most studies used the human radiologist’s report as the reference standard. Support vector machine (n=5), k-nearest neighbors (n=3), and random forest (n=2) were the most popular ML approaches. Meanwhile, convolutional neural networks were the most commonly used DL techniques, with the 4 most popular applications being ResNet-50 (n=11), VGG-16 (n=8), VGG-19 (n=7), and AlexNet (n=6). 
Four performance metrics were popularly used, namely, accuracy (n=35), area under the curve (AUC; n=34), sensitivity (n=27), and specificity (n=23). In terms of the performance results, ML showed higher accuracy (mean ~93.71%) and sensitivity (mean ~92.55%), while on average DL models achieved better AUC (mean ~92.12%) and specificity (mean ~91.54%). Based on data from 10 studies that provided confusion matrix results, we estimated the pooled sensitivity and specificity of ML and DL methods to be 0.9857 (95% CI 0.9477-1.00) and 0.9805 (95% CI 0.9255-1.00), respectively. From the risk of bias assessment, 17 studies were regarded as having unclear risks for the reference standard aspect and 6 studies were regarded as having unclear risks for the flow and timing aspect. Only 2 included studies had built applications based on the proposed solutions. Conclusions: Findings from this SLR confirm the high potential of both ML and DL for TB detection using CXR. Future studies need to pay close attention to 2 aspects of risk of bias, namely, the reference standard and the flow and timing aspects. 
Trial Registration: PROSPERO CRD42021277155; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=277155 %M 37399055 %R 10.2196/43154 %U https://www.jmir.org/2023/1/e43154 %U https://doi.org/10.2196/43154 %U http://www.ncbi.nlm.nih.gov/pubmed/37399055 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47479 %T Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument %A Walker,Harriet Louise %A Ghani,Shahi %A Kuemmerli,Christoph %A Nebiker,Christian Andreas %A Müller,Beat Peter %A Raptis,Dimitri Aristotle %A Staubli,Sebastian Manuel %+ Royal Free London NHS Foundation Trust, Pond Street, London, NW3 2QG, United Kingdom, 44 20 7794 0500, s.staubli@nhs.net %K artificial intelligence %K internet information %K patient information %K ChatGPT %K EQIP tool %K chatbot %K chatbots %K conversational agent %K conversational agents %K internal medicine %K pancreas %K liver %K hepatic %K biliary %K gall %K bile %K gallstone %K pancreatitis %K pancreatic %K medical information %D 2023 %7 30.6.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: ChatGPT-4 is the latest release of a novel artificial intelligence (AI) chatbot able to answer freely formulated and complex questions. In the near future, ChatGPT could become the new standard for health care professionals and patients to access medical information. However, little is known about the quality of medical information provided by the AI. Objective: We aimed to assess the reliability of medical information provided by ChatGPT. Methods: Medical information provided by ChatGPT-4 on the 5 hepato-pancreatico-biliary (HPB) conditions with the highest global disease burden was measured with the Ensuring Quality Information for Patients (EQIP) tool. The EQIP tool is used to measure the quality of internet-available information and consists of 36 items that are divided into 3 subsections. 
In addition, 5 guideline recommendations per analyzed condition were rephrased as questions and input to ChatGPT, and agreement between the guidelines and the AI answer was measured by 2 authors independently. All queries were repeated 3 times to measure the internal consistency of ChatGPT. Results: Five conditions were identified (gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer, and hepatocellular carcinoma). The median EQIP score across all conditions was 16 (IQR 14.5-18) for the total of 36 items. Divided by subsection, median scores for content, identification, and structure data were 10 (IQR 9.5-12.5), 1 (IQR 1-1), and 4 (IQR 4-5), respectively. Agreement between guideline recommendations and answers provided by ChatGPT was 60% (15/25). Interrater agreement as measured by the Fleiss κ was 0.78 (P<.001), indicating substantial agreement. Internal consistency of the answers provided by ChatGPT was 100%. Conclusions: ChatGPT provides medical information of comparable quality to available static internet information. Although currently of limited quality, large language models could become the future standard for patients and health care professionals to gather medical information. 
%M 37389908 %R 10.2196/47479 %U https://www.jmir.org/2023/1/e47479 %U https://doi.org/10.2196/47479 %U http://www.ncbi.nlm.nih.gov/pubmed/37389908 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e48447 %T Evaluating an Innovative HIV Self-Testing Service With Web-Based, Real-Time Counseling Provided by an Artificial Intelligence Chatbot (HIVST-Chatbot) in Increasing HIV Self-Testing Use Among Chinese Men Who Have Sex With Men: Protocol for a Noninferiority Randomized Controlled Trial %A Chen,Siyu %A Zhang,Qingpeng %A Chan,Chee-kit %A Yu,Fuk-yuen %A Chidgey,Andrew %A Fang,Yuan %A Mo,Phoenix K H %A Wang,Zixin %+ Centre for Health Behaviours Research, Jockey Club School of Public Health and Primary Care, Faculty of Medicine, The Chinese University of Hong Kong, Room 508, School of Public Health, Prince of Wales Hospital, Shatin, N.T., Hong Kong, 999077, Hong Kong, 852 2252 8740, wangzx@cuhk.edu.hk %K Chatbot %K counseling %K HIV self-testing %K men who have sex with men %K non-inferiority randomized controlled trial %D 2023 %7 30.6.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Counseling support for HIV self-testing (HIVST) users is essential to ensure support and linkage to care among men who have sex with men (MSM). An HIVST service with web-based real-time instruction, pretest, and posttest counseling provided by trained administrators (HIVST-OIC) was developed by previous projects. Although the HIVST-OIC was highly effective in increasing HIVST uptake and the proportion of HIVST users receiving counseling along with testing, it required intensive resources to implement and sustain. The service capacity of HIVST-OIC cannot meet the increasing demands of HIVST. 
Objective: This randomized controlled trial primarily aims to establish whether HIVST-chatbot, an innovative HIVST service with web-based real-time instruction and counseling provided by a fully automated chatbot, would produce effects that are similar to HIVST-OIC in increasing HIVST uptake and the proportion of HIVST users receiving counseling alongside testing among MSM within a 6-month follow-up period. Methods: A parallel-group, noninferiority randomized controlled trial will be conducted with Chinese-speaking MSM aged ≥18 years with access to live-chat applications. A total of 528 participants will be recruited through multiple sources, including outreach in gay venues, web-based advertisement, and peer referral. After completing the baseline telephone survey, participants will be randomized evenly into the intervention or control groups. Intervention group participants will watch a web-based video promoting HIVST-chatbot and receive a free HIVST kit. The chatbot will contact the participant to implement HIVST and provide standard-of-care, real-time pretest and posttest counseling and instructions on how to use the HIVST kit through WhatsApp. Control group participants will watch a web-based video promoting HIVST-OIC and receive a free HIVST kit in the same manner. Upon appointment, a trained testing administrator will implement HIVST and provide standard-of-care, real-time pretest and posttest counseling and instructions on how to use the HIVST kit through live-chat applications. All participants will complete a telephone follow-up survey 6 months after the baseline. The primary outcomes are HIVST uptake and the proportion of HIVST users receiving counseling support along with testing in the past 6 months, measured at month 6. Secondary outcomes include sexual risk behaviors and uptake of HIV testing other than HIVST during the follow-up period. Intention-to-treat analysis will be used. 
Results: Recruitment and enrollment of participants started in April 2023. Conclusions: This study will generate important research and policy implications regarding chatbot use in HIVST services. If HIVST-chatbot is proven noninferior to HIVST-OIC, it can be easily integrated into existing HIVST services in Hong Kong, given its relatively low resource requirements for implementation and maintenance. HIVST-chatbot can potentially overcome the barriers to using HIVST. Therefore, the coverage of HIV testing, the level of support, and the linkage to care for MSM HIVST users will be increased. Trial Registration: ClinicalTrials.gov NCT05796622; https://clinicaltrials.gov/ct2/show/NCT05796622 International Registered Report Identifier (IRRID): PRR1-10.2196/48447 %M 37389935 %R 10.2196/48447 %U https://www.researchprotocols.org/2023/1/e48447 %U https://doi.org/10.2196/48447 %U http://www.ncbi.nlm.nih.gov/pubmed/37389935 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e48002 %T Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study %A Takagi,Soshi %A Watari,Takashi %A Erabi,Ayano %A Sakaguchi,Kota %+ General Medicine Center, Shimane University Hospital, 89-1, Enya, Izumo, 693-8501, Japan, 81 0853 20 2217, wataritari@gmail.com %K ChatGPT %K Chat Generative Pre-trained Transformer %K GPT-4 %K Generative Pre-trained Transformer 4 %K artificial intelligence %K AI %K medical education %K Japanese Medical Licensing Examination %K medical licensing %K clinical support %K learning model %D 2023 %7 29.6.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: The competence of ChatGPT (Chat Generative Pre-Trained Transformer) in non-English languages is not well studied.
Objective: This study compared the performances of GPT-3.5 (Generative Pre-trained Transformer) and GPT-4 on the Japanese Medical Licensing Examination (JMLE) to evaluate the reliability of these models for clinical reasoning and medical knowledge in non-English languages. Methods: This study used the default mode of ChatGPT, which is based on GPT-3.5; the GPT-4 model of ChatGPT Plus; and the 117th JMLE in 2023. A total of 254 questions were included in the final analysis, which were categorized into 3 types, namely general, clinical, and clinical sentence questions. Results: The results indicated that GPT-4 outperformed GPT-3.5 in terms of accuracy, particularly for general, clinical, and clinical sentence questions. GPT-4 also performed better on difficult questions and specific disease questions. Furthermore, GPT-4 achieved the passing criteria for the JMLE, indicating its reliability for clinical reasoning and medical knowledge in non-English languages. Conclusions: GPT-4 could become a valuable tool for medical education and clinical support in non–English-speaking regions, such as Japan. 
%M 37384388 %R 10.2196/48002 %U https://mededu.jmir.org/2023/1/e48002 %U https://doi.org/10.2196/48002 %U http://www.ncbi.nlm.nih.gov/pubmed/37384388 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48568 %T Utility of ChatGPT in Clinical Practice %A Liu,Jialin %A Wang,Changyu %A Liu,Siru %+ Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave #1475, Nashville, TN, 37212, United States, 1 615 875 5216, siru.liu@vumc.org %K ChatGPT %K artificial intelligence %K large language models %K clinical practice %K large language model %K natural language processing %K NLP %K doctor-patient %K patient-physician %K communication %K challenges %K barriers %K recommendations %K guidance %K guidelines %K best practices %K risks %D 2023 %7 28.6.2023 %9 Viewpoint %J J Med Internet Res %G English %X ChatGPT is receiving increasing attention and has a variety of application scenarios in clinical practice. In clinical decision support, ChatGPT has been used to generate accurate differential diagnosis lists, support clinical decision-making, optimize clinical decision support, and provide insights for cancer screening decisions. In addition, ChatGPT has been used for intelligent question-answering to provide reliable information about diseases and medical queries. In terms of medical documentation, ChatGPT has proven effective in generating patient clinical letters, radiology reports, medical notes, and discharge summaries, improving efficiency and accuracy for health care providers. Future research directions include real-time monitoring and predictive analytics, precision medicine and personalized treatment, the role of ChatGPT in telemedicine and remote health care, and integration with existing health care systems. Overall, ChatGPT is a valuable tool that complements the expertise of health care providers and improves clinical decision-making and patient care. However, ChatGPT is a double-edged sword. 
We need to carefully consider and study the benefits and potential dangers of ChatGPT. In this viewpoint, we discuss recent advances in ChatGPT research in clinical practice and suggest possible risks and challenges of its use. This discussion will help guide and support future research on ChatGPT-like artificial intelligence in health. %M 37379067 %R 10.2196/48568 %U https://www.jmir.org/2023/1/e48568 %U https://doi.org/10.2196/48568 %U http://www.ncbi.nlm.nih.gov/pubmed/37379067 %0 Journal Article %@ 2564-1891 %I JMIR Publications %V 3 %N %P e39895 %T Open-Source Intelligence for Detection of Radiological Events and Syndromes Following the Invasion of Ukraine in 2022: Observational Study %A Stone,Haley %A Heslop,David %A Lim,Samsung %A Sarmiento,Ines %A Kunasekaran,Mohana %A MacIntyre,C Raina %+ Biosecurity Program, The Kirby Institute, Faculty of Medicine, University of New South Wales, Level 6 Wallace Wurth Building, High Street, Kensington, 2052, Australia, 61 2 9348 0672, haley.c.stone@protonmail.com %K artificial intelligence %K contamination %K data source %K early warning %K emergency response %K environmental health %K open source %K open-source intelligence %K OSINT %K power plant %K public health %K radiation %K radiobiological events %K radiological %K sensor %K Ukraine %D 2023 %7 28.6.2023 %9 Original Paper %J JMIR Infodemiology %G English %X Background: On February 25, 2022, Russian forces took control of the Chernobyl power plant after continuous fighting within the Chernobyl exclusion zone. Further events occurred throughout March, raising the risk of contamination of previously uncontaminated areas and of impacts on human and environmental health. The disruption of war has caused interruptions to normal preventive activities, and radiation monitoring sensors have been nonfunctional. Open-source intelligence can be informative when formal reporting and data are unavailable.
Objective: This paper aimed to demonstrate the value of open-source intelligence in Ukraine to identify signals of potential radiological events of health significance during the Ukrainian conflict. Methods: Data were collected from search terminology for radiobiological events and acute radiation syndrome detection between February 1 and March 20, 2022, using 2 open-source intelligence (OSINT) systems, EPIWATCH and Epitweetr. Results: Both EPIWATCH and Epitweetr identified signals of potential radiobiological events throughout Ukraine, particularly on March 4 in Kyiv, Bucha, and Chernobyl. Conclusions: Open-source data can provide valuable intelligence and early warning about potential radiation hazards in conditions of war, where formal reporting and mitigation may be lacking, to enable timely emergency and public health responses. %M 37379069 %R 10.2196/39895 %U https://infodemiology.jmir.org/2023/1/e39895 %U https://doi.org/10.2196/39895 %U http://www.ncbi.nlm.nih.gov/pubmed/37379069 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43633 %T Predicting Disengagement to Better Support Outcomes in a Web-Based Weight Loss Program Using Machine Learning Models: Cross-Sectional Study %A Brankovic,Aida %A Hendrie,Gilly A %A Baird,Danielle L %A Khanna,Sankalp %+ The Australian e-Health Research Centre, Health & Biosecurity, Commonwealth Scientific Industrial Research Organisation, STARS building Level 7, Herston, Brisbane, 4029, Australia, 61 732533629, aida.brankovic@csiro.au %K web-based weight loss program %K predicting engagement %K machine learning–driven intervention %K machine learning %K artificial intelligence %D 2023 %7 26.6.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Engagement is key to interventions that achieve successful behavior change and improvements in health. 
There is limited literature on the application of predictive machine learning (ML) models to data from commercially available weight loss programs to predict disengagement. Such data could help participants achieve their goals. Objective: This study aimed to use explainable ML to predict the risk of member disengagement week by week over 12 weeks on a commercially available web-based weight loss program. Methods: Data were available from 59,686 adults who participated in the weight loss program between October 2014 and September 2019. Data included year of birth, sex, height, weight, motivation to join the program, use statistics (eg, weight entries, entries into the food diary, views of the menu, and program content), program type, and weight loss. Random forest, extreme gradient boosting, and logistic regression with L1 regularization models were developed and validated using a 10-fold cross-validation approach. In addition, temporal validation was performed on a test cohort of 16,947 members who participated in the program between April 2018 and September 2019, and the remaining data were used for model development. Shapley values were used to identify globally relevant features and explain individual predictions. Results: The average age of the participants was 49.60 (SD 12.54) years, the average starting BMI was 32.43 (SD 6.19), and 81.46% (39,594/48,604) of the participants were female. The class distributions (active and inactive members) changed from 39,369 and 9235 in week 2 to 31,602 and 17,002 in week 12, respectively. With 10-fold-cross-validation, extreme gradient boosting models had the best predictive performance, which ranged from 0.85 (95% CI 0.84-0.85) to 0.93 (95% CI 0.93-0.93) for area under the receiver operating characteristic curve and from 0.57 (95% CI 0.56-0.58) to 0.95 (95% CI 0.95-0.96) for area under the precision-recall curve (across 12 weeks of the program). They also presented a good calibration. 
Results obtained with temporal validation ranged from 0.51 to 0.95 for the area under the precision-recall curve and from 0.84 to 0.93 for the area under the receiver operating characteristic curve across the 12 weeks. The area under the precision-recall curve improved considerably, by 20%, in week 3 of the program. On the basis of the computed Shapley values, the most important features for predicting disengagement in the following week were those related to the total activity on the platform and entering a weight in the previous weeks. Conclusions: This study showed the potential of applying predictive ML algorithms to help predict and understand participants’ disengagement with a web-based weight loss program. Given the association between engagement and health outcomes, these findings can prove valuable in providing better support to individuals to enhance their engagement and potentially achieve greater weight loss. %M 37358890 %R 10.2196/43633 %U https://www.jmir.org/2023/1/e43633 %U https://doi.org/10.2196/43633 %U http://www.ncbi.nlm.nih.gov/pubmed/37358890 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e43107 %T Machine Learning Approaches to Classify Self-Reported Rheumatoid Arthritis Health Scores Using Activity Tracker Data: Longitudinal Observational Study %A Rao,Kaushal %A Speier,William %A Meng,Yiwen %A Wang,Jinhan %A Ramesh,Nidhi %A Xie,Fenglong %A Su,Yujie %A Nowell,W Benjamin %A Curtis,Jeffrey R %A Arnold,Corey %+ Division of Clinical Immunology and Rheumatology, University of Alabama at Birmingham, 1825 University Blvd, Shelby 121H, Birmingham, AL, 35233, United States, 1 205 937 0585, jcurtis@uab.edu %K rheumatoid arthritis %K rheumatic %K rheumatism %K Fitbit %K classification %K physical data %K digital health %K activity tracker %K mobile health %K machine learning %K model %K patient reported %K outcome measure %K PROMIS %K nonclinical monitoring %K mHealth %K tracker %K wearable %K arthritis %K mobile phone %D 2023 %7 26.6.2023
%9 Original Paper %J JMIR Form Res %G English %X Background: The increasing use of activity trackers in mobile health studies to passively collect physical data has shown promise in lessening participation burden to provide actively contributed patient-reported outcome (PRO) information. Objective: The aim of this study was to develop machine learning models to classify and predict PRO scores using Fitbit data from a cohort of patients with rheumatoid arthritis. Methods: Two different models were built to classify PRO scores: a random forest classifier model that treated each week of observations independently when making weekly predictions of PRO scores, and a hidden Markov model that additionally took correlations between successive weeks into account. Analyses compared model evaluation metrics for (1) a binary task of distinguishing a normal PRO score from a severe PRO score and (2) a multiclass task of classifying a PRO score state for a given week. Results: For both the binary and multiclass tasks, the hidden Markov model significantly (P<.05) outperformed the random forest model for all PRO scores, and the highest area under the curve, Pearson correlation coefficient, and Cohen κ coefficient were 0.750, 0.479, and 0.471, respectively. Conclusions: While further validation of our results and evaluation in a real-world setting remains, this study demonstrates the ability of physical activity tracker data to classify health status over time in patients with rheumatoid arthritis and enables the possibility of scheduling preventive clinical interventions as needed. If patient outcomes can be monitored in real time, there is potential to improve clinical care for patients with other chronic conditions. 
%M 37017471 %R 10.2196/43107 %U https://formative.jmir.org/2023/1/e43107 %U https://doi.org/10.2196/43107 %U http://www.ncbi.nlm.nih.gov/pubmed/37017471 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e46684 %T Risk of Bias Mitigation for Vulnerable and Diverse Groups in Community-Based Primary Health Care Artificial Intelligence Models: Protocol for a Rapid Review %A Sasseville,Maxime %A Ouellet,Steven %A Rhéaume,Caroline %A Couture,Vincent %A Després,Philippe %A Paquette,Jean-Sébastien %A Gentelet,Karine %A Darmon,David %A Bergeron,Frédéric %A Gagnon,Marie-Pierre %+ Faculté des sciences infirmières, Université Laval, 1050, avenue de la Médecine, Quebec, QC, G1V 0A6, Canada, 1 418 656 3356, maxime.sasseville@fsi.ulaval.ca %K health equity %K health disparity %K algorithms %K artificial intelligence %K primary care %K rapid review %K systematic review %D 2023 %7 26.6.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: The current literature identifies several potential benefits of artificial intelligence models for populations’ health and health care systems’ efficiency. However, there is a lack of understanding of how the risk of bias is considered in the development of primary health care and community health service artificial intelligence algorithms and of the extent to which they perpetuate or introduce potential biases toward groups that could be considered vulnerable in terms of their characteristics. To the best of our knowledge, no reviews are currently available to identify relevant methods to assess the risk of bias in these algorithms. The primary research question of this review is as follows: which strategies can assess the risk of bias in primary health care algorithms toward vulnerable or diverse groups?
Objective: This review aims to identify relevant methods to assess the risk of bias toward vulnerable or diverse groups in the development or deployment of algorithms in community-based primary health care and mitigation interventions deployed to promote and increase equity, diversity, and inclusion. This review looks at what attempts to mitigate bias have been documented and which vulnerable or diverse groups have been considered. Methods: A rapid systematic review of the scientific literature will be conducted. In November 2022, an information specialist developed a specific search strategy based on the main concepts of our primary review question in 4 relevant databases in the last 5 years. We completed the search strategy in December 2022, and 1022 sources were identified. Since February 2023, two reviewers independently screened the titles and abstracts on the Covidence systematic review software. Conflicts are solved through consensus and discussion with a senior researcher. We include all studies on methods developed or tested to assess the risk of bias in algorithms that are relevant in community-based primary health care. Results: In early May 2023, almost 47% (479/1022) of the titles and abstracts have been screened. We completed this first stage in May 2023. In June and July 2023, two reviewers will independently apply the same criteria to full texts, and all exclusion motives will be recorded. Data from selected studies will be extracted using a validated grid in August and analyzed in September 2023. Results will be presented using structured qualitative narrative summaries and submitted for publication by the end of 2023. Conclusions: The approach to identifying methods and target populations of this review is primarily qualitative. However, we will consider a meta-analysis if quantitative data and results are sufficient. 
This review will develop structured qualitative summaries of strategies to mitigate bias toward vulnerable populations and diverse groups in artificial intelligence models. This could be useful to researchers and other stakeholders to identify potential sources of bias in algorithms and try to reduce or eliminate them. Trial Registration: OSF Registries qbph8; https://osf.io/qbph8 International Registered Report Identifier (IRRID): DERR1-10.2196/46684 %M 37358896 %R 10.2196/46684 %U https://www.researchprotocols.org/2023/1/e46684 %U https://doi.org/10.2196/46684 %U http://www.ncbi.nlm.nih.gov/pubmed/37358896 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46014 %T Application of Artificial Intelligence in Geriatric Care: Bibliometric Analysis %A Wang,Jingjing %A Liang,Yiqing %A Cao,Songmei %A Cai,Peixuan %A Fan,Yimeng %+ Department of Nursing, The Affiliated Hospital of Jiangsu University, No 438 North Jiefang Road, Jingkou District, Jiangsu Province, Zhenjiang, 212000, China, 86 13815159881, caosongmei75@126.com %K artificial intelligence %K older adults %K geriatric care %K bibliometric analysis %D 2023 %7 23.6.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) can improve the health and well-being of older adults and has the potential to assist and improve nursing care. In recent years, research in this area has been increasing. Therefore, it is necessary to understand the status of development and main research hotspots and identify the main contributors and their relationships in the application of AI in geriatric care via bibliometric analysis. Objective: Using bibliometric analysis, this study aims to examine the current research hotspots and collaborative networks in the application of AI in geriatric care over the past 23 years. Methods: The Web of Science Core Collection database was used as a source. All publications from inception to August 2022 were downloaded. 
The external characteristics of the publications were summarized through HistCite and the Web of Science. Keywords and collaborative networks were analyzed using VOSviewer and CiteSpace. Results: We obtained a total of 230 publications. The works originated in 499 institutions in 39 countries, were published in 124 journals, and were written by 1216 authors. Publications increased sharply from 2014 to 2022, accounting for 90.87% (209/230) of all publications. The United States and the International Journal of Social Robotics had the highest number of publications on this topic. The 1216 authors were divided into 5 main clusters. Among the 230 publications, 4 clusters were modeled, including Alzheimer disease, aged care, acceptance, and the surveillance and treatment of diseases. Machine learning, deep learning, and rehabilitation had also become recent research hotspots. Conclusions: Research on the application of AI in geriatric care has developed rapidly. The development of research and cooperation among countries/regions and institutions is limited. In the future, strengthening the cooperation and communication between different countries/regions and institutions may further drive this field’s development. This study provides researchers with the information necessary to understand the current state, collaborative networks, and main research hotspots of the field. In addition, our results suggest a series of recommendations for future research.
%M 37351923 %R 10.2196/46014 %U https://www.jmir.org/2023/1/e46014 %U https://doi.org/10.2196/46014 %U http://www.ncbi.nlm.nih.gov/pubmed/37351923 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 6 %N %P e44913 %T Experiences Regarding Use and Implementation of Artificial Intelligence–Supported Follow-Up of Atypical Moles at a Dermatological Outpatient Clinic: Qualitative Study %A Haugsten,Elisabeth Rygvold %A Vestergaard,Tine %A Trettin,Bettina %+ Department of Dermatology and Allergy Centre, Odense University Hospital, Kløvervænget 15, Odense, 5000, Denmark, 45 91856560, elisabeth_rh@hotmail.com %K artificial intelligence %K AI %K computer-assisted diagnosis %K CAD %K dermatology %K diagnostic tool %K FotoFinder %K implementation %K interview %K melanoma %K Moleanalyzer Pro %K total body dermoscopy %K TBD %D 2023 %7 23.6.2023 %9 Original Paper %J JMIR Dermatol %G English %X Background: Artificial intelligence (AI) is increasingly used in numerous medical fields. In dermatology, AI can be used in the form of computer-assisted diagnosis (CAD) systems when assessing and diagnosing skin lesions suspicious of melanoma, a potentially lethal skin cancer with rising incidence all over the world. In particular, CAD may be a valuable tool in the follow-up of patients with high risk of developing melanoma, such as patients with multiple atypical moles. One such CAD system, ATBM Master (FotoFinder), can execute total body dermoscopy (TBD). This process comprises automatically photographing a patient’s entire body and then neatly displaying moles on a computer screen, grouped according to their clinical relevance. Proprietary FotoFinder algorithms underlie this organized presentation of moles. In addition, ATBM Master’s optional convolutional neural network (CNN)-based Moleanalyzer Pro software can be used to further assess moles and estimate their probability of malignancy.
Objective: Few qualitative studies have been conducted on the implementation of AI-supported procedures in dermatology. Therefore, the purpose of this study was to investigate how health care providers experience the use and implementation of a CAD system like ATBM Master, in particular its TBD module. In this way, the study aimed to elucidate potential barriers to the application of such new technology. Methods: We conducted a thematic analysis based on 2 focus group interviews with 14 doctors and nurses regularly working in an outpatient pigmented lesions clinic. Results: Surprisingly, the study revealed that only 3 participants had actual experience using the TBD module. Even so, all participants were able to provide many notions and anticipations about its use, resulting in 3 major themes emerging from the interviews. First, several organizational matters were revealed to be a barrier to consistent use of the ATBM Master’s TBD module, namely lack of guidance, time pressure, and insufficient training. Second, the study found that the perceived benefits of TBD were the ability to objectively detect and monitor subtle lesion changes and the unbiasedness of the procedure. Imprecise identification of moles, inability to photograph certain areas, and substandard technical aspects were the perceived weaknesses. Lastly, the study found that clinicians were open to using AI-powered technology and that the TBD module was considered a supplementary tool to aid the medical staff, rather than a replacement of the clinician. Conclusions: Demonstrated by how few of the participants had actual experience with the TBD module, this study showed that implementation of new technology does not occur automatically. It highlights the importance of having a strategy for implementation to ensure the optimized application of CAD tools.
The study identified areas that could be improved when implementing AI-powered technology, as well as providing insight on how medical staff anticipated and experienced the use of a CAD device in dermatology. %M 37632937 %R 10.2196/44913 %U https://derma.jmir.org/2023/1/e44913 %U https://doi.org/10.2196/44913 %U http://www.ncbi.nlm.nih.gov/pubmed/37632937 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45334 %T Graded Response Model Analysis and Computer Adaptive Test Simulation of the Depression Anxiety Stress Scale 21: Evaluation and Validation Study %A Kraska,Jake %A Bell,Karen %A Costello,Shane %+ School of Educational Psychology and Counselling, Faculty of Education, Monash University, 19 Ancora Imparo Way, Clayton, 3800, Australia, 61 399052896, jake.kraska@gmail.com %K graded response model %K DASS-21 %K CAT %K computer adaptive testing %K simulation %K psychological distress %K depression %K anxiety %K stress %K simulation %K mental health %K screening tool %K tool %K reliability %K development %K model %D 2023 %7 22.6.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: The Depression Anxiety Stress Scale 21 (DASS-21) is a mental health screening tool with conflicting studies regarding its factor structure. No studies have yet attempted to develop a computer adaptive test (CAT) version of it. Objective: This study calibrated items for, and simulated, a DASS-21 CAT using a nonclinical sample. Methods: An evaluation sample (n=580) was used to evaluate the DASS-21 scales via confirmatory factor analysis, Mokken analysis, and graded response modeling. A CAT was simulated with a validation sample (n=248) and a simulated sample (n=10,000) to confirm the generalizability of the model developed. Results: A bifactor model, also known as the “quadripartite” model (1 general factor with 3 specific factors) in the context of the DASS-21, displayed good fit. All scales displayed acceptable fit with the graded response model. 
Simulation of 3 unidimensional (depression, anxiety, and stress) CATs resulted in an average 17% to 48% reduction in items administered when a reliability of 0.80 was acceptable. Conclusions: This study clarifies previous conflicting findings regarding the DASS-21 factor structure and suggests that the quadripartite model for the DASS-21 items fits best. Item response theory modeling suggests that the items measure their respective constructs best between θ = 0 and θ = 3 (mild to moderate severity). %M 37347530 %R 10.2196/45334 %U https://www.jmir.org/2023/1/e45334 %U https://doi.org/10.2196/45334 %U http://www.ncbi.nlm.nih.gov/pubmed/37347530 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43333 %T Digital Education for the Deployment of Artificial Intelligence in Health Care %A Malerbi,Fernando Korn %A Nakayama,Luis Filipe %A Gayle Dychiao,Robyn %A Zago Ribeiro,Lucas %A Villanueva,Cleva %A Celi,Leo Anthony %A Regatieri,Caio Vinicius %+ Laboratory for Computational Physiology, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, United States, 1 617 253 7818, luisnaka@mit.edu %K artificial intelligence %K digital health %K health education %K machine learning %K digital education %K digital %K education %K transformation %K neural %K network %K evaluation %K dataset %K data %K set %K clinical %D 2023 %7 22.6.2023 %9 Viewpoint %J J Med Internet Res %G English %X Artificial Intelligence (AI) represents a significant milestone in health care's digital transformation. However, traditional health care education and training often lack digital competencies. To promote safe and effective AI implementation, health care professionals must acquire basic knowledge of machine learning and neural networks, critical evaluation of data sets, integration within clinical workflows, bias control, and human-machine interaction in clinical settings.
Additionally, they should understand the legal and ethical aspects of digital health care and the impact of AI adoption. Misconceptions and fears about AI systems could jeopardize their real-life implementation. However, there are multiple barriers to promoting electronic health literacy, including time constraints, overburdened curricula, and the shortage of adequately trained professionals. To overcome these challenges, partnerships among developers, professional societies, and academia are essential. Integrating specialists from different backgrounds, including data specialists, lawyers, and social scientists, can significantly contribute to combating digital illiteracy and promoting safe AI implementation in health care. %M 37347537 %R 10.2196/43333 %U https://www.jmir.org/2023/1/e43333 %U https://doi.org/10.2196/43333 %U http://www.ncbi.nlm.nih.gov/pubmed/37347537 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e41089 %T Artificial Intelligence Bias in Health Care: Web-Based Survey %A Vorisek,Carina Nina %A Stellmach,Caroline %A Mayer,Paula Josephine %A Klopfenstein,Sophie Anne Ines %A Bures,Dominik Martin %A Diehl,Anke %A Henningsen,Maike %A Ritter,Kerstin %A Thun,Sylvia %+ Core Facility Digital Medicine and Interoperability, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Anna-Louisa-Karsch-Str 2, Berlin, 10178, Germany, 49 30450543049, carina-nina.vorisek@charite.de %K bias %K artificial intelligence %K machine learning %K deep learning %K FAIR data %K digital health %K health care %K online %K survey %K AI %K application %K diagnosis %K treatment %K prevention %K disease %K age %K gender %K development %K clinical %D 2023 %7 22.6.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Resources are increasingly spent on artificial intelligence (AI) solutions for medical applications aiming to improve diagnosis, treatment, and prevention of diseases. 
While the need for transparency and reduction of bias in data and algorithm development has been addressed in past studies, little is known about the knowledge and perception of bias among AI developers. Objective: This study’s objective was to survey AI specialists in health care to investigate developers’ perceptions of bias in AI algorithms for health care applications and their awareness and use of preventative measures. Methods: A web-based survey was provided in both German and English, comprising a maximum of 41 questions using branching logic within the REDCap web application. Only the results of participants with experience in the field of medical AI applications and complete questionnaires were included for analysis. Demographic data, technical expertise, and perceptions of fairness, as well as knowledge of biases in AI, were analyzed, and variations among gender, age, and work environment were assessed. Results: A total of 151 AI specialists completed the web-based survey. The median age was 30 (IQR 26-39) years, and 67% (101/151) of respondents were male. Roughly one-third each rated their AI development projects as fair (47/151, 31%) or moderately fair (51/151, 34%); 12% (18/151) reported their AI to be barely fair, and 1% (2/151) not fair at all. One participant identifying as diverse rated AI developments as barely fair, and among the 2 undefined gender participants, AI developments were rated as barely fair or moderately fair, respectively. Reasons for biases selected by respondents were lack of fair data (90/132, 68%), guidelines or recommendations (65/132, 49%), or knowledge (60/132, 45%). Half of the respondents worked with image data (83/151, 55%) from 1 center only (76/151, 50%), and 35% (53/151) worked with national data exclusively. Conclusions: This study shows that, overall, AI developers perceive the fairness of their AI developments as moderate. Gender minorities did not once rate their AI development as fair or very fair. 
Therefore, further studies need to focus on minorities and women and their perceptions of AI. The results highlight the need to strengthen knowledge about bias in AI and provide guidelines on preventing biases in AI health care applications. %M 37347528 %R 10.2196/41089 %U https://www.jmir.org/2023/1/e41089 %U https://doi.org/10.2196/41089 %U http://www.ncbi.nlm.nih.gov/pubmed/37347528 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e44876 %T Identifying Patient Populations in Texts Describing Drug Approvals Through Deep Learning–Based Information Extraction: Development of a Natural Language Processing Algorithm %A Gendrin,Aline %A Souliotis,Leonidas %A Loudon-Griffiths,James %A Aggarwal,Ravisha %A Amoako,Daniel %A Desouza,Gregory %A Dimitrievska,Sashka %A Metcalfe,Paul %A Louvet,Emilie %A Sahni,Harpreet %+ AstraZeneca, City house, 126-130 Hills Rd, Cambridge, CB2 1RY, United Kingdom, 44 7814585004, aline.gendrinbrokmann@astrazeneca.com %K algorithm %K artificial intelligence %K BERT %K cancer %K classification %K data extraction %K data mining %K deep-learning %K development %K drug approval %K free text %K information retrieval %K line of therapy %K machine learning %K natural language processing %K NLP %K oncology %K pharmaceutic %K pharmacology %K pharmacy %K stage of cancer %K text extraction %K text mining %K unstructured data %D 2023 %7 22.6.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: New drug treatments are regularly approved, and it is challenging to remain up-to-date in this rapidly changing environment. Fast and accurate visualization is important to allow a global understanding of the drug market. Automation of this information extraction provides a helpful starting point for the subject matter expert, helps to mitigate human errors, and saves time. 
Objective: We aimed to semiautomate disease population extraction from the free text of oncology drug approval descriptions from the BioMedTracker database for 6 selected drug targets. More specifically, we intended to extract (1) line of therapy, (2) stage of cancer of the patient population described in the approval, and (3) the clinical trials that provide evidence for the approval. We aimed to use these results in downstream applications, aiding the searchability of relevant content against related drug project sources. Methods: We fine-tuned a state-of-the-art deep learning model, Bidirectional Encoder Representations from Transformers, for each of the 3 desired outputs. We independently applied rule-based text mining approaches. We compared the performances of deep learning and rule-based approaches and selected the best method, which was then applied to new entries. Results: The training data set is currently small (433 entries) and will grow over time as new approval descriptions become available or if additional drug targets are included. The deep learning models achieved 61% and 56% 5-fold cross-validated accuracies for line of therapy and stage of cancer, respectively, which were treated as classification tasks. Trial identification was treated as a named entity recognition task, and the 5-fold cross-validated F1-score is currently 87%. Although the scores of the classification tasks may seem low, the models comprise 5 classes each, and such scores are a marked improvement when compared to random classification. Moreover, we expect improved performance as the input data set grows, since deep learning models require sufficiently large training data to learn their task. 
The rule-based approach achieved 60% and 74% 5-fold cross-validated accuracies for line of therapy and stage of cancer, respectively. No attempt was made to define a rule-based approach for trial identification. Conclusions: We developed a natural language processing algorithm that is currently assisting subject matter experts in disease population extraction, which supports health authority approvals. This algorithm achieves semiautomation, enabling subject matter experts to leverage the results for deeper analysis and to accelerate information retrieval in a crowded clinical environment such as oncology. %M 37347514 %R 10.2196/44876 %U https://formative.jmir.org/2023/1/e44876 %U https://doi.org/10.2196/44876 %U http://www.ncbi.nlm.nih.gov/pubmed/37347514 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44538 %T Novel Software for High-level Virological Testing: Self-Designed Immersive Virtual Reality Training Approach %A Tsai,Huey-Pin %A Lin,Che-Wei %A Lin,Ying-Jun %A Yeh,Chun-Sheng %A Shan,Yan-Shen %+ Department of Pathology, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, 138 Sheng-Li Road, Tainan, 70428, Taiwan, 886 62353535 ext 2616, tsaihp@mail.ncku.edu.tw %K design %K immersive %K virtual reality %K VR %K high-level clinical virology %K skill training %K testing %K virology %K virological %K medical education %K clinical practice %K simulation %K biotechnology %K molecular %K detection %K pathogen %K development %K software %K teaching %D 2023 %7 21.6.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: To ensure the timely diagnosis of emerging infectious diseases, high-tech molecular biotechnology is often used to detect pathogens and has gradually become the gold standard for virological testing. 
However, beginners and students are often unable to practice their skills due to the higher costs associated with high-level virological testing, the increasing complexity of the equipment, and the limited number of specimens from patients. Therefore, a new training program is necessary to expand training opportunities and reduce the risk of test failure. Objective: The aim of the study is to (1) develop and implement virtual reality (VR) software for simulated and interactive high-level virological testing that can be applied in clinical practice and skills building or training settings and (2) evaluate the VR simulation’s effectiveness on reaction, learning, and behavior of the students (trainees). Methods: Viral nucleic acid tests on a BD MAX instrument were selected for our VR project because it is a high-tech automatic detection system. The program was developed in cooperation between medical technology teachers and biomedical engineering personnel. Medical technology teachers were responsible for designing the lesson plan, and the biomedical engineering personnel developed the VR software. We designed novel VR teaching software to simulate cognitive learning via various procedure scenarios and interactive models. The VR software contains 2D VR “cognitive test and learning” lessons and 3D VR “practical skills training” lessons. We evaluated students’ learning effectiveness pre- and posttraining and then recorded their behavior patterns when answering questions, performing repeated exercises, and engaging in clinical practice. Results: The results showed that the use of the VR software met participants’ needs and enhanced their interest in learning. The average posttraining scores of participants exposed to 2D and 3D VR training were significantly higher than those of participants who were exposed solely to traditional demonstration teaching (P<.001). 
Behavioral assessments of students pre- and posttraining showed that students exposed to VR-based training to acquire relevant knowledge of advanced virological testing exhibited significantly improved knowledge of specific items posttraining (P<.01). A higher participant score led to fewer attempts when responding to each item in a matching task. Thus, VR can enhance students’ understanding of difficult topics. Conclusions: The VR program designed for this study can reduce the costs associated with virological testing training, thereby increasing its accessibility for students and beginners. It can also reduce the risk of viral infections, particularly during disease outbreaks (eg, the COVID-19 pandemic), and enhance students’ learning motivation to strengthen their practical skills. %M 37342081 %R 10.2196/44538 %U https://www.jmir.org/2023/1/e44538 %U https://doi.org/10.2196/44538 %U http://www.ncbi.nlm.nih.gov/pubmed/37342081 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 7 %N %P e45352 %T Prediction of Outcomes After Heart Transplantation in Pediatric Patients Using National Registry Data: Evaluation of Machine Learning Approaches %A Killian,Michael O %A Tian,Shubo %A Xing,Aiwen %A Hughes,Dana %A Gupta,Dipankar %A Wang,Xiaoyu %A He,Zhe %+ School of Information, Florida State University, 142 Collegiate Loop, Tallahassee, FL, 32306, United States, 1 850 644 5775, zhe@fsu.edu %K explainable artificial intelligence %K machine learning %K mortality %K outcome prediction %K organ rejection %K organ transplantation %K pediatrics %K United Network for Organ Sharing %D 2023 %7 20.6.2023 %9 Original Paper %J JMIR Cardio %G English %X Background: The prediction of posttransplant health outcomes for pediatric heart transplantation is critical for risk stratification and high-quality posttransplant care. 
Objective: The purpose of this study was to examine the use of machine learning (ML) models to predict rejection and mortality for pediatric heart transplant recipients. Methods: Various ML models were used to predict rejection and mortality at 1, 3, and 5 years after transplantation in pediatric heart transplant recipients using United Network for Organ Sharing data from 1987 to 2019. The variables used for predicting posttransplant outcomes included donor and recipient as well as medical and social factors. We evaluated 7 ML models—extreme gradient boosting (XGBoost), logistic regression, support vector machine, random forest (RF), stochastic gradient descent, multilayer perceptron, and adaptive boosting (AdaBoost)—as well as a deep learning model with 2 hidden layers with 100 neurons and a rectified linear unit (ReLU) activation function followed by batch normalization for each and a classification head with a softmax activation function. We used 10-fold cross-validation to evaluate model performance. Shapley additive explanations (SHAP) values were calculated to estimate the importance of each variable for prediction. Results: RF and AdaBoost models were the best-performing algorithms for different prediction windows across outcomes. RF outperformed other ML algorithms in predicting 5 of the 6 outcomes (area under the receiver operating characteristic curve [AUROC] 0.664 and 0.706 for 1-year and 3-year rejection, respectively, and AUROC 0.697, 0.758, and 0.763 for 1-year, 3-year, and 5-year mortality, respectively). AdaBoost achieved the best performance for prediction of 5-year rejection (AUROC 0.705). Conclusions: This study demonstrates the comparative utility of ML approaches for modeling posttransplant health outcomes using registry data. 
ML approaches can identify unique risk factors and their complex relationship with outcomes, thereby identifying patients considered to be at risk and informing the transplant community about the potential of these innovative approaches to improve pediatric care after heart transplantation. Future studies are required to translate the information derived from prediction models to optimize counseling, clinical care, and decision-making within pediatric organ transplant centers. %M 37338974 %R 10.2196/45352 %U https://cardio.jmir.org/2023/1/e45352 %U https://doi.org/10.2196/45352 %U http://www.ncbi.nlm.nih.gov/pubmed/37338974 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43311 %T Artificial Intelligence Supporting the Training of Communication Skills in the Education of Health Care Professions: Scoping Review %A Stamer,Tjorven %A Steinhäuser,Jost %A Flägel,Kristina %+ Institute of Family Medicine, University Hospital Schleswig-Holstein Luebeck Campus, Ratzeburger Allee 160, Luebeck, 23562, Germany, 49 451 3101 8013, t.stamer@uni-luebeck.de %K communication %K education %K artificial intelligence %K machine learning %K health care %K skill %K use %K academic %K students %K training %K cost %K cost-effective %K health care professional %D 2023 %7 19.6.2023 %9 Review %J J Med Internet Res %G English %X Background: Communication is a crucial element of every health care profession, rendering communication skills training in all health care professions as being of great importance. Technological advances such as artificial intelligence (AI) and particularly machine learning (ML) may support this cause: it may provide students with an opportunity for easily accessible and readily available communication training. Objective: This scoping review aimed to summarize the status quo regarding the use of AI or ML in the acquisition of communication skills in academic health care professions. 
Methods: We conducted a comprehensive literature search across the PubMed, Scopus, Cochrane Library, Web of Science Core Collection, and CINAHL databases to identify articles that covered the use of AI or ML in communication skills training of undergraduate students pursuing health care profession education. Using an inductive approach, the included studies were organized into distinct categories. The specific characteristics of the studies, methods and techniques used by AI or ML applications, and main outcomes of the studies were evaluated. Furthermore, supporting and hindering factors in the use of AI and ML for communication skills training of health care professionals were outlined. Results: The titles and abstracts of 385 studies were identified, of which 29 (7.5%) underwent full-text review. Of the 29 studies, based on the inclusion and exclusion criteria, 12 (3.1%) were included. The studies were organized into 3 distinct categories: studies using AI and ML for text analysis and information extraction, studies using AI and ML and virtual reality, and studies using AI and ML and the simulation of virtual patients, each within the academic training of the communication skills of health care professionals. Within these thematic domains, AI was also used for the provision of feedback. The motivation of the involved agents played a major role in the implementation process. Reported barriers to the use of AI and ML in communication skills training revolved around the lack of authenticity and limited natural flow of language exhibited by the AI- and ML-based virtual patient systems. Furthermore, the use of educational AI- and ML-based systems in communication skills training for health care professionals is currently limited to only a few cases, topics, and clinical domains. 
Conclusions: The use of AI and ML in communication skills training for health care professionals is clearly a growing and promising field with a potential to render training more cost-effective and less time-consuming. Furthermore, it may serve learners as an individualized and readily available exercise method. However, in most cases, the outlined applications and technical solutions are limited in terms of access, possible scenarios, the natural flow of a conversation, and authenticity. These issues still stand in the way of any widespread implementation ambitions. %M 37335593 %R 10.2196/43311 %U https://www.jmir.org/2023/1/e43311 %U https://doi.org/10.2196/43311 %U http://www.ncbi.nlm.nih.gov/pubmed/37335593 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e46487 %T Artificial Intelligence in Health Care—Understanding Patient Information Needs and Designing Comprehensible Transparency: Qualitative Study %A Robinson,Renee %A Liday,Cara %A Lee,Sarah %A Williams,Ishan C %A Wright,Melanie %A An,Sungjoon %A Nguyen,Elaine %+ College of Pharmacy, Idaho State University, 1311 E Central Dr, Meridian, ID, 83642, United States, 1 208 373 1829, elainenguyen@isu.edu %K artificial intelligence %K machine learning %K diabetes %K equipment safety %K equipment design %K health care %D 2023 %7 19.6.2023 %9 Original Paper %J JMIR AI %G English %X Background: Artificial intelligence (AI) is a branch of computer science that uses advanced computational methods, such as machine learning (ML), to calculate and predict health outcomes and address patient and provider health needs. While these technologies show great promise for improving health care, especially in diabetes management, there are usability and safety concerns for both patients and providers about the use of AI/ML in health care management. 
Objective: We aimed to support and ensure safe use of AI/ML technologies in health care; thus, the team worked to better understand (1) patient information and training needs, (2) the factors that influence patients’ perceived value and trust in AI/ML health care applications, and (3) how best to support safe and appropriate use of AI/ML-enabled devices and applications among people living with diabetes. Methods: To understand general patient perspectives and information needs related to the use of AI/ML in health care, we conducted a series of focus groups (n=9) and interviews (n=3) with patients (n=41) and interviews with providers (n=6) in Alaska, Idaho, and Virginia. Grounded theory guided data gathering, synthesis, and analysis. Thematic content and constant comparison analysis were used to identify relevant themes and subthemes. Inductive approaches were used to link data to key concepts, including preferred patient-provider interactions and patient perceptions of trust, accuracy, value, assurances, and information transparency. Results: Key summary themes and recommendations focused on (1) patient preferences for AI/ML-enabled device and application information, (2) patient and provider AI/ML-related device and application training needs, (3) factors contributing to patient and provider trust in AI/ML-enabled devices and applications, and (4) AI/ML-related device and application functionality and safety considerations. A number of participants (patients and providers) made recommendations to improve device functionality to guide information and labeling mandates (eg, link to online video resources and provide access to 24/7 live in-person or virtual emergency support). Other patient recommendations included (1) providing access to practice devices, (2) providing connections to local supports and reputable community resources, and (3) simplifying the display and alert limits. 
Conclusions: Recommendations from both patients and providers could be used by federal oversight agencies to improve the oversight of AI/ML-enabled technologies used in diabetes care, improving device safety and efficacy. %M 38333424 %R 10.2196/46487 %U https://ai.jmir.org/2023/1/e46487 %U https://doi.org/10.2196/46487 %U http://www.ncbi.nlm.nih.gov/pubmed/38333424 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44042 %T Development of an Anticipatory Triage-Ranking Algorithm Using Dynamic Simulation of the Expected Time Course of Patients With Trauma: Modeling and Simulation Study %A Sigle,Manuel %A Berliner,Leon %A Richter,Erich %A van Iersel,Mart %A Gorgati,Eleonora %A Hubloue,Ives %A Bamberg,Maximilian %A Grasshoff,Christian %A Rosenberger,Peter %A Wunderlich,Robert %+ University Department of Anesthesiology and Intensive Care Medicine, University Hospital Tübingen, Eberhard Karls University, Hoppe-Seyler-Str.3, Tübingen, 72076, Germany, 49 7071 29 86564, Robert.Wunderlich@med.uni-tuebingen.de %K novel triage algorithm %K patient with trauma %K dynamic patient simulation %K mathematic model %K artificial patient database %K semisupervised generation of patients with artificial trauma %K high-dimensional analysis of patient database %K Germany %K algorithm %K trauma %K proof-of-concept %K model %K emergency %K triage %K simulation %K urgency %K urgent %K severity %K rank %K vital sign %D 2023 %7 15.6.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: In cases of terrorism, disasters, or mass casualty incidents, far-reaching life-and-death decisions about prioritizing patients are currently made using triage algorithms that focus solely on the patient’s current health status rather than their prognosis, thus leaving a fatal gap of patients who are under- or overtriaged. 
Objective: The aim of this proof-of-concept study is to demonstrate a novel approach for triage that no longer classifies patients into triage categories but ranks their urgency according to the anticipated survival time without intervention. Using this approach, we aim to improve the prioritization of casualties by respecting individual injury patterns and vital signs, survival likelihoods, and the availability of rescue resources. Methods: We designed a mathematical model that allows dynamic simulation of the time course of a patient’s vital parameters, depending on individual baseline vital signs and injury severity. The 2 variables were integrated using the well-established Revised Trauma Score (RTS) and the New Injury Severity Score (NISS). An artificial patient database of unique patients with trauma (N=82,277) was then generated and used for analysis of the time course modeling and triage classification. Comparative performance analysis of different triage algorithms was performed. In addition, we applied a sophisticated, state-of-the-art clustering method using the Gower distance to visualize patient cohorts at risk for mistriage. Results: The proposed triage algorithm realistically modeled the time course of a patient’s life, depending on injury severity and current vital parameters. Different casualties were ranked by their anticipated time course, reflecting their priority for treatment. Regarding the identification of patients at risk for mistriage, the model outperformed the Simple Triage And Rapid Treatment’s triage algorithm but also exclusive stratification by the RTS or the NISS. Multidimensional analysis separated patients with similar patterns of injuries and vital parameters into clusters with different triage classifications. In this large-scale analysis, our algorithm confirmed the previously mentioned conclusions during simulation and descriptive analysis and underlined the significance of this novel approach to triage. 
Conclusions: The findings of this study suggest the feasibility and relevance of our model, which is unique in terms of its ranking system, prognosis outline, and time course anticipation. The proposed triage-ranking algorithm could offer an innovative triage method with a wide range of applications in prehospital, disaster, and emergency medicine, as well as simulation and research. %M 37318826 %R 10.2196/44042 %U https://www.jmir.org/2023/1/e44042 %U https://doi.org/10.2196/44042 %U http://www.ncbi.nlm.nih.gov/pubmed/37318826 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e46581 %T Artificial Intelligence Applications for Assessment, Monitoring, and Management of Parkinson Disease Symptoms: Protocol for a Systematic Review %A Bounsall,Katie %A Milne-Ives,Madison %A Hall,Andrew %A Carroll,Camille %A Meinert,Edward %+ Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, NE1 7RU, United Kingdom, 44 (0)191 208 6000, edward.meinert@newcastle.ac.uk %K artificial intelligence %K machine learning %K Parkinson disease %K Parkinson %K neurodegenerative %K review method %K systematic review %D 2023 %7 14.6.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Parkinson disease (PD) is the second most prevalent neurodegenerative disease, with around 10 million people with PD worldwide. Current assessments of PD symptoms are conducted by questionnaires and clinician assessments and have many limitations, including unreliable reporting of symptoms, little autonomy for patients over their disease management, and standard clinical review intervals regardless of disease status or clinical need. To address these limitations, digital technologies including wearable sensors, smartphone apps, and artificial intelligence (AI) methods have been implemented for this population. 
Many reviews have explored the use of AI in the diagnosis of PD and management of specific symptoms; however, there is limited research on the application of AI to the monitoring and management of the range of PD symptoms. A comprehensive review of the application of AI methods is necessary to address the gap in high-quality reviews and highlight developments in the use of AI within PD care. Objective: The purpose of this protocol is to guide a systematic review to identify and summarize the current applications of AI to the assessment, monitoring, and management of PD symptoms. Methods: This review protocol was structured using the PRISMA-P (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols) and the Population, Intervention, Comparator, Outcome, and Study (PICOS) frameworks. The following 5 databases will be systematically searched: PubMed, IEEE Xplore, Institute for Scientific Information’s Web of Science, Scopus, and the Cochrane Library. Title and abstract screening, full-text review, and data extraction will be conducted by 2 independent reviewers. Data will be extracted into a predetermined form, and any disagreements in screening or extraction will be discussed. Risk of bias will be assessed using the Cochrane Collaboration Risk of Bias 2 tool for randomized trials and the Mixed Methods Appraisal Tool for nonrandomized trials. Results: As of April 2023, this systematic review has not yet been started. It is expected to begin in May 2023, with the aim of completing it by September 2023. Conclusions: The systematic review subsequently conducted as a product of this protocol will provide an overview of the AI methods being used for the assessment, monitoring, and management of PD symptoms. This will identify areas for further research in which AI methods can be applied to the assessment or management of PD symptoms and could support the future implementation of AI-based tools for the effective management of PD. 
International Registered Report Identifier (IRRID): PRR1-10.2196/46581 %M 37314853 %R 10.2196/46581 %U https://www.researchprotocols.org/2023/1/e46581 %U https://doi.org/10.2196/46581 %U http://www.ncbi.nlm.nih.gov/pubmed/37314853 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e44191 %T Automated Identification of Aspirin-Exacerbated Respiratory Disease Using Natural Language Processing and Machine Learning: Algorithm Development and Evaluation Study %A Pongdee,Thanai %A Larson,Nicholas B %A Divekar,Rohit %A Bielinski,Suzette J %A Liu,Hongfang %A Moon,Sungrim %+ Division of Allergic Diseases, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, United States, 1 5072843783, pongdee.thanai@mayo.edu %K aspirin exacerbated respiratory disease %K natural language processing %K electronic health record %K identification %K machine learning %K aspirin %K asthma %K respiratory illness %K artificial intelligence %K natural language processing algorithm %D 2023 %7 12.6.2023 %9 Original Paper %J JMIR AI %G English %X Background: Aspirin-exacerbated respiratory disease (AERD) is an acquired inflammatory condition characterized by the presence of asthma, chronic rhinosinusitis with nasal polyposis, and respiratory hypersensitivity reactions on ingestion of aspirin or other nonsteroidal anti-inflammatory drugs (NSAIDs). Despite AERD having a classic constellation of symptoms, the diagnosis is often overlooked, with an average of greater than 10 years between the onset of symptoms and diagnosis of AERD. Without a diagnosis, individuals will lack opportunities to receive effective treatments, such as aspirin desensitization or biologic medications. Objective: Our aim was to develop a combined algorithm that integrates both natural language processing (NLP) and machine learning (ML) techniques to identify patients with AERD from an electronic health record (EHR). 
Methods: A rule-based decision tree algorithm incorporating NLP-based features was developed using clinical documents from the EHR at Mayo Clinic. From clinical notes, using NLP techniques, 7 features were extracted that included the following: AERD, asthma, NSAID allergy, nasal polyps, chronic sinusitis, elevated urine leukotriene E4 level, and documented no-NSAID allergy. MedTagger was used to extract these 7 features from the unstructured clinical text given a set of keywords and patterns based on the chart review of 2 allergy and immunology experts for AERD. The status of each extracted feature was quantified by assigning the frequency of its occurrence in clinical documents per subject. We optimized the decision tree classifier’s hyperparameter cutoff thresholds on the training set to determine the representative feature combination for discriminating AERD. We then evaluated the resulting model on the test set. Results: The AERD algorithm, which combines NLP and ML techniques, achieved an area under the receiver operating characteristic curve score, sensitivity, and specificity of 0.86 (95% CI 0.78-0.94), 80.00 (95% CI 70.82-87.33), and 88.00 (95% CI 79.98-93.64) for the test set, respectively. Conclusions: We developed a promising AERD algorithm that needs further refinement to improve AERD diagnosis. Continued development of NLP and ML technologies has the potential to reduce diagnostic delays for AERD and improve the health of our patients. 
%M 39105270 %R 10.2196/44191 %U https://ai.jmir.org/2023/1/e44191 %U https://doi.org/10.2196/44191 %U http://www.ncbi.nlm.nih.gov/pubmed/39105270 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e43896 %T Expectations of Anesthesiology and Intensive Care Professionals Toward Artificial Intelligence: Observational Study %A Kloka,Jan Andreas %A Holtmann,Sophie C %A Nürenberg-Goloub,Elina %A Piekarski,Florian %A Zacharowski,Kai %A Friedrichson,Benjamin %+ Department of Anaesthesiology, Intensive Care Medicine and Pain Therapy, University Hospital Frankfurt, Goethe University, Theodor-Stern Kai 7, Frankfurt, 60590, Germany, 49 630183876, JanAndreas.Kloka@kgu.de %K anesthesiology %K artificial intelligence %K health care %K intensive care %K medical informatics %K technology acceptance %K Europe-wide survey %D 2023 %7 12.6.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Artificial intelligence (AI) applications offer numerous opportunities to improve health care. To be used in the intensive care unit, AI must meet the needs of staff, and potential barriers must be addressed through joint action by all stakeholders. It is thus critical to assess the needs and concerns of anesthesiologists and intensive care physicians related to AI in health care throughout Europe. Objective: This Europe-wide, cross-sectional observational study investigates how potential users of AI systems in anesthesiology and intensive care assess the opportunities and risks of the new technology. The web-based questionnaire was based on the established analytic model of acceptance of innovations by Rogers to record 5 stages of innovation acceptance. Methods: The questionnaire was sent twice in 2 months (March 11, 2021, and November 5, 2021) through the European Society of Anaesthesiology and Intensive Care (ESAIC) member email distribution list. A total of 9294 ESAIC members were reached, of whom 728 filled out the questionnaire (response rate 728/9294, 8%). 
Due to missing data, 27 questionnaires were excluded. The analyses were conducted with 701 participants. Results: A total of 701 questionnaires (female: n=299, 42%) were analyzed. Overall, 265 (37.8%) of the participants had been in contact with AI and rated the benefits of this technology higher (mean 3.22, SD 0.39) than participants who reported no previous contact (mean 3.01, SD 0.48). Physicians see the most benefits of AI application in early warning systems (335/701, 48% strongly agreed, and 358/701, 51% agreed). Major potential disadvantages were technical problems (236/701, 34% strongly agreed, and 410/701, 58% agreed) and handling difficulties (126/701, 18% strongly agreed, and 462/701, 66% agreed), both of which could be addressed by Europe-wide digitalization and education. In addition, the lack of a secure legal basis for the research and use of medical AI in the European Union leads doctors to expect problems with legal liability (186/701, 27% strongly agreed, and 374/701, 53% agreed) and data protection (148/701, 21% strongly agreed, and 343/701, 49% agreed). Conclusions: Anesthesiologists and intensive care personnel are open to AI applications in their professional field and expect numerous benefits for staff and patients. Regional differences in the digitalization of the private sector are not reflected in the acceptance of AI among health care professionals. Physicians anticipate technical difficulties and the lack of a stable legal basis for the use of AI. Training for medical staff could increase the benefits of AI in professional medicine. Therefore, we suggest that the development and implementation of AI in health care require a solid technical, legal, and ethical basis, as well as adequate education and training of users. 
%M 37307038 %R 10.2196/43896 %U https://formative.jmir.org/2023/1/e43896 %U https://doi.org/10.2196/43896 %U http://www.ncbi.nlm.nih.gov/pubmed/37307038 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44356 %T Tweeting for Health Using Real-time Mining and Artificial Intelligence–Based Analytics: Design and Development of a Big Data Ecosystem for Detecting and Analyzing Misinformation on Twitter %A Morita,Plinio Pelegrini %A Zakir Hussain,Irfhana %A Kaur,Jasleen %A Lotto,Matheus %A Butt,Zahid Ahmad %+ School of Public Health Sciences, Faculty of Health, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada, 1 5198884567 ext 41372, plinio.morita@uwaterloo.ca %K big data %K deep learning %K infodemics %K misinformation %K social media %K infoveillance %D 2023 %7 9.6.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Digital misinformation, primarily on social media, has led to harmful and costly beliefs in the general population. Notably, these beliefs have resulted in public health crises to the detriment of governments worldwide and their citizens. However, public health officials need access to a comprehensive system capable of mining and analyzing large volumes of social media data in real time. Objective: This study aimed to design and develop a big data pipeline and ecosystem (UbiLab Misinformation Analysis System [U-MAS]) to identify and analyze false or misleading information disseminated via social media on a certain topic or set of related topics. Methods: U-MAS is a platform-independent ecosystem developed in Python that leverages the Twitter V2 application programming interface and the Elastic Stack. The U-MAS expert system has 5 major components: data extraction framework, latent Dirichlet allocation (LDA) topic model, sentiment analyzer, misinformation classification model, and Elastic Cloud deployment (indexing of data and visualizations). 
The data extraction framework queries the data through the Twitter V2 application programming interface, with queries identified by public health experts. The LDA topic model, sentiment analyzer, and misinformation classification model are independently trained using a small, expert-validated subset of the extracted data. These models are then incorporated into U-MAS to analyze and classify the remaining data. Finally, the analyzed data are loaded into an index in the Elastic Cloud deployment and can then be presented on dashboards with advanced visualizations and analytics pertinent to infodemiology and infoveillance analysis. Results: U-MAS performed efficiently and accurately. Independent investigators have successfully used the system to extract significant insights into a fluoride-related health misinformation use case (2016 to 2021). The system is currently used for a vaccine hesitancy use case (2007 to 2022) and a heat wave–related illnesses use case (2011 to 2022). Each component in the system for the fluoride misinformation use case performed as expected. The data extraction framework handles large amounts of data within short periods. The LDA topic models achieved relatively high coherence values (0.54), and the predicted topics were accurate and befitting to the data. The sentiment analyzer performed at a correlation coefficient of 0.72 but could be improved in further iterations. The misinformation classifier attained a satisfactory correlation coefficient of 0.82 against expert-validated data. Moreover, the output dashboard and analytics hosted on the Elastic Cloud deployment are intuitive for researchers without a technical background and comprehensive in their visualization and analytics capabilities. In fact, the investigators of the fluoride misinformation use case have successfully used the system to extract interesting and important insights into public health, which have been published separately. 
Conclusions: The novel U-MAS pipeline has the potential to detect and analyze misleading information related to a particular topic or set of related topics. %M 37294603 %R 10.2196/44356 %U https://www.jmir.org/2023/1/e44356 %U https://doi.org/10.2196/44356 %U http://www.ncbi.nlm.nih.gov/pubmed/37294603 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42637 %T Combination of Paper and Electronic Trail Making Tests for Automatic Analysis of Cognitive Impairment: Development and Validation Study %A Zhang,Wei %A Zheng,Xiaoran %A Tang,Zeshen %A Wang,Haoran %A Li,Renren %A Xie,Zengmai %A Yan,Jiaxin %A Zhang,Xiaochen %A Yu,Qing %A Wang,Fei %A Li,Yunxia %+ Department of Neurology, Tongji Hospital, School of Medicine, Tongji University, 389 Xincun Road, Shanghai, 200065, China, 86 13122868963, doctorliyunxia@163.com %K cognition impairment %K Trail Making Test %K vector quantization %K screening %K mixed mode %K paper and electronic devices %D 2023 %7 9.6.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Computer-aided detection, used in the screening and diagnosing of cognitive impairment, provides an objective, valid, and convenient assessment. Particularly, digital sensor technology is a promising detection method. Objective: This study aimed to develop and validate a novel Trail Making Test (TMT) using a combination of paper and electronic devices. Methods: This study included community-dwelling older adult individuals (n=297), who were classified into (1) cognitively healthy controls (HC; n=100 participants), (2) participants diagnosed with mild cognitive impairment (MCI; n=98 participants), and (3) participants with Alzheimer disease (AD; n=99 participants). An electromagnetic tablet was used to record each participant’s hand-drawn stroke. 
A sheet of A4 paper was placed on top of the tablet to maintain the traditional interaction style for participants who were not familiar or comfortable with electronic devices (such as touchscreens). In this way, all participants were instructed to perform the TMT-square and circle. Furthermore, we developed an efficient and interpretable cognitive impairment–screening model to automatically analyze cognitive impairment levels that were dependent on demographic characteristics and time-, pressure-, jerk-, and template-related features. Among these features, novel template-based features were based on a vector quantization algorithm. First, the model identified a candidate trajectory as the standard answer (template) from the HC group. The distance between the recorded trajectories and reference was computed as an important evaluation index. To verify the effectiveness of our method, we compared the performance of a well-trained machine learning model using the extracted evaluation index with conventional demographic characteristics and time-related features. The well-trained model was validated using follow-up data (HC group: n=38; MCI group: n=32; and AD group: n=22). Results: We compared 5 candidate machine learning methods and selected random forest as the ideal model with the best performance (accuracy: 0.726 for HC vs MCI, 0.929 for HC vs AD, and 0.815 for AD vs MCI). Meanwhile, the well-trained classifier achieved better performance than the conventional assessment method, with high stability and accuracy of the follow-up data. Conclusions: The study demonstrated that a model combining both paper and electronic TMTs increases the accuracy of evaluating participants’ cognitive impairment compared to conventional paper-based feature assessment. 
%M 37294606 %R 10.2196/42637 %U https://www.jmir.org/2023/1/e42637 %U https://doi.org/10.2196/42637 %U http://www.ncbi.nlm.nih.gov/pubmed/37294606 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 9 %N %P e40113 %T Therapist Feedback and Implications on Adoption of an Artificial Intelligence–Based Co-Facilitator for Online Cancer Support Groups: Mixed Methods Single-Arm Usability Study %A Leung,Yvonne W %A Ng,Steve %A Duan,Lauren %A Lam,Claire %A Chan,Kenith %A Gancarz,Mathew %A Rennie,Heather %A Trachtenberg,Lianne %A Chan,Kai P %A Adikari,Achini %A Fang,Lin %A Gratzer,David %A Hirst,Graeme %A Wong,Jiahui %A Esplen,Mary Jane %+ de Souza Institute, University Health Network, 222 St Patrick Street, Suite 503, Toronto, ON, M5T 1V4, Canada, 1 844 758 6891, yw.leung@utoronto.ca %K cancer %K recommender system %K natural language processing %K LIWC %K emotion analysis %K therapist adoption %K therapist attitudes %K legal implications of AI %K therapist liability %D 2023 %7 9.6.2023 %9 Original Paper %J JMIR Cancer %G English %X Background: The recent onset of the COVID-19 pandemic and the social distancing requirement have created an increased demand for virtual support programs. Advances in artificial intelligence (AI) may offer novel solutions to management challenges such as the lack of emotional connections within virtual group interventions. Using typed text from online support groups, AI can help identify the potential risk of mental health concerns, alert group facilitator(s), and automatically recommend tailored resources while monitoring patient outcomes. Objective: The aim of this mixed methods, single-arm study was to evaluate the feasibility, acceptability, validity, and reliability of an AI-based co-facilitator (AICF) among CancerChatCanada therapists and participants to monitor online support group participants’ distress through a real-time analysis of texts posted during the support group sessions. 
Specifically, AICF (1) generated participant profiles with discussion topic summaries and emotion trajectories for each session, (2) identified participant(s) at risk for increased emotional distress and alerted the therapist for follow-up, and (3) automatically suggested tailored recommendations based on participant needs. Online support group participants consisted of patients with various types of cancer, and the therapists were clinically trained social workers. Methods: Our study reports on the mixed methods evaluation of AICF, including therapists’ opinions as well as quantitative measures. AICF’s ability to detect distress was evaluated by the patient's real-time emoji check-in, the Linguistic Inquiry and Word Count software, and the Impact of Event Scale-Revised. Results: Although quantitative results showed only some validity of AICF’s ability to detect distress, the qualitative results showed that AICF was able to detect real-time issues that are amenable to treatment, thus allowing therapists to be more proactive in supporting every group member on an individual basis. However, therapists are concerned about the ethical liability of AICF’s distress detection function. Conclusions: Future work will explore wearable sensors and facial cues by using videoconferencing to overcome the barriers associated with text-based online support groups. 
International Registered Report Identifier (IRRID): RR2-10.2196/21453 %M 37294610 %R 10.2196/40113 %U https://cancer.jmir.org/2023/1/e40113 %U https://doi.org/10.2196/40113 %U http://www.ncbi.nlm.nih.gov/pubmed/37294610 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45184 %T Gender Bias When Using Artificial Intelligence to Assess Anorexia Nervosa on Social Media: Data-Driven Study %A Solans Noguero,David %A Ramírez-Cifuentes,Diana %A Ríssola,Esteban Andrés %A Freire,Ana %+ Telefonica I+D, Telefónica Research, Torre Diagonal Telefónica 00, Plaça d'Ernest Lluch i Martin, 5, Barcelona, 08019, Spain, 34 913 12 87 00, david.solansnoguero@telefonica.com %K anorexia nervosa %K gender bias %K artificial intelligence %K social media %D 2023 %7 8.6.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Social media sites are becoming an increasingly important source of information about mental health disorders. Among them, eating disorders are complex psychological problems that involve unhealthy eating habits. In particular, there is evidence showing that signs and symptoms of anorexia nervosa can be traced in social media platforms. Knowing that input data biases tend to be amplified by artificial intelligence algorithms and, in particular, machine learning, these methods should be revised to mitigate biased discrimination in such important domains. Objective: The main goal of this study was to detect and analyze the performance disparities across genders in algorithms trained for the detection of anorexia nervosa on social media posts. We used a collection of automated predictors trained on a data set in Spanish containing cases of 177 users that showed signs of anorexia (471,262 tweets) and 326 control cases (910,967 tweets). Methods: We first inspected the predictive performance differences between the algorithms for male and female users. 
Once biases were detected, we applied a feature-level bias characterization to evaluate the source of such biases and performed a comparative analysis of such features and those that are relevant for clinicians. Finally, we showcased different bias mitigation strategies to develop fairer automated classifiers, particularly for risk assessment in sensitive domains. Results: Our results revealed concerning predictive performance differences, with substantially higher false negative rates (FNRs) for female samples (FNR=0.082) compared with male samples (FNR=0.005). The findings show that biological processes and suicide risk factors were relevant for classifying positive male cases, whereas age, emotions, and personal concerns were more relevant for female cases. We also proposed techniques for bias mitigation, and we could see that, even though disparities can be mitigated, they cannot be eliminated. Conclusions: We concluded that more attention should be paid to the assessment of biases in automated methods dedicated to the detection of mental health issues. This is particularly relevant before the deployment of systems that are thought to assist clinicians, especially considering that the outputs of such systems can have an impact on the diagnosis of people at risk. 
%M 37289496 %R 10.2196/45184 %U https://www.jmir.org/2023/1/e45184 %U https://doi.org/10.2196/45184 %U http://www.ncbi.nlm.nih.gov/pubmed/37289496 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 9 %N %P e45455 %T Epidemiology and a Predictive Model of Prognosis Index Based on Machine Learning in Primary Breast Lymphoma: Population-Based Study %A Yu,Yushuai %A Xu,Zelin %A Shao,Tinglei %A Huang,Kaiyan %A Chen,Ruiliang %A Yu,Xiaoqin %A Zhang,Jie %A Han,Hui %A Song,Chuangui %+ Department of Breast Surgery, Fujian Medical University Union Hospital, 29 Xin Quan Road, Gulou District, Fuzhou, 350001, China, 86 13960709993, Songcg1971@hotmail.com %K primary breast lymphoma %K epidemiology %K prognosis %K machine learning %K disparities %D 2023 %7 8.6.2023 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Primary breast lymphoma (PBL) is a rare disease whose epidemiological features, treatment principles, and factors used for the patients’ prognosis remain controversial. Objective: The aim of this study was to explore the epidemiology of PBL and to develop a better model based on machine learning to predict the prognosis for patients with primary breast lymphoma. Methods: The annual incidence of PBL was extracted from the surveillance, epidemiology, and end results database between 1975 and 2019 to examine disease occurrence trends using Joinpoint software (version 4.9; National Cancer Institute). We enrolled data from 1251 female patients with primary breast lymphoma from the surveillance, epidemiology, and end results database for survival analysis. Univariable and multivariable analyses were performed to explore independent prognostic factors for overall survival and disease-specific survival of patients with primary breast lymphoma. Eight machine learning algorithms were developed to predict the 5-year survival of patients with primary breast lymphoma. 
Results: The overall incidence of PBL increased drastically between 1975 and 2004, followed by a significant downward trend in incidence around 2004, with an average annual percent change (AAPC) of −0.8 (95% CI −1.1 to −0.6). Disparities in trends of PBL exist by age and race. The AAPC of the 65 years or older cohort was about 1.2 higher than that for the younger than 65 years cohort. The AAPC of White patients was 0.9 (95% CI 0.0-1.8), while that of Black patients was higher at 2.1 (95% CI −2.5 to 6.9). We also identified that the risk of death from PBL is multifactorial and includes patient factors and treatment factors. Survival analysis revealed that the patients diagnosed between 2007 and 2015 had a significantly reduced risk of mortality compared to those diagnosed between 1983 and 1990. The gradient booster model outperformed other models, with 0.752 for sensitivity and 0.817 for area under the curve. The important features established with the gradient booster model were the year of diagnosis, age, histologic type, and primary site, which were the 4 most relevant variables to explain 5-year survival status. Conclusions: The incidence of PBL began to decrease after 2004, with variation by age and race. In recent years, the prognosis of patients with primary breast lymphoma has improved remarkably. The gradient booster model had a promising performance. This model can help clinicians identify the early prognosis of patients with primary breast lymphoma and therefore improve the clinical outcome by changing management strategies and patient health care. 
%M 37169516 %R 10.2196/45455 %U https://publichealth.jmir.org/2023/1/e45455 %U https://doi.org/10.2196/45455 %U http://www.ncbi.nlm.nih.gov/pubmed/37169516 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e42337 %T A Trainable Open-Source Machine Learning Accelerometer Activity Recognition Toolbox: Deep Learning Approach %A Wieland,Fluri %A Nigg,Claudio %+ Department of Health Science, Institute of Sports Science, University of Bern, Bremgartenstrasse 145, Bern, 3012, Switzerland, 41 787347220, flu.wieland@gmail.com %K activity classification %K deep learning %K accelerometry %K open source %K activity recognition %K machine learning %K activity recorder %K digital health application %K smartphone app %K deep learning algorithm %K sensor device %D 2023 %7 8.6.2023 %9 Original Paper %J JMIR AI %G English %X Background: The movement determination software in current activity trackers is insufficiently accurate for scientific applications and is not open-source. Objective: To address this issue, we developed an accurate, trainable, and open-source smartphone-based activity-tracking toolbox that consists of an Android app (HumanActivityRecorder) and 2 different deep learning algorithms that can be adapted to new behaviors. Methods: We employed a semisupervised deep learning approach to identify the different classes of activity based on accelerometry and gyroscope data, using both our own data and open competition data. Results: Our approach is robust against variation in sampling rate and sensor dimensional input and achieved an accuracy of around 87% in classifying 6 different behaviors on both our own recorded data and the MotionSense data. However, if the dimension-adaptive neural architecture model is tested on our own data, the accuracy drops to 26%, demonstrating the superiority of our algorithm, which achieved 63% on the MotionSense data used to train the dimension-adaptive neural architecture model. 
Conclusions: HumanActivityRecorder is a versatile, retrainable, open-source, and accurate toolbox that is continually tested on new data. This enables researchers to adapt to the behavior being measured and achieve repeatability in scientific studies. %M 38875548 %R 10.2196/42337 %U https://ai.jmir.org/2023/1/e42337 %U https://doi.org/10.2196/42337 %U http://www.ncbi.nlm.nih.gov/pubmed/38875548 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e48163 %T The Advent of Generative Language Models in Medical Education %A Karabacak,Mert %A Ozkara,Burak Berksu %A Margetis,Konstantinos %A Wintermark,Max %A Bisdas,Sotirios %+ Department of Neuroradiology, The National Hospital for Neurology and Neurosurgery, University College London NHS Foundation Trust, National Hospital for Neurology and Neurosurgery, Queen Square, London, WC1N 3BG, United Kingdom, 44 020 3448 3446, s.bisdas@ucl.ac.uk %K generative language model %K artificial intelligence %K medical education %K ChatGPT %K academic integrity %K AI-driven feedback %K stimulation %K evaluation %K technology %K learning environment %K medical student %D 2023 %7 6.6.2023 %9 Viewpoint %J JMIR Med Educ %G English %X Artificial intelligence (AI) and generative language models (GLMs) present significant opportunities for enhancing medical education, including the provision of realistic simulations, digital patients, personalized feedback, evaluation methods, and the elimination of language barriers. These advanced technologies can facilitate immersive learning environments and enhance medical students' educational outcomes. However, ensuring content quality, addressing biases, and managing ethical and legal concerns present obstacles. To mitigate these challenges, it is necessary to evaluate the accuracy and relevance of AI-generated content, address potential biases, and develop guidelines and policies governing the use of AI-generated content in medical education. 
Collaboration among educators, researchers, and practitioners is essential for developing best practices, guidelines, and transparent AI models that encourage the ethical and responsible use of GLMs and AI in medical education. By sharing information about the data used for training, obstacles encountered, and evaluation methods, developers can increase their credibility and trustworthiness within the medical community. In order to realize the full potential of AI and GLMs in medical education while mitigating potential risks and obstacles, ongoing research and interdisciplinary collaboration are necessary. By collaborating, medical professionals can ensure that these technologies are effectively and responsibly integrated, contributing to enhanced learning experiences and patient care. %M 37279048 %R 10.2196/48163 %U https://mededu.jmir.org/2023/1/e48163 %U https://doi.org/10.2196/48163 %U http://www.ncbi.nlm.nih.gov/pubmed/37279048 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e44835 %T Natural Language Processing for Clinical Laboratory Data Repository Systems: Implementation and Evaluation for Respiratory Viruses %A Dolatabadi,Elham %A Chen,Branson %A Buchan,Sarah A %A Austin,Alex Marchand %A Azimaee,Mahmoud %A McGeer,Allison %A Mubareka,Samira %A Kwong,Jeffrey C %+ Vector Institute, 661 University Ave, Toronto, ON, M5G 1M1, Canada, 1 6477069756, elham.dolatabadi@gmail.com %K health %K informatics %K natural language processing %K knowledge extraction %K electronic health record %K EHR %D 2023 %7 6.6.2023 %9 Original Paper %J JMIR AI %G English %X Background: With the growing volume and complexity of laboratory repositories, it has become tedious to parse unstructured data into structured and tabulated formats for secondary uses such as decision support, quality assurance, and outcome analysis. 
However, advances in natural language processing (NLP) approaches have enabled efficient and automated extraction of clinically meaningful medical concepts from unstructured reports. Objective: In this study, we aimed to determine the feasibility of using the NLP model for information extraction as an alternative approach to a time-consuming and operationally resource-intensive handcrafted rule-based tool. Therefore, we sought to develop and evaluate a deep learning–based NLP model to derive knowledge and extract information from text-based laboratory reports sourced from a provincial laboratory repository system. Methods: The NLP model, a hierarchical multilabel classifier, was trained on a corpus of laboratory reports covering testing for 14 different respiratory viruses and viral subtypes. The corpus includes 87,500 unique laboratory reports annotated by 8 subject matter experts (SMEs). The classification task involved assigning the laboratory reports to labels at 2 levels: 24 fine-grained labels in level 1 and 6 coarse-grained labels in level 2. A “label” also refers to the status of a specific virus or strain being tested or detected (eg, influenza A is detected). The model’s performance stability and variation were analyzed across all labels in the classification task. Additionally, the model's generalizability was evaluated internally and externally on various test sets. Results: Overall, the NLP model performed well on internal, out-of-time (pre–COVID-19), and external (different laboratories) test sets with microaveraged F1-scores >94% across all classes. Higher precision and recall scores with less variability were observed for the internal and pre–COVID-19 test sets. As expected, the model’s performance varied across categories and virus types due to the imbalanced nature of the corpus and sample sizes per class. 
There were intrinsically fewer classes of viruses being detected than those tested; therefore, the model's performance (lowest F1-score of 57%) was noticeably lower in the detected cases. Conclusions: We demonstrated that deep learning–based NLP models are promising solutions for information extraction from text-based laboratory reports. These approaches enable scalable, timely, and practical access to high-quality and encoded laboratory data if integrated into laboratory information system repositories. %M 38875570 %R 10.2196/44835 %U https://ai.jmir.org/2023/1/e44835 %U https://doi.org/10.2196/44835 %U http://www.ncbi.nlm.nih.gov/pubmed/38875570 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e42884 %T Extraction of Radiological Characteristics From Free-Text Imaging Reports Using Natural Language Processing Among Patients With Ischemic and Hemorrhagic Stroke: Algorithm Development and Validation %A Hsu,Enshuo %A Bako,Abdulaziz T %A Potter,Thomas %A Pan,Alan P %A Britz,Gavin W %A Tannous,Jonika %A Vahidy,Farhaan S %+ Center for Health Data Science and Analytics, Houston Methodist Research Institute, 7550 Greenbriar Drive, Houston, TX, 77030, United States, 1 346 356 1479, fvahidy@houstonmethodist.org %K natural language processing %K deep learning %K electronic health records %K ischemic stroke %K cerebral hemorrhage %K neuroimaging %K computed tomography %K stroke %K radiology %D 2023 %7 6.6.2023 %9 Original Paper %J JMIR AI %G English %X Background: Neuroimaging is the gold-standard diagnostic modality for all patients suspected of stroke. However, the unstructured nature of imaging reports remains a major challenge to extracting useful information from electronic health records systems. Despite the increasing adoption of natural language processing (NLP) for radiology reports, information extraction for many stroke imaging features has not been systematically evaluated. 
Objective: In this study, we propose an NLP pipeline, which adopts the state-of-the-art ClinicalBERT model with domain-specific pretraining and task-oriented fine-tuning to extract 13 stroke features from head computed tomography imaging notes. Methods: We used the model to generate structured data sets with information on the presence or absence of common stroke features for 24,924 patients with strokes. We compared the survival characteristics of patients with and without features of severe stroke (eg, midline shift, perihematomal edema, or mass effect) using the Kaplan-Meier curve and log-rank tests. Results: Pretrained on 82,073 head computed tomography notes with 13.7 million words and fine-tuned on 200 annotated notes, our HeadCT_BERT model achieved an average area under receiver operating characteristic curve of 0.9831, F1-score of 0.8683, and accuracy of 97%. Among patients with acute ischemic stroke, admissions with any severe stroke feature in initial imaging notes were associated with a lower probability of survival (P<.001). Conclusions: Our proposed NLP pipeline achieved high performance and has the potential to improve medical research and patient safety. 
%M 38875556 %R 10.2196/42884 %U https://ai.jmir.org/2023/1/e42884 %U https://doi.org/10.2196/42884 %U http://www.ncbi.nlm.nih.gov/pubmed/38875556 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46216 %T Prediction of Sleep Stages Via Deep Learning Using Smartphone Audio Recordings in Home Environments: Model Development and Validation %A Tran,Hai Hong %A Hong,Jung Kyung %A Jang,Hyeryung %A Jung,Jinhwan %A Kim,Jongmok %A Hong,Joonki %A Lee,Minji %A Kim,Jeong-Whun %A Kushida,Clete A %A Lee,Dongheon %A Kim,Daewoo %A Yoon,In-Young %+ Department of Psychiatry, Seoul National University Bundang Hospital, 82 Gumi-ro 173beon-gil, Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, Republic of Korea, 82 31 787 7433, iyoon@snu.ac.kr %K respiratory sounds %K sleep stages %K deep learning %K smartphone %K home environment %D 2023 %7 1.6.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: The growing public interest and awareness regarding the significance of sleep are driving the demand for sleep monitoring at home. In addition to various commercially available wearable and nearable devices, sound-based sleep staging via deep learning is emerging as a viable alternative owing to its convenience and potential accuracy. However, sound-based sleep staging has only been studied using in-laboratory sound data. In real-world sleep environments (homes), there is abundant background noise, in contrast to quiet, controlled environments such as laboratories. The use of sound-based sleep staging at home has not been investigated, although it is essential for practical use on a daily basis. Key challenges are the lack of home data annotated with sleep stages and the expected high cost of acquiring enough such data to train a large-scale neural network. Objective: This study aims to develop and validate a deep learning method to perform sound-based sleep staging using audio recordings acquired in various uncontrolled home environments. 
Methods: To overcome the limitation of lacking home data with known sleep stages, we adopted advanced training techniques and combined home data with hospital data. The training of the model consisted of 3 components: (1) the original supervised learning using 812 pairs of hospital polysomnography (PSG) and audio recordings, and the 2 newly adopted components; (2) transfer learning from hospital to home sounds by adding 829 smartphone audio recordings at home; and (3) consistency training using augmented hospital sound data. Augmented data were created by adding 8255 home noise data to hospital audio recordings. In addition, an independent test set was built by collecting 45 pairs of overnight PSG and smartphone audio recordings at home to examine the performance of the trained model. Results: The accuracy of the model was 76.2% (63.4% for wake, 64.9% for rapid-eye movement [REM], and 83.6% for non-REM) for our test set. The macro F1-score and mean per-class sensitivity were 0.714 and 0.706, respectively. The performance was robust across demographic groups such as age, gender, BMI, or sleep apnea severity (accuracy 73.4%-79.4%). In the ablation study, we evaluated the contribution of each component. While supervised learning alone achieved an accuracy of 69.2% on home sound data, adding consistency training to the supervised learning increased the accuracy to a larger degree (+4.3%) than adding transfer learning (+0.1%). The best performance was obtained when both transfer learning and consistency training were adopted (+7.0%). Conclusions: This study shows that sound-based sleep staging is feasible for home use. By adopting 2 advanced techniques (transfer learning and consistency training), the deep learning model robustly predicts sleep stages using sounds recorded in various uncontrolled home environments, with no special equipment other than a smartphone. 
%M 37261889 %R 10.2196/46216 %U https://www.jmir.org/2023/1/e46216 %U https://doi.org/10.2196/46216 %U http://www.ncbi.nlm.nih.gov/pubmed/37261889 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e48291 %T Large Language Models in Medical Education: Opportunities, Challenges, and Future Directions %A Abd-alrazaq,Alaa %A AlSaad,Rawan %A Alhuwail,Dari %A Ahmed,Arfan %A Healy,Padraig Mark %A Latifi,Syed %A Aziz,Sarah %A Damseh,Rafat %A Alabed Alrazak,Sadam %A Sheikh,Javaid %+ AI Center for Precision Health, Weill Cornell Medicine-Qatar, PO Box 5825, Doha Al Luqta St, Ar-Rayyan, Doha, NA, Qatar, 974 55708549, alaa_alzoubi88@yahoo.com %K large language models %K artificial intelligence %K medical education %K ChatGPT %K GPT-4 %K generative AI %K students %K educators %D 2023 %7 1.6.2023 %9 Viewpoint %J JMIR Med Educ %G English %X The integration of large language models (LLMs), such as those in the Generative Pre-trained Transformers (GPT) series, into medical education has the potential to transform learning experiences for students and elevate their knowledge, skills, and competence. Drawing on a wealth of professional and academic experience, we propose that LLMs hold promise for revolutionizing medical curriculum development, teaching methodologies, personalized study plans and learning materials, student assessments, and more. However, we also critically examine the challenges that such integration might pose by addressing issues of algorithmic bias, overreliance, plagiarism, misinformation, inequity, privacy, and copyright concerns in medical education. As we navigate the shift from an information-driven educational paradigm to an artificial intelligence (AI)–driven educational paradigm, we argue that it is paramount to understand both the potential and the pitfalls of LLMs in medical education. This paper thus offers our perspective on the opportunities and challenges of using LLMs in this context. 
We believe that the insights gleaned from this analysis will serve as a foundation for future recommendations and best practices in the field, fostering the responsible and effective use of AI technologies in medical education. %M 37261894 %R 10.2196/48291 %U https://mededu.jmir.org/2023/1/e48291 %U https://doi.org/10.2196/48291 %U http://www.ncbi.nlm.nih.gov/pubmed/37261894 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e44537 %T Detecting Ground Glass Opacity Features in Patients With Lung Cancer: Automated Extraction and Longitudinal Analysis via Deep Learning–Based Natural Language Processing %A Lee,Kyeryoung %A Liu,Zongzhi %A Chandran,Urmila %A Kalsekar,Iftekhar %A Laxmanan,Balaji %A Higashi,Mitchell K %A Jun,Tomi %A Ma,Meng %A Li,Minghao %A Mai,Yun %A Gilman,Christopher %A Wang,Tongyu %A Ai,Lei %A Aggarwal,Parag %A Pan,Qi %A Oh,William %A Stolovitzky,Gustavo %A Schadt,Eric %A Wang,Xiaoyan %+ Sema4, 333 Ludlow Street, Stamford, CT, 06902, United States, 1 800 298 6470, xw108@caa.columbia.edu %K natural language processing %K ground glass opacity %K real world data %K radiology notes %K longitudinal analysis %K deep learning %K bidirectional long short-term memory (Bi-LSTM) %K conditional random fields (CRF) %D 2023 %7 1.6.2023 %9 Original Paper %J JMIR AI %G English %X Background: Ground-glass opacities (GGOs) appearing in computed tomography (CT) scans may indicate potential lung malignancy. Proper management of GGOs based on their features can prevent the development of lung cancer. Electronic health records are rich sources of information on GGO nodules and their granular features, but most of the valuable information is embedded in unstructured clinical notes. Objective: We aimed to develop, test, and validate a deep learning–based natural language processing (NLP) tool that automatically extracts GGO features to inform the longitudinal trajectory of GGO status from large-scale radiology notes. 
Methods: We developed a bidirectional long short-term memory with a conditional random field–based deep-learning NLP pipeline to extract GGO and granular features of GGO retrospectively from radiology notes of 13,216 lung cancer patients. We evaluated the pipeline with quality assessments and analyzed cohort characterization of the distribution of nodule features longitudinally to assess changes in size and solidity over time. Results: Our NLP pipeline built on the GGO ontology we developed achieved between 95% and 100% precision, 89% and 100% recall, and 92% and 100% F1-scores on different GGO features. We deployed this GGO NLP model to extract and structure comprehensive characteristics of GGOs from 29,496 radiology notes of 4521 lung cancer patients. Longitudinal analysis revealed that size increased in 16.8% (240/1424) of patients, decreased in 14.6% (208/1424), and remained unchanged in 68.5% (976/1424) in their last note compared to the first note. Among 1127 patients who had longitudinal radiology notes of GGO status, 815 (72.3%) were reported to have stable status, and 259 (23%) had increased/progressed status in the subsequent notes. Conclusions: Our deep learning–based NLP pipeline can automatically extract granular GGO features at scale from electronic health records when this information is documented in radiology notes and help inform the natural history of GGO. This will open the way for a new paradigm in lung cancer prevention and early detection. 
%M 38875565 %R 10.2196/44537 %U https://ai.jmir.org/2023/1/e44537 %U https://doi.org/10.2196/44537 %U http://www.ncbi.nlm.nih.gov/pubmed/38875565 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e49323 %T Open Science and Software Assistance: Commentary on “Artificial Intelligence Can Generate Fraudulent but Authentic-Looking Scientific Medical Articles: Pandora’s Box Has Been Opened” %A Ballester,Pedro L %+ Neuroscience Graduate Program, McMaster University, 1280 Main Street West, Hamilton, ON, L8S 4L8, Canada, 1 905 525 9140, pedballester@gmail.com %K artificial intelligence %K AI %K ChatGPT %K open science %K reproducibility %K software assistance %D 2023 %7 31.5.2023 %9 Commentary %J J Med Internet Res %G English %X Májovský and colleagues have investigated the important issue of ChatGPT being used for the complete generation of scientific works, including fake data and tables. The issues behind why ChatGPT poses a significant concern to research reach far beyond the model itself. Once again, the lack of reproducibility and visibility of scientific works creates an environment where fraudulent or inaccurate work can thrive. What are some of the ways in which we can handle this new situation? 
%M 37256656 %R 10.2196/49323 %U https://www.jmir.org/2023/1/e49323 %U https://doi.org/10.2196/49323 %U http://www.ncbi.nlm.nih.gov/pubmed/37256656 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46924 %T Artificial Intelligence Can Generate Fraudulent but Authentic-Looking Scientific Medical Articles: Pandora’s Box Has Been Opened %A Májovský,Martin %A Černý,Martin %A Kasal,Matěj %A Komarc,Martin %A Netuka,David %+ Department of Neurosurgery and Neurooncology, First Faculty of Medicine, Charles University, U Vojenské nemocnice 1200, Prague, 16000, Czech Republic, 420 973202963, majovmar@uvn.cz %K artificial intelligence %K publications %K ethics %K neurosurgery %K ChatGPT %K language models %K fraudulent medical articles %D 2023 %7 31.5.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) has advanced substantially in recent years, transforming many industries and improving the way people live and work. In scientific research, AI can enhance the quality and efficiency of data analysis and publication. However, AI has also opened up the possibility of generating high-quality fraudulent papers that are difficult to detect, raising important questions about the integrity of scientific research and the trustworthiness of published papers. Objective: The aim of this study was to investigate the capabilities of current AI language models in generating high-quality fraudulent medical articles. We hypothesized that modern AI models can create highly convincing fraudulent papers that can easily deceive readers and even experienced researchers. Methods: This proof-of-concept study used ChatGPT (Chat Generative Pre-trained Transformer) powered by the GPT-3 (Generative Pre-trained Transformer 3) language model to generate a fraudulent scientific article related to neurosurgery. 
GPT-3 is a large language model developed by OpenAI that uses deep learning algorithms to generate human-like text in response to prompts given by users. The model was trained on a massive corpus of text from the internet and is capable of generating high-quality text in a variety of languages and on various topics. The authors posed questions and prompts to the model and refined them iteratively as the model generated the responses. The goal was to create a completely fabricated article including the abstract, introduction, material and methods, discussion, references, charts, etc. Once the article was generated, it was reviewed for accuracy and coherence by experts in the fields of neurosurgery, psychiatry, and statistics and compared to existing similar articles. Results: The study found that the AI language model can create a highly convincing fraudulent article that resembled a genuine scientific paper in terms of word usage, sentence structure, and overall composition. The AI-generated article included standard sections such as introduction, material and methods, results, and discussion, as well as a data sheet. It consisted of 1992 words and 17 citations, and the whole process of article creation took approximately 1 hour without any special training of the human user. However, there were some concerns and specific mistakes identified in the generated article, specifically in the references. Conclusions: The study demonstrates the potential of current AI language models to generate completely fabricated scientific articles. Although the papers look sophisticated and seemingly flawless, expert readers may identify semantic inaccuracies and errors upon closer inspection. We highlight the need for increased vigilance and better detection methods to combat the potential misuse of AI in scientific research. 
At the same time, it is important to recognize the potential benefits of using AI language models in genuine scientific writing and research, such as manuscript preparation and language editing. %M 37256685 %R 10.2196/46924 %U https://www.jmir.org/2023/1/e46924 %U https://doi.org/10.2196/46924 %U http://www.ncbi.nlm.nih.gov/pubmed/37256685 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44081 %T Issue of Data Imbalance on Low Birthweight Baby Outcomes Prediction and Associated Risk Factors Identification: Establishment of Benchmarking Key Machine Learning Models With Data Rebalancing Strategies %A Ren,Yang %A Wu,Dezhi %A Tong,Yan %A López-DeFede,Ana %A Gareau,Sarah %+ Department of Integrated Information Technology, University of South Carolina, 550 Assembly Street, Columbia, SC, 29298, United States, 1 8033774691, dezhiwu@cec.sc.edu %K low birthweight %K machine learning %K risk factor %K benchmark %K data rebalance %D 2023 %7 31.5.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Low birthweight (LBW) is a leading cause of neonatal mortality in the United States and a major causative factor of adverse health effects in newborns. Identifying high-risk patients early in prenatal care is crucial to preventing adverse outcomes. Previous studies have proposed various machine learning (ML) models for the LBW prediction task, but they were limited by small and imbalanced data sets. Some authors attempted to address this through different data rebalancing methods. However, most of their reported performances did not reflect the models’ actual performance in real-life scenarios. To date, few studies have successfully benchmarked the performance of ML models in maternal health; thus, it is critical to establish benchmarks to advance ML use and subsequently improve birth outcomes. 
Objective: This study aimed to establish several key benchmarking ML models to predict LBW and systematically apply different rebalancing optimization methods to a large-scale and extremely imbalanced all-payer hospital record data set that connects mother and baby data at a state level in the United States. We also performed feature importance analysis to identify the most contributing features in the LBW classification task, which can aid in targeted intervention. Methods: Our large data set consisted of 266,687 birth records across 6 years, and 8.63% (n=23,019) of records were labeled as LBW. To set up benchmarking ML models to predict LBW, we applied 7 classic ML models (ie, logistic regression, naive Bayes, random forest, extreme gradient boosting, adaptive boosting, multilayer perceptron, and sequential artificial neural network) while using 4 different data rebalancing methods: random undersampling, random oversampling, synthetic minority oversampling technique, and weight rebalancing. Owing to ethical considerations, in addition to ML evaluation metrics, we primarily used recall to evaluate model performance, indicating the number of correctly predicted LBW cases out of all actual LBW cases, as false negative health care outcomes could be fatal. We further analyzed feature importance to explore the degree to which each feature contributed to ML model prediction among our best-performing models. Results: We found that extreme gradient boosting achieved the highest recall score—0.70—using the weight rebalancing method. Our results showed that various data rebalancing methods improved the prediction performance of the LBW group substantially. From the feature importance analysis, maternal race, age, payment source, sum of predelivery emergency department and inpatient hospitalizations, predelivery disease profile, and different social vulnerability index components were important risk factors associated with LBW. 
Conclusions: Our findings establish useful ML benchmarks to improve birth outcomes in the maternal health domain. They inform identification of the minority class (ie, LBW) in an extremely imbalanced data set, which may guide the development of personalized LBW early prevention, clinical interventions, and statewide maternal and infant health policy changes. %M 37256674 %R 10.2196/44081 %U https://www.jmir.org/2023/1/e44081 %U https://doi.org/10.2196/44081 %U http://www.ncbi.nlm.nih.gov/pubmed/37256674 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e45434 %T An Artificial Intelligence–Based Smartphone App for Assessing the Risk of Opioid Misuse in Working Populations Using Synthetic Data: Pilot Development Study %A Islam,A B M Rezbaul %A Khan,Khalid M %A Scarbrough,Amanda %A Zimpfer,Mariah Jade %A Makkena,Navya %A Omogunwa,Adebola %A Ahamed,Sheikh Iqbal %+ Department of Computer Science, Sam Houston State University, 1803 Avenue I AB1, Room 214, Huntsville, TX, 77341, United States, 1 936294 ext 4198, ari014@shsu.edu %K opioid use disorder %K OUD %K mobile health %K mHealth %K artificial intelligence %K smartphone app %K opioids %K application %K caregivers %K mobile app %D 2023 %7 30.5.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Opioid use disorder (OUD) is an addiction crisis in the United States. As recently as 2019, more than 10 million people had misused or abused prescription opioids, making OUD one of the leading causes of accidental death in the United States. Physically demanding and laborious workforces in the transportation, construction and extraction, and health care industries are prime targets for OUD due to high-risk occupational activities. Because of this high prevalence of OUD among working populations in the United States, elevated workers’ compensation and health insurance costs, absenteeism, and reduced productivity in workplaces have been reported. 
Objective: With the emergence of new smartphone technologies, health interventions can be widely used outside clinical settings via mobile health tools. The major objective of our pilot study was to develop a smartphone app that can track work-related risk factors leading to OUD with a specific focus on high-risk occupational groups. We used synthetic data analyzed by applying a machine learning algorithm to accomplish our objective. Methods: To make the OUD assessment process more convenient and to motivate potential patients with OUD, we developed a smartphone-based app through a step-by-step process. First, an extensive literature survey was conducted to list a set of critical risk assessment questions that can capture high-risk behaviors leading to OUD. Next, a review panel short-listed 15 questions after careful evaluation with specific emphasis on physically demanding workforces—9 questions had two, 5 questions had five, and 1 question had three response options. Instead of human participant data, synthetic data were used as user responses. Finally, an artificial intelligence algorithm, naive Bayes, was used to predict the OUD risk, trained with the synthetic data collected. Results: The smartphone app we developed is functional, as tested with synthetic data. Using the naive Bayes algorithm on collected synthetic data, we successfully predicted the risk of OUD. This would eventually create a platform to test the functionality of the app further using human participant data. Conclusions: The use of mobile health techniques, such as our mobile app, is highly promising in predicting and offering mitigation plans for disease detection and prevention. Using a naive Bayes algorithm model along with a representational state transfer (REST) application programming interface and cloud-based data encryption storage, the app can safeguard respondents’ privacy while estimating their risk accurately. 
Our app offers a tailored mitigation strategy for specific workforces (eg, transportation and health care workers) that are most impacted by OUD. Despite the limitations of the study, we have developed a robust methodology and believe that our app has the potential to help reduce the opioid crisis. %M 37252763 %R 10.2196/45434 %U https://formative.jmir.org/2023/1/e45434 %U https://doi.org/10.2196/45434 %U http://www.ncbi.nlm.nih.gov/pubmed/37252763 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e39219 %T Impacts of Symptom Checkers for Laypersons’ Self-diagnosis on Physicians in Primary Care: Scoping Review %A Radionova,Natalia %A Ög,Eylem %A Wetzel,Anna-Jasmin %A Rieger,Monika A %A Preiser,Christine %+ Institute of Occupational Medicine, Social Medicine and Health Services Research, University Hospital Tuebingen, Wilhelmstrasse 27, Tuebingen, 72074, Germany, 49 70712984361, christine.preiser@med.uni-tuebingen.de %K mobile health %K mHealth %K symptom checkers %K artificial intelligence–based technology %K AI-based technology %K self-diagnosis %K general practice %K scoping review %K mobile phone %D 2023 %7 29.5.2023 %9 Review %J J Med Internet Res %G English %X Background: Symptom checkers (SCs) for laypersons’ self-assessment and preliminary self-diagnosis are widely used by the public. Little is known about the impact of these tools on health care professionals (HCPs) in primary care and their work. This is relevant to understanding how technological changes might affect the working world and how this is linked to work-related psychosocial demands and resources for HCPs. Objective: This scoping review aimed to systematically explore the existing publications on the impacts of SCs on HCPs in primary care and to identify knowledge gaps. Methods: We used the Arksey and O’Malley framework. We based our search string on the participant, concept, and context scheme and searched PubMed (MEDLINE) and CINAHL in January and June 2021. 
We performed a reference search in August 2021 and a manual search in November 2021. We included publications in peer-reviewed journals that focused on artificial intelligence- or algorithm-based self-diagnosing apps and tools for laypersons and had primary care or nonclinical settings as a relevant context. The characteristics of these studies were described numerically. We used thematic analysis to identify core themes. We followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist to report the study. Results: Of the 2729 publications identified through initial and follow-up database searches, 43 full texts were screened for eligibility, of which 9 were included. A further 8 publications were included through the manual search. Two publications were excluded after receiving feedback in the peer-review process. Fifteen publications were included in the final sample, which comprised 5 (33%) commentaries or nonresearch publications, 3 (20%) literature reviews, and 7 (47%) research publications. The earliest publications stemmed from 2015. We identified 5 themes. The theme finding prediagnosis comprised the comparison between SCs and physicians. We identified the performance of the diagnosis and the relevance of human factors as topics. In the theme layperson-technology relationship, we identified potential for laypersons’ empowerment and harm through SCs. Our analysis showed potential disruptions of the physician-patient relationship and uncontested roles of HCPs in the theme (impacts on) physician-patient relationship. In the theme impacts on HCPs’ tasks, we described the reduction or increase in HCPs’ workload. We identified potential transformations of HCPs’ work and impacts on the health care system in the theme future role of SCs in health care. Conclusions: The scoping review approach was suitable for this new field of research. The heterogeneity of technologies and wordings was challenging. 
We identified research gaps in the literature regarding the impact of artificial intelligence– or algorithm-based self-diagnosing apps or tools on the work of HCPs in primary care. Further empirical studies on HCPs’ lived experiences are needed, as the current literature depicts expectations rather than empirical findings. %M 37247214 %R 10.2196/39219 %U https://www.jmir.org/2023/1/e39219 %U https://doi.org/10.2196/39219 %U http://www.ncbi.nlm.nih.gov/pubmed/37247214 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e45032 %T An Assessment of How Clinicians and Staff Members Use a Diabetes Artificial Intelligence Prediction Tool: Mixed Methods Study %A Liaw,Winston R %A Ramos Silva,Yessenia %A Soltero,Erica G %A Krist,Alex %A Stotts,Angela L %+ Department of Health Systems and Population Health Sciences, Tilman J Fertitta Family College of Medicine, University of Houston, 5055 Medical Circle, Houston, TX, 77204, United States, 1 713 743 9862, winstonrliaw@gmail.com %K artificial intelligence %K medical informatics %K qualitative research %K prediction tool %K clinicians %K diabetes %K treatment %K clinical decision support %K decision-making %K survey %K interview %K usefulness %K implementation %K validation %K design %K usability %D 2023 %7 29.5.2023 %9 Original Paper %J JMIR AI %G English %X Background: Nearly one-third of patients with diabetes are poorly controlled (hemoglobin A1c≥9%). Identifying at-risk individuals and providing them with effective treatment is an important strategy for preventing poor control. Objective: This study aims to assess how clinicians and staff members would use a clinical decision support tool based on artificial intelligence (AI) and identify factors that affect adoption. Methods: This was a mixed methods study that combined semistructured interviews and surveys to assess the perceived usefulness and ease of use, intent to use, and factors affecting tool adoption. 
We recruited clinicians and staff members from practices that manage diabetes. During the interviews, participants reviewed a sample electronic health record alert and were informed that the tool uses AI to identify those at high risk for poor control. Participants discussed how they would use the tool, whether it would contribute to care, and the factors affecting its implementation. In a survey, participants reported their demographics; rank-ordered factors influencing the adoption of the tool; and reported their perception of the tool’s usefulness as well as their intent to use, ease of use, and organizational support for use. Qualitative data were analyzed using a thematic content analysis approach. We used descriptive statistics to report demographics and analyze the findings of the survey. Results: In total, 22 individuals participated in the study. Two-thirds (14/22, 63%) of respondents were physicians. Overall, 36% (8/22) of respondents worked in academic health centers, whereas 27% (6/22) of respondents worked in federally qualified health centers. The interviews identified several themes: this tool has the potential to be useful because it provides information that is not currently available and can make care more efficient and effective; clinicians and staff members were concerned about how the tool affects patient-oriented outcomes and clinical workflows; adoption of the tool is dependent on its validation, transparency, actionability, and design and could be increased with changes to the interface and usability; and implementation would require buy-in and need to be tailored to the demands and resources of clinics and communities. Survey findings supported these themes, as 77% (17/22) of participants somewhat, moderately, or strongly agreed that they would use the tool, whereas these figures were 82% (18/22) for usefulness, 82% (18/22) for ease of use, and 68% (15/22) for clinic support. 
The 2 highest-ranked factors affecting adoption were whether the tool improves health and the accuracy of the tool. Conclusions: Most participants found the tool to be easy to use and useful, although they had concerns about alert fatigue, bias, and transparency. These data will be used to enhance the design of an AI tool. %M 38875578 %R 10.2196/45032 %U https://ai.jmir.org/2023/1/e45032 %U https://doi.org/10.2196/45032 %U http://www.ncbi.nlm.nih.gov/pubmed/38875578 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e47283 %T Forecasting Artificial Intelligence Trends in Health Care: Systematic International Patent Analysis %A Benjamens,Stan %A Dhunnoo,Pranavsingh %A Görög,Márton %A Mesko,Bertalan %+ The Medical Futurist Institute, Povl Bang-Jensen u 2/B1 4/1, Budapest, 1118, Hungary, 36 703807260, berci@medicalfuturist.com %K artificial intelligence %K patent %K healthcare %K health care %K medical %K forecasting %K future %K AI %K machine learning %K medical device %K open-access %K AI technology %D 2023 %7 26.5.2023 %9 Original Paper %J JMIR AI %G English %X Background: Artificial intelligence (AI)– and machine learning (ML)–based medical devices and algorithms are rapidly changing the medical field. To provide insight into the trends in AI and ML in health care, we conducted an international patent analysis. Objective: It is pivotal to obtain a clear overview of upcoming AI and ML trends in health care to provide regulators with a better position to foresee which technologies, not yet available on the market, they will have to regulate. Therefore, in this study, we provide insights and forecasts into the trends in AI and ML in health care by conducting an international patent analysis. Methods: A systematic patent analysis, focusing on AI- and ML-based patents in health care, was performed using the Espacenet database (from January 2012 until July 2022). 
This database includes patents from the China National Intellectual Property Administration, European Patent Office, Japan Patent Office, Korean Intellectual Property Office, and the United States Patent and Trademark Office. Results: We identified 10,967 patents: 7332 (66.9%) from the China National Intellectual Property Administration, 191 (1.7%) from the European Patent Office, 163 (1.5%) from the Japan Patent Office, 513 (4.7%) from the Korean Intellectual Property Office, and 2768 (25.2%) from the United States Patent and Trademark Office. The number of published patents showed a yearly doubling from 2015 until 2021. Five international companies that had the greatest impact on this increase were Ping An Medical and Healthcare Management Co Ltd with 568 (5.2%) patents, Siemens Healthineers with 273 (2.5%) patents, IBM Corp with 226 (2.1%) patents, Philips Healthcare with 150 (1.4%) patents, and Shanghai United Imaging Healthcare Co Ltd with 144 (1.3%) patents. Conclusions: This international patent analysis showed a linear increase in patents published by the 5 largest patent offices. An open access database with interactive search options was launched for AI- and ML-based patents in health care. 
%M 10449890 %R 10.2196/47283 %U https://ai.jmir.org/2023/1/e47283 %U https://doi.org/10.2196/47283 %U http://www.ncbi.nlm.nih.gov/pubmed/10449890 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e45450 %T Machine Learning–Based Time in Patterns for Blood Glucose Fluctuation Pattern Recognition in Type 1 Diabetes Management: Development and Validation Study %A Chan,Nicholas Berin %A Li,Weizi %A Aung,Theingi %A Bazuaye,Eghosa %A Montero,Rosa M %+ Informatics Research Centre, Henley Business School, University of Reading, Whiteknights, Reading, RG6 6UD, United Kingdom, 44 7714021891, weizi.li@henley.ac.uk %K diabetes mellitus %K continuous glucose monitoring %K glycemic variability %K glucose fluctuation pattern %K temporal clustering %K scalable metrics %D 2023 %7 26.5.2023 %9 Original Paper %J JMIR AI %G English %X Background: Continuous glucose monitoring (CGM) for diabetes combines noninvasive glucose biosensors, continuous monitoring, cloud computing, and analytics to connect and simulate a hospital setting in a person’s home. CGM systems inspired analytics methods to measure glycemic variability (GV), but existing GV analytics methods disregard glucose trends and patterns; hence, they fail to capture entire temporal patterns and do not provide granular insights about glucose fluctuations. Objective: This study aimed to propose a machine learning–based framework for blood glucose fluctuation pattern recognition, which enables a more comprehensive representation of GV profiles that could present detailed fluctuation information, be easily understood by clinicians, and provide insights about patient groups based on time in blood fluctuation patterns. Methods: Overall, 1.5 million measurements from 126 patients in the United Kingdom with type 1 diabetes mellitus (T1DM) were collected, and prevalent blood fluctuation patterns were extracted using dynamic time warping. The patterns were further validated in 225 patients in the United States with T1DM. 
Hierarchical clustering was then applied on time in patterns to form 4 clusters of patients. Patient groups were compared using statistical analysis. Results: In total, 6 patterns depicting distinctive glucose levels and trends were identified and validated, based on which 4 GV profiles of patients with T1DM were found. They were significantly different in terms of glycemic statuses such as diabetes duration (P=.04), glycated hemoglobin level (P<.001), and time in range (P<.001) and thus had different management needs. Conclusions: The proposed method can analytically extract existing blood fluctuation patterns from CGM data. Thus, time in patterns can capture a rich view of patients’ GV profile. Its conceptual resemblance with time in range, along with rich blood fluctuation details, makes it more scalable, accessible, and informative to clinicians. %M 38875568 %R 10.2196/45450 %U https://ai.jmir.org/2023/1/e45450 %U https://doi.org/10.2196/45450 %U http://www.ncbi.nlm.nih.gov/pubmed/38875568 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 10 %N %P e40887 %T Optometrists' Perspectives Regarding Artificial Intelligence Aids and Contributing Retinal Images to a Repository: Web-Based Interview Study %A Constantin,Aurora %A Atkinson,Malcolm %A Bernabeu,Miguel Oscar %A Buckmaster,Fiona %A Dhillon,Baljean %A McTrusty,Alice %A Strang,Niall %A Williams,Robin %+ Department of Vision Sciences, Glasgow Caledonian University, Cowcaddens Road, Glasgow, G40BA, United Kingdom, 44 07794835467, n.strang@gcu.ac.uk %K AI in optometry %K repository of ocular images %K user studies %K AI decision support tools %K perspectives of optometrists and ophthalmologists %K AI %K research %K medical %K decision support %K tool %K digital tool %K digital %D 2023 %7 25.5.2023 %9 Original Paper %J JMIR Hum Factors %G English %X Background: A repository of retinal images for research is being established in Scotland. 
It will permit researchers to validate, tune, and refine artificial intelligence (AI) decision-support algorithms to accelerate safe deployment in Scottish optometry and beyond. Research demonstrates the potential of AI systems in optometry and ophthalmology, though they are not yet widely adopted. Objective: In this study, 18 optometrists were interviewed to (1) identify their expectations and concerns about the national image research repository and their use of AI decision support and (2) gather their suggestions for improving eye health care. The goal was to clarify attitudes among optometrists delivering primary eye care with respect to contributing their patients’ images and to using AI assistance. These attitudes are less well studied in primary care contexts. Five ophthalmologists were interviewed to discover their interactions with optometrists. Methods: Between March and August 2021, 23 semistructured interviews were conducted online lasting for 30-60 minutes. Transcribed and pseudonymized recordings were analyzed using thematic analysis. Results: All optometrists supported contributing retinal images to form an extensive and long-running research repository. Our main findings are summarized as follows. Optometrists were willing to share images of their patients’ eyes but expressed concern about technical difficulties, lack of standardization, and the effort involved. Those interviewed thought that sharing digital images would improve collaboration between optometrists and ophthalmologists, for example, during referral to secondary health care. Optometrists welcomed an expanded primary care role in diagnosis and management of diseases by exploiting new technologies and anticipated significant health benefits. Optometrists welcomed AI assistance but insisted that it should not reduce their role and responsibilities. 
Conclusions: Our investigation focusing on optometrists is novel because most similar studies on AI assistance were performed in hospital settings. Our findings are consistent with those of studies with professionals in ophthalmology and other medical disciplines, showing near-universal willingness to use AI to improve health care, alongside concerns over training, costs, responsibilities, skill retention, data sharing, and disruptions to professional practices. Our study on optometrists’ willingness to contribute images to a research repository introduces a new aspect; they hope that a digital image sharing infrastructure will facilitate service integration. %M 37227761 %R 10.2196/40887 %U https://humanfactors.jmir.org/2023/1/e40887 %U https://doi.org/10.2196/40887 %U http://www.ncbi.nlm.nih.gov/pubmed/37227761 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e40306 %T Supporting Adolescent Engagement with Artificial Intelligence–Driven Digital Health Behavior Change Interventions %A Giovanelli,Alison %A Rowe,Jonathan %A Taylor,Madelynn %A Berna,Mark %A Tebb,Kathleen P %A Penilla,Carlos %A Pugatch,Marianne %A Lester,James %A Ozer,Elizabeth M %+ Department of Pediatrics, University of California, San Francisco, 550 16th Street, 4th Floor, San Francisco, CA, 94158, United States, 1 510 428 3387, alison.giovanelli@ucsf.edu %K digital health behavior change %K adolescent %K adolescence %K behavior change %K BCT %K behavioral intervention %K artificial intelligence %K machine learning %K model %K AI ethics %K trace log data %K ethics %K ethical %K youth %K risky behavior %K engagement %K privacy %K security %K optimization %K operationalization %D 2023 %7 24.5.2023 %9 Viewpoint %J J Med Internet Res %G English %X Understanding and optimizing adolescent-specific engagement with behavior change interventions will open doors for providers to promote healthy changes in an age group that is simultaneously difficult to engage and especially important to 
affect. For digital interventions, there is untapped potential in combining the vastness of process-level data with the analytical power of artificial intelligence (AI) to understand not only how adolescents engage but also how to improve upon interventions with the goal of increasing engagement and, ultimately, efficacy. Rooted in the example of the INSPIRE narrative-centered digital health behavior change intervention (DHBCI) for adolescent risky behaviors around alcohol use, we propose a framework for harnessing AI to accomplish 4 goals that are pertinent to health care providers and software developers alike: measurement of adolescent engagement, modeling of adolescent engagement, optimization of current interventions, and generation of novel interventions. Operationalization of this framework with youths must be situated in the ethical use of this technology, and we have outlined the potential pitfalls of AI with particular attention to privacy concerns for adolescents. Given how recently AI advances have opened up these possibilities in this field, the opportunities for further investigation are plenty. %M 37223987 %R 10.2196/40306 %U https://www.jmir.org/2023/1/e40306 %U https://doi.org/10.2196/40306 %U http://www.ncbi.nlm.nih.gov/pubmed/37223987 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e40031 %T Artificial Intelligence in Emergency Medicine: Viewpoint of Current Applications and Foreseeable Opportunities and Challenges %A Chenais,Gabrielle %A Lagarde,Emmanuel %A Gil-Jardiné,Cédric %+ Bordeaux Population Health Center, INSERM U1219, 146 rue Léo Saignat, Bordeaux, 33000, France, 33 05 57 57 15 0, gabrielle.chenais@u-bordeaux.fr %K viewpoint %K ethics %K artificial intelligence %K emergency medicine %K perspectives %K mobile phone %D 2023 %7 23.5.2023 %9 Viewpoint %J J Med Internet Res %G English %X Emergency medicine and its services have reached a breaking point during the COVID-19 pandemic. 
This pandemic has highlighted the failures of a system that needs to be rethought, and novel approaches need to be considered. Artificial intelligence (AI) has matured to the point where it is poised to fundamentally transform health care, and applications within the emergency field are particularly promising. In this viewpoint, we first attempt to depict the landscape of AI-based applications currently in use in the daily emergency field. We review the existing AI systems; their algorithms; and their derivation, validation, and impact studies. We also propose future directions and perspectives. Second, we examine the ethics and risk specificities of the use of AI in the emergency field. %M 36972306 %R 10.2196/40031 %U https://www.jmir.org/2023/1/e40031 %U https://doi.org/10.2196/40031 %U http://www.ncbi.nlm.nih.gov/pubmed/36972306 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e46020 %T Roles and Competencies of Doctors in Artificial Intelligence Implementation: Qualitative Analysis Through Physician Interviews %A Tanaka,Masashi %A Matsumura,Shinji %A Bito,Seiji %+ Department of Clinical Epidemiology, Tokyo Medical Center, National Hospital Organization, 2-5-1,higashigaoka,meguroku, Tokyo, 1520021, Japan, 81 334110111, tanakamasashino@gmail.com %K artificial intelligence %K shared decision-making %K competency %K decision-making %K qualitative research %K patient-physician %K medical field %K AI services %K AI technology %D 2023 %7 18.5.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Artificial intelligence (AI) is a term used to describe the use of computers and technology to emulate human intelligence mechanisms. Although AI is known to affect health services, the impact of information provided by AI on the patient-physician relationship in actual practice is unclear. 
Objective: The purpose of this study is to investigate the effect of introducing AI functions into the medical field on the role of the physician or physician-patient relationship, as well as potential concerns in the AI era. Methods: We conducted focus group interviews in Tokyo’s suburbs with physicians recruited through snowball sampling. The interviews were conducted in accordance with the questions listed in the interview guide. A verbatim transcript recording of all interviews was qualitatively analyzed using content analysis by all authors. Similarly, extracted code was grouped into subcategories, categories, and then core categories. We continued interviewing, analyzing, and discussing until we reached data saturation. In addition, we shared the results with all interviewees and confirmed the content to ensure the credibility of the analysis results. Results: A total of 9 participants who belonged to various clinical departments in the 3 groups were interviewed. The same interviewers conducted the interview as the moderator each time. The average group interview time for the 3 groups was 102 minutes. Content saturation and theme development were achieved with the 3 groups. We identified three core categories: (1) functions expected to be replaced by AI, (2) functions still expected of human physicians, and (3) concerns about the medical field in the AI era. We also summarized the roles of physicians and patients, as well as the changes in the clinical environment in the age of AI. Some of the current functions of the physician were primarily replaced by AI functions, while others were inherited as the functions of the physician. In addition, “functions extended by AI” obtained by processing massive amounts of data will emerge, and a new role for physicians will be created to deal with them. 
Accordingly, the importance of physician functions, such as responsibility and commitment based on values, will increase, which will simultaneously increase the expectations of the patients that physicians will perform these functions. Conclusions: We presented our findings on how the medical processes of physicians and patients will change as AI technology is fully implemented. Promoting interdisciplinary discussions on how to overcome the challenges is essential, referring to the discussions being conducted in other fields. %M 37200074 %R 10.2196/46020 %U https://formative.jmir.org/2023/1/e46020 %U https://doi.org/10.2196/46020 %U http://www.ncbi.nlm.nih.gov/pubmed/37200074 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 7 %N %P e45190 %T Continuous Data-Driven Monitoring in Critical Congenital Heart Disease: Clinical Deterioration Model Development %A Zoodsma,Ruben S %A Bosch,Rian %A Alderliesten,Thomas %A Bollen,Casper W %A Kappen,Teus H %A Koomen,Erik %A Siebes,Arno %A Nijman,Joppe %+ Department of Paediatric Intensive Care, University Medical Center Utrecht, Office Number KG0.2.306.1, Lundlaan 6, Utrecht, 3584 EA, Netherlands, 31 88 7575092, j.nijman@umcutrecht.nl %K artificial intelligence %K aberration detection %K clinical deterioration %K classification model %K paediatric intensive care %K pediatric intensive care %K congenital heart disease %K cardiac monitoring %K machine learning %K peri-operative %K perioperative %K surgery %D 2023 %7 16.5.2023 %9 Original Paper %J JMIR Cardio %G English %X Background: Critical congenital heart disease (cCHD)—requiring cardiac intervention in the first year of life for survival—occurs globally in 2-3 of every 1000 live births. In the critical perioperative period, intensive multimodal monitoring at a pediatric intensive care unit (PICU) is warranted, as their organs—especially the brain—may be severely injured due to hemodynamic and respiratory events. 
These 24/7 clinical data streams yield large quantities of high-frequency data, which are challenging in terms of interpretation due to the varying and dynamic physiology innate to cCHD. Through advanced data science algorithms, these dynamic data can be condensed into comprehensible information, reducing the cognitive load on the medical team and providing data-driven monitoring support through automated detection of clinical deterioration, which may facilitate timely intervention. Objective: This study aimed to develop a clinical deterioration detection algorithm for PICU patients with cCHD. Methods: Retrospectively, synchronous per-second data of cerebral regional oxygen saturation (rSO2) and 4 vital parameters (respiratory rate, heart rate, oxygen saturation, and invasive mean blood pressure) in neonates with cCHD admitted to the University Medical Center Utrecht, the Netherlands, between 2002 and 2018 were extracted. Patients were stratified based on mean oxygen saturation during admission to account for physiological differences between acyanotic and cyanotic cCHD. Each subset was used to train our algorithm in classifying data as either stable, unstable, or sensor dysfunction. The algorithm was designed to detect combinations of parameters abnormal to the stratified subpopulation and significant deviations from the patient’s unique baseline, which were further analyzed to distinguish clinical improvement from deterioration. Novel data were used for testing, visualized in detail, and internally validated by pediatric intensivists. Results: A retrospective query yielded 4600 hours and 209 hours of per-second data in 78 and 10 neonates for, respectively, training and testing purposes. During testing, stable episodes occurred 153 times, of which 134 (88%) were correctly detected. Unstable episodes were correctly noted in 46 of 57 (81%) observed episodes. Twelve expert-confirmed unstable episodes were missed in testing. 
Time-percentual accuracy was 93% and 77% for, respectively, stable and unstable episodes. A total of 138 sensorial dysfunctions were detected, of which 130 (94%) were correct. Conclusions: In this proof-of-concept study, a clinical deterioration detection algorithm was developed and retrospectively evaluated to classify clinical stability and instability, achieving reasonable performance considering the heterogeneous population of neonates with cCHD. Combined analysis of baseline (ie, patient-specific) deviations and simultaneous parameter-shifting (ie, population-specific) proofs would be promising with respect to enhancing applicability to heterogeneous critically ill pediatric populations. After prospective validation, the current—and comparable—models may, in the future, be used in the automated detection of clinical deterioration and eventually provide data-driven monitoring support to the medical team, allowing for timely intervention. %M 37191988 %R 10.2196/45190 %U https://cardio.jmir.org/2023/1/e45190 %U https://doi.org/10.2196/45190 %U http://www.ncbi.nlm.nih.gov/pubmed/37191988 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e41868 %T Prediction of Chronic Stress and Protective Factors in Adults: Development of an Interpretable Prediction Model Based on XGBoost and SHAP Using National Cross-sectional DEGS1 Data %A Bozorgmehr,Arezoo %A Weltermann,Birgitta %+ Institute of General Practice and Family Medicine, University Hospital Bonn, University of Bonn, Venusberg-Campus1, Bonn, 53127, Germany, 49 228 287 11160, arezoo.bozorgmehr@ukbonn.de %K artificial intelligence %K machine learning %K prognostic %K model %K chronic stress %K resilience factors %K interpretable model %K explainability %K stress %K disease %K diabetes %K cancer %K dataset %K clinical %K data %K gender %K social support %K support %K intervention %K SHAP %D 2023 %7 16.5.2023 %9 Original Paper %J JMIR AI %G English %X Background: Chronic stress is highly prevalent in the 
German population. It has known adverse effects on mental health, such as burnout and depression. Known long-term effects of chronic stress are cardiovascular disease, diabetes, and cancer. Objective: This study aims to derive an interpretable multiclass machine learning model for predicting chronic stress levels and factors protecting against chronic stress based on representative nationwide data from the German Health Interview and Examination Survey for Adults, which is part of the national health monitoring program. Methods: A data set from the German Health Interview and Examination Survey for Adults study including demographic, clinical, and laboratory data from 5801 participants was analyzed. A multiclass eXtreme Gradient Boosting (XGBoost) model was constructed to classify participants into 3 categories including low, middle, and high chronic stress levels. The model’s performance was evaluated using the area under the receiver operating characteristic curve, precision, recall, specificity, and the F1-score. Additionally, SHapley Additive exPlanations was used to interpret the XGBoost prediction model and to identify factors protecting against chronic stress. Results: The multiclass XGBoost model exhibited the following macroaverage scores: an area under the receiver operating characteristic curve of 81%, precision of 63%, recall of 52%, specificity of 78%, and an F1-score of 54%. The most important features for low-level chronic stress were male gender, very good general health, high satisfaction with living space, and strong social support. Conclusions: This study presents a multiclass interpretable prediction model for chronic stress in adults in Germany. The explainable artificial intelligence technique SHapley Additive exPlanations identified relevant protective factors for chronic stress, which need to be considered when developing interventions to reduce chronic stress. 
%M 38875576 %R 10.2196/41868 %U https://ai.jmir.org/2023/1/e41868 %U https://doi.org/10.2196/41868 %U http://www.ncbi.nlm.nih.gov/pubmed/38875576 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 10 %N %P e44986 %T Associations Between Smartphone Keystroke Metadata and Mental Health Symptoms in Adolescents: Findings From the Future Proofing Study %A Braund,Taylor A %A O’Dea,Bridianne %A Bal,Debopriyo %A Maston,Kate %A Larsen,Mark %A Werner-Seidler,Aliza %A Tillman,Gabriel %A Christensen,Helen %+ Faculty of Medicine and Health, University of New South Wales, High St, Kensington, 2052, Australia, 61 290659255, t.braund@blackdog.org.au %K adolescents %K anxiety %K depression %K digital phenotype %K keystroke dynamics %K keystroke metadata %K smartphone %K students %D 2023 %7 15.5.2023 %9 Original Paper %J JMIR Ment Health %G English %X Background: Mental disorders are prevalent during adolescence. Among the digital phenotypes currently being developed to monitor mental health symptoms, typing behavior is one promising candidate. However, few studies have directly assessed associations between typing behavior and mental health symptom severity, and whether these relationships differ between genders. Objective: In a cross-sectional analysis of a large cohort, we tested whether various features of typing behavior derived from keystroke metadata were associated with mental health symptoms and whether these relationships differed between genders. Methods: A total of 934 adolescents from the Future Proofing study undertook 2 typing tasks on their smartphones through the Future Proofing app. Common keystroke timing and frequency features were extracted across tasks. Mental health symptoms were assessed using the Patient Health Questionnaire-Adolescent version, the Children’s Anxiety Scale-Short Form, the Distress Questionnaire 5, and the Insomnia Severity Index. 
Bivariate correlations were used to test whether keystroke features were associated with mental health symptoms. The false discovery rates of P values were adjusted to q values. Machine learning models were trained and tested using independent samples (ie, 80% train 20% test) to identify whether keystroke features could be combined to predict mental health symptoms. Results: Keystroke timing features showed a weak negative association with mental health symptoms across participants. When split by gender, females showed weak negative relationships between keystroke timing features and mental health symptoms, and weak positive relationships between keystroke frequency features and mental health symptoms. The opposite relationships were found for males (except for dwell). Machine learning models using keystroke features alone did not predict mental health symptoms. Conclusions: Increased mental health symptoms are weakly associated with faster typing, with important gender differences. Keystroke metadata should be collected longitudinally and combined with other digital phenotypes to enhance their clinical relevance. 
Trial Registration: Australian and New Zealand Clinical Trial Registry, ACTRN12619000855123; https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=377664&isReview=true %M 37184904 %R 10.2196/44986 %U https://mental.jmir.org/2023/1/e44986 %U https://doi.org/10.2196/44986 %U http://www.ncbi.nlm.nih.gov/pubmed/37184904 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e45156 %T Machine Learning Model to Predict Assignment of Therapy Homework in Behavioral Treatments: Algorithm Development and Validation %A Peretz,Gal %A Taylor,C Barr %A Ruzek,Josef I %A Jefroykin,Samuel %A Sadeh-Sharvit,Shiri %+ Eleos Health, 260 Charles St., Waltham, MA, 02453, United States, 1 5109848132, shiri@eleos.health %K deep learning %K empirically-based practice %K natural language processing %K behavioral treatment %K machine learning %K homework %K treatment fidelity %K artificial intelligence %K intervention %K therapy %K mental health %K mHealth %D 2023 %7 15.5.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Therapeutic homework is a core element of cognitive and behavioral interventions, and greater homework compliance predicts improved treatment outcomes. To date, research in this area has relied mostly on therapists’ and clients’ self-reports or studies carried out in academic settings, and there is little knowledge on how homework is used as a treatment intervention in routine clinical care. Objective: This study tested whether a machine learning (ML) model using natural language processing could identify homework assignments in behavioral health sessions. By leveraging this technology, we sought to develop a more objective and accurate method for detecting the presence of homework in therapy sessions. Methods: We analyzed 34,497 audio-recorded treatment sessions provided in 8 behavioral health care programs via an artificial intelligence (AI) platform designed for therapy provided by Eleos Health. 
Therapist and client utterances were captured and analyzed via the AI platform. Experts reviewed the homework assigned in 100 sessions to create classifications. Next, we sampled 4000 sessions and labeled therapist-client microdialogues that suggested homework to train an unsupervised sentence embedding model. This model was trained on 2.83 million therapist-client microdialogues. Results: An analysis of 100 random sessions found that homework was assigned in 61% (n=61) of sessions, and in 34% (n=21) of these cases, more than one homework assignment was provided. Homework addressed practicing skills (n=34, 37%), taking action (n=26, 28.5%), journaling (n=17, 19%), and learning new skills (n=14, 15%). Our classifier reached a 72% F1-score, outperforming state-of-the-art ML models. The therapists reviewing the microdialogues agreed in 90% (n=90) of cases on whether or not homework was assigned. Conclusions: The findings of this study demonstrate the potential of ML and natural language processing to improve the detection of therapeutic homework assignments in behavioral health sessions. Our findings highlight the importance of accurately capturing homework in real-world settings and the potential for AI to support therapists in providing evidence-based care and increasing fidelity with science-backed interventions. By identifying areas where AI can facilitate homework assignments and tracking, such as reminding therapists to prescribe homework and reducing the charting associated with homework, we can ultimately improve the overall quality of behavioral health care. Additionally, our approach can be extended to investigate the impact of homework assignments on therapeutic outcomes, providing insights into the effectiveness of specific types of homework. 
%M 37184927 %R 10.2196/45156 %U https://formative.jmir.org/2023/1/e45156 %U https://doi.org/10.2196/45156 %U http://www.ncbi.nlm.nih.gov/pubmed/37184927 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e44432 %T Predicting Treatment Interruption Among People Living With HIV in Nigeria: Machine Learning Approach %A Ogbechie,Matthew-David %A Fischer Walker,Christa %A Lee,Mu-Tien %A Abba Gana,Amina %A Oduola,Abimbola %A Idemudia,Augustine %A Edor,Matthew %A Harris,Emily Lark %A Stephens,Jessica %A Gao,Xiaoming %A Chen,Pai-Lien %A Persaud,Navindra Etwaroo %+ FHI 360, 1825 Connecticut Ave NW, Washington, DC, 20009, United States, 1 2028848017, npersaud@fhi360.org %K HIV %K machine learning %K treatment interruption %K Nigeria %K chronic disease %K antiretroviral therapy %K chronic disease %K HIV program %K intervention %K data collection %D 2023 %7 12.5.2023 %9 Original Paper %J JMIR AI %G English %X Background: Antiretroviral therapy (ART) has transformed HIV from a fatal illness to a chronic disease. Given the high rate of treatment interruptions, HIV programs use a range of approaches to support individuals in adhering to ART and in re-engaging those who interrupt treatment. These interventions can often be time-consuming and costly, and thus providing for all may not be sustainable. Objective: This study aims to describe our experiences developing a machine learning (ML) model to predict interruption in treatment (IIT) at 30 days among people living with HIV newly enrolled on ART in Nigeria and our integration of the model into the routine information system. In addition, we collected health workers’ perceptions and use of the model’s outputs for case management. Methods: Routine program data collected from January 2005 through February 2021 was used to train and test an ML model (boosting tree and Extreme Gradient Boosting) to predict future IIT. Data were randomly sampled using an 80/20 split into training and test data sets, respectively. 
Model performance was estimated using sensitivity, specificity, and positive and negative predictive values. Variables considered to be highly associated with treatment interruption were preselected by a group of HIV prevention researchers, program experts, and biostatisticians for inclusion in the model. Individuals were defined as having IIT if they were provided a 30-day supply of antiretrovirals but did not return for a refill within 28 days of their scheduled follow-up visit date. Outputs from the ML model were shared weekly with health care workers at selected facilities. Results: After data cleaning, complete data for 136,747 clients were used for the analysis. The percentage of IIT cases decreased from 58.6% (36,663/61,864) before 2017 to 14.2% (3690/28,046) from October 2019 through February 2021. Overall IIT was higher among clients who were sicker at enrollment. Other factors that were significantly associated with IIT included pregnancy and breastfeeding status and facility characteristics (location, service level, and service type). Several models were initially developed; the selected model had a sensitivity of 81%, specificity of 88%, positive predictive value of 83%, and negative predictive value of 87%, and was successfully integrated into the national electronic medical records database. During field-testing, the majority of users reported that an IIT prediction tool could lead to proactive steps for preventing IIT and improving patient outcomes. Conclusions: High-performing ML models to identify patients with HIV at risk of IIT can be developed using routinely collected service delivery data and integrated into routine health management information systems. Machine learning can improve the targeting of interventions through differentiated models of care before patients interrupt treatment, resulting in increased cost-effectiveness and improved patient outcomes. 
%M 38875546 %R 10.2196/44432 %U https://ai.jmir.org/2023/1/e44432 %U https://doi.org/10.2196/44432 %U http://www.ncbi.nlm.nih.gov/pubmed/38875546 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 10 %N %P e42420 %T Prediction of Mental Health Problem Using Annual Student Health Survey: Machine Learning Approach %A Baba,Ayako %A Bunji,Kyosuke %+ Health Service Center, Kanazawa University, Kakuma-machi, Kanazawa-city, Ishikawa, 9201192, Japan, 81 762645254, a_baba@staff.kanazawa-u.ac.jp %K student counseling %K health survey %K machine learning %K mental health problem %K response time %D 2023 %7 10.5.2023 %9 Original Paper %J JMIR Ment Health %G English %X Background: One of the reasons why students go to counseling is being called on based on self-reported health survey results. However, there is no concordant standard for such calls. Objective: This study aims to develop a machine learning (ML) model to predict students’ mental health problems in 1 year and the following year using the health survey’s content and answering time (response time, response time stamp, and answer date). Methods: Data were obtained from the responses of 3561 (62.58%) of 5690 undergraduate students from University A in Japan (a national university) who completed the health survey in 2020 and 2021. We performed 2 analyses; in analysis 1, a mental health problem in 2020 was predicted from demographics, answers for the health survey, and answering time in the same year, and in analysis 2, a mental health problem in 2021 was predicted from the same input variables as in analysis 1. We compared the results from different ML models, such as logistic regression, elastic net, random forest, XGBoost, and LightGBM. The results with and without answering time conditions were compared using the adopted model. Results: On the basis of the comparison of the models, we adopted the LightGBM model. 
In this model, both analyses and both conditions achieved adequate performance (eg, in analysis 1, the Matthews correlation coefficient [MCC] was 0.970 with the answering time condition and 0.976 without it; in analysis 2, the MCC was 0.986 with the answering time condition and 0.971 without it). In both analyses and both conditions, responses to the questions about campus life (eg, anxiety and future) had the highest impact (Gain 0.131-0.216; Shapley additive explanations 0.018-0.028). Five to 6 input variables from questions about campus life ranked among the top 10 by Shapley additive explanations. Contrary to our expectation, including answering time–related variables did not substantially improve the prediction of students’ mental health problems. However, certain variables generated from the answering time appear helpful in improving the prediction and affect the prediction probability. Conclusions: These results demonstrate the possibility of predicting mental health across years using health survey data. Demographic and behavioral data, including answering time, were effective, as were self-rating items. This model demonstrates the possibility of synergistically combining the characteristics of health surveys with the advantages of ML. These findings can inform improvements to health survey items and calling criteria. 
%M 37163323 %R 10.2196/42420 %U https://mental.jmir.org/2023/1/e42420 %U https://doi.org/10.2196/42420 %U http://www.ncbi.nlm.nih.gov/pubmed/37163323 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e44597 %T Chinese Clinical Named Entity Recognition From Electronic Medical Records Based on Multisemantic Features by Using Robustly Optimized Bidirectional Encoder Representation From Transformers Pretraining Approach Whole Word Masking and Convolutional Neural Networks: Model Development and Validation %A Wang,Weijie %A Li,Xiaoying %A Ren,Huiling %A Gao,Dongping %A Fang,An %+ Institute of Medical Information and Library, Chinese Academy of Medical Sciences & Peking Union Medical College, 69 Dongdan N St, Beijing, 100005, China, 86 010 52328911, ren.huiling@imicams.ac.cn %K Chinese clinical named entity recognition %K multisemantic features %K image feature %K Robustly Optimized Bidirectional Encoder Representation from Transformers Pretraining Approach Whole Word Masking %K RoBERTa-wwm %K convolutional neural network %K CNN %D 2023 %7 10.5.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: Clinical electronic medical records (EMRs) contain important information on patients’ anatomy, symptoms, examinations, diagnoses, and medications. Large-scale mining of rich medical information from EMRs will provide notable reference value for medical research. With the complexity of Chinese grammar and blurred boundaries of Chinese words, Chinese clinical named entity recognition (CNER) remains a notable challenge. Follow-up tasks such as medical entity structuring, medical entity standardization, medical entity relationship extraction, and medical knowledge graph construction largely depend on medical named entity recognition effects. A promising CNER result would provide reliable support for building domain knowledge graphs, knowledge bases, and knowledge retrieval systems. 
Furthermore, it would provide research ideas for scientists and medical decision-making references for doctors and even guide patients on disease and health management. Therefore, obtaining excellent CNER results is essential. Objective: We aimed to propose a Chinese CNER method to learn semantics-enriched representations for comprehensively enhancing machines to understand deep semantic information of EMRs by using multisemantic features, which makes medical information more readable and understandable. Methods: First, we used Robustly Optimized Bidirectional Encoder Representation from Transformers Pretraining Approach Whole Word Masking (RoBERTa-wwm) with dynamic fusion and Chinese character features, including 5-stroke code, Zheng code, phonological code, and stroke code, extracted by 1-dimensional convolutional neural networks (CNNs) to obtain fine-grained semantic features of Chinese characters. Subsequently, we converted Chinese characters into square images to obtain Chinese character image features from another modality by using a 2-dimensional CNN. Finally, we input multisemantic features into Bidirectional Long Short-Term Memory with Conditional Random Fields to achieve Chinese CNER. The effectiveness of our model was compared with that of the baseline and existing research models, and the features involved in the model were ablated and analyzed to verify the model’s effectiveness. Results: We collected 1379 Yidu-S4K EMRs containing 23,655 entities in 6 categories and 2007 self-annotated EMRs containing 118,643 entities in 7 categories. The experiments showed that our model outperformed the comparison experiments, with F1-scores of 89.28% and 84.61% on the Yidu-S4K and self-annotated data sets, respectively. The results of the ablation analysis demonstrated that each feature and method we used could improve the entity recognition ability. 
Conclusions: Our proposed CNER method would mine the richer deep semantic information in EMRs by multisemantic embedding using RoBERTa-wwm and CNNs, enhancing the semantic recognition of characters at different granularity levels and improving the generalization capability of the method by achieving information complementarity among different semantic features, thus making the machine semantically understand EMRs and improving the CNER task accuracy. %M 37163343 %R 10.2196/44597 %U https://medinform.jmir.org/2023/1/e44597 %U https://doi.org/10.2196/44597 %U http://www.ncbi.nlm.nih.gov/pubmed/37163343 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44804 %T Evaluating Listening Performance for COVID-19 Detection by Clinicians and Machine Learning: Comparative Study %A Han,Jing %A Montagna,Marco %A Grammenos,Andreas %A Xia,Tong %A Bondareva,Erika %A Siegele-Brown,Chloë %A Chauhan,Jagmohan %A Dang,Ting %A Spathis,Dimitris %A Floto,R Andres %A Cicuta,Pietro %A Mascolo,Cecilia %+ Department of Computer Science and Technology, University of Cambridge, 15 JJ Thomson Ave, Cambridge, CB3 0FD, United Kingdom, 44 012237 ext 63540, jh2298@cam.ac.uk %K audio analysis %K COVID-19 detection %K deep learning %K respiratory disease diagnosis %K mobile health %K detection %K clinicians %K machine learning %K respiratory diagnosis %K clinical decisions %K respiratory %D 2023 %7 9.5.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: To date, performance comparisons between men and machines have been carried out in many health domains. Yet machine learning (ML) models and human performance comparisons in audio-based respiratory diagnosis remain largely unexplored. Objective: The primary objective of this study was to compare human clinicians and an ML model in predicting COVID-19 from respiratory sound recordings. Methods: In this study, we compared human clinicians and an ML model in predicting COVID-19 from respiratory sound recordings. 
Prediction performance on 24 audio samples (12 tested positive) made by 36 clinicians with experience in treating COVID-19 or other respiratory illnesses was compared with predictions made by an ML model trained on 1162 samples. Each sample consisted of voice, cough, and breathing sound recordings from 1 subject, and the length of each sample was around 20 seconds. We also investigated whether combining the predictions of the model and human experts could further enhance the performance in terms of both accuracy and confidence. Results: The ML model outperformed the clinicians, yielding a sensitivity of 0.75 and a specificity of 0.83, whereas the best performance achieved by the clinicians was 0.67 in terms of sensitivity and 0.75 in terms of specificity. Moreover, integrating the clinicians’ and the model’s predictions enhanced performance further, achieving a sensitivity of 0.83 and a specificity of 0.92. Conclusions: Our findings suggest that the clinicians and the ML model could make better clinical decisions via a cooperative approach and achieve higher confidence in audio-based respiratory diagnosis. 
%M 37126593 %R 10.2196/44804 %U https://www.jmir.org/2023/1/e44804 %U https://doi.org/10.2196/44804 %U http://www.ncbi.nlm.nih.gov/pubmed/37126593 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 10 %N %P e43227 %T Feasibility and Acceptability of Chatbots for Nutrition and Physical Activity Health Promotion Among Adolescents: Systematic Scoping Review With Adolescent Consultation %A Han,Rui %A Todd,Allyson %A Wardak,Sara %A Partridge,Stephanie R %A Raeside,Rebecca %+ Engagement and Co-Design Research Hub, School of Health Sciences, Faculty of Medicine and Health, University of Sydney, Level 6, Block K, Westmead Hospital, Westmead, 2145, Australia, 61 404306607, rebecca.raeside@sydney.edu.au %K chatbot %K artificial intelligence %K text message %K adolescent nutrition %K physical activity %K health promotion %D 2023 %7 5.5.2023 %9 Review %J JMIR Hum Factors %G English %X Background: Reducing lifestyle risk behaviors among adolescents depends on access to age-appropriate health promotion information. Chatbots—computer programs designed to simulate conversations with human users—have the potential to deliver health information to adolescents to improve their lifestyle behaviors and support behavior change, but research on the feasibility and acceptability of chatbots in the adolescent population is lacking. Objective: This systematic scoping review aims to evaluate the feasibility and acceptability of chatbots in nutrition and physical activity interventions among adolescents. A secondary aim is to consult adolescents to identify features of chatbots that are acceptable and feasible. Methods: We searched 6 electronic databases from March to April 2022 (MEDLINE, Embase, Joanna Briggs Institute, the Cumulative Index to Nursing and Allied Health, the Association for Computing Machinery library, and the Institute of Electrical and Electronics Engineers database). 
Peer-reviewed studies were included if they were conducted in the adolescent population (10-19 years old) without any chronic disease, except obesity or type 2 diabetes, and assessed chatbots used in nutrition or physical activity interventions, or both, that encouraged individuals to meet dietary or physical activity guidelines and supported positive behavior change. Studies were screened by 2 independent reviewers, with any queries resolved by a third reviewer. Data were extracted into tables and collated in a narrative summary. Gray literature searches were also undertaken. Results of the scoping review were presented to a diverse youth advisory group (N=16, 13-18 years old) to gain insights into this topic beyond what is published in the literature. Results: The search identified 5558 papers, with 5 (0.1%) studies describing 5 chatbots meeting the inclusion criteria. The 5 chatbots were supported by mobile apps using a combination of the following features: personalized feedback, conversational agents, gamification, and monitoring of behavior change. Of the 5 studies, 2 (40.0%) focused on nutrition, 2 (40.0%) on physical activity, and 1 (20.0%) on both nutrition and physical activity. Feasibility and acceptability varied across the 5 studies, with usage rates above 50% in 3 (60.0%) studies. In addition, 3 (60.0%) studies reported health-related outcomes, with only 1 (20.0%) study showing promising effects of the intervention. Adolescents presented novel concerns around the use of chatbots in nutrition and physical activity interventions, including ethical concerns and the use of false or misleading information. Conclusions: Limited research is available on chatbots in adolescent nutrition and physical activity interventions, and the evidence on their acceptability and feasibility in the adolescent population is insufficient. 
Similarly, adolescent consultation identified issues in the design features that have not been mentioned in the published literature. Therefore, chatbot codesign with adolescents may help ensure that such technology is feasible and acceptable to an adolescent population. %M 37145858 %R 10.2196/43227 %U https://humanfactors.jmir.org/2023/1/e43227 %U https://doi.org/10.2196/43227 %U http://www.ncbi.nlm.nih.gov/pubmed/37145858 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44030 %T Quality, Usability, and Effectiveness of mHealth Apps and the Role of Artificial Intelligence: Current Scenario and Challenges %A Deniz-Garcia,Alejandro %A Fabelo,Himar %A Rodriguez-Almeida,Antonio J %A Zamora-Zamorano,Garlene %A Castro-Fernandez,Maria %A Alberiche Ruano,Maria del Pino %A Solvoll,Terje %A Granja,Conceição %A Schopf,Thomas Roger %A Callico,Gustavo M %A Soguero-Ruiz,Cristina %A Wägner,Ana M %A , %+ Instituto Universitario de Investigaciones Biomédicas y Sanitarias, Universidad de Las Palmas de Gran Canaria, Paseo Blas Cabrera Felipe, Las Palmas de Gran Canaria, 35011, Spain, 34 928453431, ana.wagner@ulpgc.es %K artificial intelligence %K chronic disease prevention and management %K big data %K mobile health %K mHealth %K noncommunicable diseases %K mobile phone %D 2023 %7 4.5.2023 %9 Viewpoint %J J Med Internet Res %G English %X The use of artificial intelligence (AI) and big data in medicine has increased in recent years. Indeed, the use of AI in mobile health (mHealth) apps could considerably assist both individuals and health care professionals in the prevention and management of chronic diseases, in a person-centered manner. Nonetheless, there are several challenges that must be overcome to provide high-quality, usable, and effective mHealth apps. 
Here, we review the rationale and guidelines for the implementation of mHealth apps and the challenges regarding quality, usability, and user engagement and behavior change, with a special focus on the prevention and management of noncommunicable diseases. We suggest that a cocreation-based framework is the best method to address these challenges. Finally, we describe the current and future roles of AI in improving personalized medicine and provide recommendations for developing AI-based mHealth apps. We conclude that the implementation of AI and mHealth apps for routine clinical practice and remote health care will not be feasible until we overcome the main challenges regarding data privacy and security, quality assessment, and the reproducibility and uncertainty of AI results. Moreover, there is a lack of both standardized methods to measure the clinical outcomes of mHealth apps and techniques to encourage user engagement and behavior changes in the long term. We expect that in the near future, these obstacles will be overcome and that the ongoing European project, Watching the risk factors (WARIFA), will provide considerable advances in the implementation of AI-based mHealth apps for disease prevention and health promotion. 
%M 37140973 %R 10.2196/44030 %U https://www.jmir.org/2023/1/e44030 %U https://doi.org/10.2196/44030 %U http://www.ncbi.nlm.nih.gov/pubmed/37140973 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 10 %N %P e42714 %T Detecting Medication-Taking Gestures Using Machine Learning and Accelerometer Data Collected via Smartwatch Technology: Instrument Validation Study %A Odhiambo,Chrisogonas Odero %A Ablonczy,Lukacs %A Wright,Pamela J %A Corbett,Cynthia F %A Reichardt,Sydney %A Valafar,Homayoun %+ Department of Computer Science and Engineering, University of South Carolina, 315 Main Street, Columbia, SC, 29208, United States, 1 8037774046, homayoun@cec.sc.edu %K machine learning %K neural networks %K automated pattern recognition %K medication adherence %K ecological momentary assessment %K digital signal processing %K digital biomarkers %D 2023 %7 4.5.2023 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Medication adherence is a global public health challenge, as only approximately 50% of people adhere to their medication regimens. Medication reminders have shown promising results in terms of promoting medication adherence. However, practical mechanisms to determine whether a medication has been taken or not, once people are reminded, remain elusive. Emerging smartwatch technology may more objectively, unobtrusively, and automatically detect medication taking than currently available methods. Objective: This study aimed to examine the feasibility of detecting natural medication-taking gestures using smartwatches. Methods: A convenience sample (N=28) was recruited using the snowball sampling method. During data collection, each participant recorded at least 5 protocol-guided (scripted) medication-taking events and at least 10 natural instances of medication-taking events per day for 5 days. Using a smartwatch, the accelerometer data were recorded for each session at a sampling rate of 25 Hz. 
The raw recordings were scrutinized by a team member to validate the accuracy of the self-reports. The validated data were used to train an artificial neural network (ANN) to detect a medication-taking event. The training and testing data included previously recorded accelerometer data from smoking, eating, and jogging activities in addition to the medication-taking data recorded in this study. The accuracy of the model to identify medication taking was evaluated by comparing the ANN’s output with the actual output. Results: Most (n=20, 71%) of the 28 study participants were college students, and participants were aged 20 to 56 years. Most individuals were Asian (n=12, 43%) or White (n=12, 43%), single (n=24, 86%), and right-hand dominant (n=23, 82%). In total, 2800 medication-taking gestures (n=1400, 50% natural plus n=1400, 50% scripted gestures) were used to train the network. During the testing session, 560 natural medication-taking events that were not previously presented to the ANN were used to assess the network. The accuracy, precision, and recall were calculated to confirm the performance of the network. The trained ANN exhibited an average true-positive and true-negative performance of 96.5% and 94.5%, respectively, and exhibited <5% error in classifying medication-taking gestures. Conclusions: Smartwatch technology may provide an accurate, nonintrusive means of monitoring complex human behaviors such as natural medication-taking gestures. Future research is warranted to evaluate the efficacy of using modern sensing devices and machine learning algorithms to monitor medication-taking behavior and improve medication adherence. 
%M 37140971 %R 10.2196/42714 %U https://humanfactors.jmir.org/2023/1/e42714 %U https://doi.org/10.2196/42714 %U http://www.ncbi.nlm.nih.gov/pubmed/37140971 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e44293 %T Few-Shot Learning for Clinical Natural Language Processing Using Siamese Neural Networks: Algorithm Development and Validation Study %A Oniani,David %A Chandrasekar,Premkumar %A Sivarajkumar,Sonish %A Wang,Yanshan %+ Department of Health Information Management, University of Pittsburgh, 6026 Forbes Tower, Pittsburgh, PA, 15260, United States, 1 4123832712, yanshan.wang@pitt.edu %K few-shot learning %K FSL %K Siamese neural network %K SNN %K natural language processing %K NLP %K neural networks %D 2023 %7 4.5.2023 %9 Original Paper %J JMIR AI %G English %X Background: Natural language processing (NLP) has become an emerging technology in health care that leverages a large amount of free-text data in electronic health records to improve patient care, support clinical decisions, and facilitate clinical and translational science research. Recently, deep learning has achieved state-of-the-art performance in many clinical NLP tasks. However, training deep learning models often requires large, annotated data sets, which are normally not publicly available and can be time-consuming to build in clinical domains. Working with smaller annotated data sets is typical in clinical NLP; therefore, ensuring that deep learning models perform well is crucial for real-world clinical NLP applications. A widely adopted approach is fine-tuning existing pretrained language models, but these attempts fall short when the training data set contains only a few annotated samples. Few-shot learning (FSL) has recently been investigated to tackle this problem. Siamese neural network (SNN) has been widely used as an FSL approach in computer vision but has not been studied well in NLP. Furthermore, the literature on its applications in clinical domains is scarce. 
Objective: The aim of our study is to propose and evaluate SNN-based approaches for few-shot clinical NLP tasks. Methods: We propose 2 SNN-based FSL approaches, including pretrained SNN and SNN with second-order embeddings. We evaluate the proposed approaches on the clinical sentence classification task. We experiment with 3 few-shot settings, including 4-shot, 8-shot, and 16-shot learning. The clinical NLP task is benchmarked using the following 4 pretrained language models: bidirectional encoder representations from transformers (BERT), BERT for biomedical text mining (BioBERT), BioBERT trained on clinical notes (BioClinicalBERT), and generative pretrained transformer 2 (GPT-2). We also present a performance comparison between SNN-based approaches and the prompt-based GPT-2 approach. Results: In 4-shot sentence classification tasks, GPT-2 had the highest precision (0.63), but its recall (0.38) and F score (0.42) were lower than those of BioBERT-based pretrained SNN (0.45 and 0.46, respectively). In both 8-shot and 16-shot settings, SNN-based approaches outperformed GPT-2 in all 3 metrics of precision, recall, and F score. Conclusions: The experimental results verified the effectiveness of the proposed SNN approaches for few-shot clinical NLP tasks. 
%M 38875537 %R 10.2196/44293 %U https://ai.jmir.org/2023/1/e44293 %U https://doi.org/10.2196/44293 %U http://www.ncbi.nlm.nih.gov/pubmed/38875537 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e39862 %T Leveraging Mobile Phone Sensors, Machine Learning, and Explainable Artificial Intelligence to Predict Imminent Same-Day Binge-drinking Events to Support Just-in-time Adaptive Interventions: Algorithm Development and Validation Study %A Bae,Sang Won %A Suffoletto,Brian %A Zhang,Tongze %A Chung,Tammy %A Ozolcer,Melik %A Islam,Mohammad Rahul %A Dey,Anind K %+ Human-Computer Interaction and Human-Centered AI Systems Lab, AI for Healthcare Lab, School of Systems and Enterprises, Stevens Institute of Technology, 1 Castle Point Terrace, Hoboken, NJ, 07030, United States, 1 4122658616, sbae4@stevens.edu %K alcohol consumption %K binge-drinking event %K BDE %K behavioral prediction model %K machine learning %K smartphone sensors %K passive sensing %K explainable artificial intelligence %K XAI %K just-in-time adaptive interventions %K JITAIs %K mobile phone %D 2023 %7 4.5.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Digital just-in-time adaptive interventions can reduce binge-drinking events (BDEs; consuming ≥4 drinks for women and ≥5 drinks for men per occasion) in young adults but need to be optimized for timing and content. Delivering just-in-time support messages in the hours prior to BDEs could improve intervention impact. Objective: We aimed to determine the feasibility of developing a machine learning (ML) model to accurately predict future (ie, same-day) BDEs 1 to 6 hours prior to BDEs, using smartphone sensor data and to identify the most informative phone sensor features associated with BDEs on weekends and weekdays to determine the key features that explain prediction model performance. 
Methods: We collected phone sensor data from 75 young adults (aged 21 to 25 years; mean 22.4, SD 1.9 years) with risky drinking behavior who reported their drinking behavior over 14 weeks. The participants in this secondary analysis were enrolled in a clinical trial. We developed ML models testing different algorithms (eg, extreme gradient boosting [XGBoost] and decision tree) to predict same-day BDEs (vs low-risk drinking events and non-drinking periods) using smartphone sensor data (eg, accelerometer and GPS). We tested various “prediction distance” time windows (more proximal: 1 hour; distant: 6 hours) from drinking onset. We also tested various analysis time windows (ie, the amount of data to be analyzed), ranging from 1 to 12 hours prior to drinking onset, because this determines the amount of data that needs to be stored on the phone to compute the model. Explainable artificial intelligence was used to explore interactions among the most informative phone sensor features contributing to the prediction of BDEs. Results: The XGBoost model performed the best in predicting imminent same-day BDEs, with 95% accuracy on weekends and 94.3% accuracy on weekdays (F1-score=0.95 and 0.94, respectively). This XGBoost model needed 12 and 9 hours of phone sensor data at 3- and 6-hour prediction distance from the onset of drinking on weekends and weekdays, respectively, prior to predicting same-day BDEs. The most informative phone sensor features for BDE prediction were time (eg, time of day) and GPS-derived features, such as the radius of gyration (an indicator of travel). Interactions among key features (eg, time of day and GPS-derived features) contributed to the prediction of same-day BDEs. Conclusions: We demonstrated the feasibility and potential use of smartphone sensor data and ML for accurately predicting imminent (same-day) BDEs in young adults. 
The prediction model provides “windows of opportunity,” and with the adoption of explainable artificial intelligence, we identified “key contributing features” to trigger just-in-time adaptive intervention prior to the onset of BDEs, which has the potential to reduce the likelihood of BDEs in young adults. Trial Registration: ClinicalTrials.gov NCT02918565; https://clinicaltrials.gov/ct2/show/NCT02918565 %M 36809294 %R 10.2196/39862 %U https://formative.jmir.org/2023/1/e39862 %U https://doi.org/10.2196/39862 %U http://www.ncbi.nlm.nih.gov/pubmed/36809294 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e32962 %T The Gap Between AI and Bedside: Participatory Workshop on the Barriers to the Integration, Translation, and Adoption of Digital Health Care and AI Startup Technology Into Clinical Practice %A Olaye,Iredia M %A Seixas,Azizi A %+ Department of Medicine, Weill Cornell Medicine, Cornell University, 1300 York Avenue, Box #46, New York, NY, 10065, United States, 1 646 962 5050, imo4001@med.cornell.edu %K digital health %K startups %K venture capital %K artificial intelligence %K AI translation %K clinical practice %K early-stage %K funding %K bedside %K machine learning %K technology %K tech %K qualitative %K workshop %K entrepreneurs %D 2023 %7 2.5.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) and digital health technological innovations from startup companies used in clinical practice can yield better health outcomes, reduce health care costs, and improve patients' experience. However, the integration, translation, and adoption of these technologies into clinical practice are plagued with many challenges and are lagging. Furthermore, explanations of the impediments to clinical translation are largely unknown and have not been systematically studied from the perspective of AI and digital health care startup founders and executives. 
Objective: The aim of this paper is to describe the barriers to integrating early-stage technologies in clinical practice and health care systems from the perspectives of digital health and health care AI founders and executives. Methods: A stakeholder focus group workshop was conducted with a sample of 10 early-stage digital health and health care AI founders and executives. Digital health, health care AI, digital health–focused venture capitalists, and physician executives were represented. Using an inductive thematic analysis approach, transcripts were organized, queried, and analyzed for thematic convergence. Results: We identified the following four categories of barriers in the integration of early-stage digital health innovations into clinical practice and health care systems: (1) lack of knowledge of health system technology procurement protocols and best practices, (2) demanding regulatory and validation requirements, (3) challenges within the health system technology procurement process, and (4) disadvantages of early-stage digital health companies compared to large technology conglomerates. Recommendations from the study participants were also synthesized to create a road map to mitigate the barriers to integrating early-stage or novel digital health technologies in clinical practice. Conclusions: Early-stage digital health and health care AI entrepreneurs identified numerous barriers to integrating digital health solutions into clinical practice. Mitigation initiatives should create opportunities for early-stage digital health technology companies and health care providers to interact, develop relationships, and use evidence-based research and best practices during health care technology procurement and evaluation processes. 
%M 37129947 %R 10.2196/32962 %U https://www.jmir.org/2023/1/e32962 %U https://doi.org/10.2196/32962 %U http://www.ncbi.nlm.nih.gov/pubmed/37129947 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 7 %N %P e44791 %T Feasibility of Artificial Intelligence–Based Electrocardiography Analysis for the Prediction of Obstructive Coronary Artery Disease in Patients With Stable Angina: Validation Study %A Park,Jiesuck %A Yoon,Yeonyee %A Cho,Youngjin %A Kim,Joonghee %+ Department of Cardiology, Seoul National University Bundang Hospital, 82, Gumi-ro 173 beon-gil, Bundang-gu, Seongnam, Gyeonggi-do, 13620, Republic of Korea, 82 31 787 7072, yeonyeeyoon@gmail.com %K artificial intelligence %K AI %K coronary artery disease %K coronary stenosis %K electrocardiography %K stable angina %D 2023 %7 2.5.2023 %9 Original Paper %J JMIR Cardio %G English %X Background: Despite accumulating research on artificial intelligence–based electrocardiography (ECG) algorithms for predicting acute coronary syndrome (ACS), their application in stable angina is not well evaluated. Objective: We evaluated the utility of an existing artificial intelligence–based quantitative electrocardiography (QCG) analyzer in stable angina and developed a new ECG biomarker more suitable for stable angina. Methods: This single-center study comprised consecutive patients with stable angina. The independent and incremental value of QCG scores for coronary artery disease (CAD)–related conditions (ACS, myocardial injury, critical status, ST-elevation myocardial infarction, and left ventricular dysfunction) for predicting obstructive CAD confirmed by invasive angiography was examined. Additionally, ECG signals extracted by the QCG analyzer were used as input to develop a new QCG score. Results: Among 723 patients with stable angina (median age 68 years; male: 470/723, 65%), 497 (69%) had obstructive CAD. 
QCG scores for ACS and myocardial injury were independently associated with obstructive CAD (odds ratio [OR] 1.09, 95% CI 1.03-1.17 and OR 1.08, 95% CI 1.02-1.16 per 10-point increase, respectively) but did not significantly improve prediction performance compared to clinical features. However, our new QCG score demonstrated better prediction performance for obstructive CAD (area under the receiver operating characteristic curve 0.802) than the original QCG scores, with incremental predictive value in combination with clinical features (area under the receiver operating characteristic curve 0.827 vs 0.730; P<.001). Conclusions: QCG scores developed for acute conditions show limited performance in identifying obstructive CAD in stable angina. However, improvement in the QCG analyzer, through training on comprehensive ECG signals in patients with stable angina, is feasible. %M 37129937 %R 10.2196/44791 %U https://cardio.jmir.org/2023/1/e44791 %U https://doi.org/10.2196/44791 %U http://www.ncbi.nlm.nih.gov/pubmed/37129937 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e38169 %T Exploring Differential Perceptions of Artificial Intelligence in Health Care Among Younger Versus Older Canadians: Results From the 2021 Canadian Digital Health Survey %A Cinalioglu,Karin %A Elbaz,Sasha %A Sekhon,Kerman %A Su,Chien-Lin %A Rej,Soham %A Sekhon,Harmehr %+ Department of Psychiatry, Lady Davis Institute for Medical Research, Jewish General Hospital, 3755 Chem de la Côte-Sainte-Catherine, Montreal, QC, H3T 1E2, Canada, 1 5145508683, karin.cinalioglu@mail.mcgill.ca %K artificial intelligence %K telehealth %K telemedicine %K older adult %K perception %K technology %K public opinion %K national survey %K Canada %K Canadian %K attitude %K adoption %K trust %K satisfaction %D 2023 %7 28.4.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: The changing landscape of health care has led to the incorporation of powerful new technologies like artificial 
intelligence (AI) to assist with various services across a hospital. However, despite the potential benefits that this tool may provide, little work has examined public opinion regarding its use. Objective: In this study, we aim to explore differences between younger versus older Canadians with regard to the level of comfort and perceptions around the adoption and use of AI in health care settings. Methods: Using data from the 2021 Canadian Digital Health Survey (n=12,052), items related to perceptions about the use of AI as well as previous experience and satisfaction with health care were identified. We conducted Mann-Whitney U tests to compare the level of comfort of younger versus older Canadians regarding the use of AI in health care for a variety of purposes. Multinomial logistic regression was used to predict the comfort ratings based on categorical indicators. Results: Younger Canadians had greater knowledge of AI, but older Canadians were more comfortable with AI applied to monitoring and predicting health conditions, decision support, diagnostic imaging, precision medicine, drug and vaccine development, disease monitoring at home, tracking epidemics, and optimizing workflow to save time. Additionally, for older respondents, higher satisfaction led to higher comfort ratings. Only 1 interaction effect was identified between previous experience, satisfaction, and comfort with AI for drug and vaccine development. Conclusions: Older Canadians may be more open to various applications of AI within health care than younger Canadians. High satisfaction may be a critical criterion for comfort with AI, especially for older Canadians. Additionally, in the case of drug and vaccine development, previous experience may be an important moderating factor. We conclude that gaining a greater understanding of the perceptions of all health care users is integral to the implementation and sustainability of new and cutting-edge technologies in health care settings. 
%M 37115588 %R 10.2196/38169 %U https://www.jmir.org/2023/1/e38169 %U https://doi.org/10.2196/38169 %U http://www.ncbi.nlm.nih.gov/pubmed/37115588 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e47995 %T Reporting and Methodological Observations on Prognostic and Diagnostic Machine Learning Studies %A El Emam,Khaled %A Klement,William %A Malin,Bradley %+ School of Epidemiology and Public Health, University of Ottawa, 401 Smyth Rd, Ottawa, ON, K1H 8L1, Canada, 1 6137975412, kelemam@ehealthinformation.ca %K reporting guidelines %K machine learning %K modeling studies %K prognostic studies %K methodological observations %K diagnostic studies %K ML models %D 2023 %7 28.4.2023 %9 Editorial %J JMIR AI %G English %X Common reporting and methodological patterns were observed from the peer reviews of prognostic and diagnostic machine learning modeling studies submitted to JMIR AI. In this editorial, we summarized some key observations to inform future studies and their reporting. 
%M 32148429 %R 10.2196/47995 %U https://ai.jmir.org/2023/1/e47995 %U https://doi.org/10.2196/47995 %U http://www.ncbi.nlm.nih.gov/pubmed/32148429 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e31694 %T My Health, My Life, My Way—An Inclusive Web-Based Self-management Program for People With Disabilities Living With Chronic Conditions: Protocol for a Multiphase Optimization Strategy Study %A Evans,Eric %A Zengul,Ayse %A Knight,Amy %A Willig,Amanda %A Cherrington,Andrea %A Mehta,Tapan %A Thirumalai,Mohanraj %+ Department of Health Services Administration, School of Health Professions, University of Alabama at Birmingham, 1716 9th Ave S, Birmingham, AL, 35233, United States, 1 2059347189, mohanraj@uab.edu %K telehealth %K health coaching %K artificial intelligence %K chronic conditions %K mobile phone %D 2023 %7 28.4.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Individuals with disabilities living with chronic health conditions require self-management programs that are accessible, sustainable, inclusive, and adaptable. Health coaching is an effective approach to promoting behavior change in self-management. Health coaching combined with telehealth technology has the potential to improve the overall quality of, and access to, health services. Objective: This protocol outlines the study design for implementing the My Health, My Life, My Way intervention. The study will assess the feasibility, acceptability, and preliminary efficacy of the intervention for people with disabilities and optimize it. Methods: The My Health, My Life, My Way study is a 4-arm randomized controlled trial evaluating the delivery of a 6-month intervention involving telecoaching, inclusive educational content, and technology access for 200 individuals with chronic conditions and physical disabilities. 
This study uses the engineering-inspired multiphase optimization strategy (MOST) framework to evaluate intervention components and assess whether the combination or absence of individual elements influences behavior. Participants will be randomized to 1 of 4 study arms: scheduled coaching calls and gamified rewards, no scheduled coaching calls and gamified rewards, scheduled coaching calls and flat rewards, and no scheduled coaching calls and flat rewards. Results: The My Health, My Life, My Way study was approved by the institutional review board of the University of Alabama at Birmingham, and recruitment and enrollment will begin in May 2023. Data analysis is expected to be completed within 6 months of the end of data collection. This clinical trial protocol was developed based on the SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) 2013 statement. Conclusions: The My Health, My Life, My Way study will help to optimize and improve our understanding of the feasibility and efficacy of a web-based self-management program for people with physical disabilities and chronic conditions. More specifically, My Health, My Life, My Way will determine which combination of interventions (coaching calls and gamification) will result in increased participation in self-management programming. The My Health, My Life, My Way intervention has the potential to become a scalable and novel method to successfully manage chronic conditions in people with disabilities. 
Trial Registration: ClinicalTrials.gov NCT05481593; https://clinicaltrials.gov/ct2/show/NCT05481593 International Registered Report Identifier (IRRID): PRR1-10.2196/31694 %M 37115620 %R 10.2196/31694 %U https://www.researchprotocols.org/2023/1/e31694 %U https://doi.org/10.2196/31694 %U http://www.ncbi.nlm.nih.gov/pubmed/37115620 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44357 %T Ethical Implications of Artificial Intelligence in Population Health and the Public’s Role in Its Governance: Perspectives From a Citizen and Expert Panel %A Couture,Vincent %A Roy,Marie-Christine %A Dez,Emma %A Laperle,Samuel %A Bélisle-Pipon,Jean-Christophe %+ Faculty of Nursing, Université Laval, 1050 Avenue de la Médecine, Québec, QC, G1V 0A6, Canada, 1 418 656 2131 ext 407900, vincent.couture@fsi.ulaval.ca %K artificial intelligence %K population health %K citizen engagement %K ethics %K bioethics %K digital app %D 2023 %7 27.4.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) systems are widely used in the health care sector. Mainly applied for individualized care, AI is increasingly aimed at population health. This raises important ethical considerations but also calls for responsible governance, considering that this will affect the population. However, the literature points to a lack of citizen participation in the governance of AI in health. Therefore, it is necessary to investigate the governance of the ethical and societal implications of AI in population health. Objective: This study aimed to explore the perspectives and attitudes of citizens and experts regarding the ethics of AI in population health, the engagement of citizens in AI governance, and the potential of a digital app to foster citizen engagement. Methods: We recruited a panel of 21 citizens and experts. 
Using a web-based survey, we explored their perspectives and attitudes on the ethical issues of AI in population health, the relative role of citizens and other actors in AI governance, and the ways in which citizens can be supported to participate in AI governance through a digital app. The responses of the participants were analyzed quantitatively and qualitatively. Results: According to the participants, AI is perceived to be already present in population health and its benefits are regarded positively, but there is a consensus that AI has substantial societal implications. The participants also showed a high level of agreement toward involving citizens in AI governance. They highlighted the aspects to be considered in the creation of a digital app to foster this involvement. They recognized the importance of creating an app that is both accessible and transparent. Conclusions: These results offer avenues for the development of a digital app to raise awareness, to survey, and to support citizens’ decision-making regarding the ethical, legal, and social issues of AI in population health. 
%M 37104026 %R 10.2196/44357 %U https://www.jmir.org/2023/1/e44357 %U https://doi.org/10.2196/44357 %U http://www.ncbi.nlm.nih.gov/pubmed/37104026 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 7 %N %P e45299 %T Accuracy of Artificial Intelligence–Based Automated Quantitative Coronary Angiography Compared to Intravascular Ultrasound: Retrospective Cohort Study %A Moon,In Tae %A Kim,Sun-Hwa %A Chin,Jung Yeon %A Park,Sung Hun %A Yoon,Chang-Hwan %A Youn,Tae-Jin %A Chae,In-Ho %A Kang,Si-Hyuck %+ Seoul National University Bundang Hospital, 82, Gumi-Ro 173 Beon-Gil, Bundang-Gu, Seongnam-Si, Gyeonggi-Do, Seongnam, 13620, Republic of Korea, 82 31 787 7027, eandp303@gmail.com %K artificial intelligence %K AI %K coronary angiography %K coronary stenosis %K interventional ultrasonography %K coronary %K machine learning %K angiography %K stenosis %K automated analysis %K computer vision %D 2023 %7 26.4.2023 %9 Original Paper %J JMIR Cardio %G English %X Background: An accurate quantitative analysis of coronary artery stenotic lesions is essential to make optimal clinical decisions. Recent advances in computer vision and machine learning technology have enabled the automated analysis of coronary angiography. Objective: The aim of this paper is to validate the performance of artificial intelligence–based quantitative coronary angiography (AI-QCA) in comparison with that of intravascular ultrasound (IVUS). Methods: This retrospective study included patients who underwent IVUS-guided coronary intervention at a single tertiary center in Korea. Proximal and distal reference areas, minimal luminal area, percent plaque burden, and lesion length were measured by AI-QCA and human experts using IVUS. First, fully automated QCA analysis was compared with IVUS analysis. Next, we adjusted the proximal and distal margins of AI-QCA to avoid geographic mismatch. Scatter plots, Pearson correlation coefficients, and Bland-Altman plots were used to analyze the data. 
Results: A total of 54 significant lesions were analyzed in 47 patients. The proximal and distal reference areas, as well as the minimal luminal area, showed moderate to strong correlation between the 2 modalities (correlation coefficients of 0.57, 0.80, and 0.52, respectively; P<.001). The correlation was weaker for percent area stenosis and lesion length, although statistically significant (correlation coefficients of 0.29 and 0.33, respectively). AI-QCA tended to measure reference vessel areas smaller and lesion lengths shorter than IVUS did. Systematic proportional bias was not observed in Bland-Altman plots. The biggest cause of bias originated from the geographic mismatch of AI-QCA with IVUS. Discrepancies in the proximal or distal lesion margins were observed between the 2 modalities, which were more frequent at the distal margins. After the adjustment of proximal or distal margins, there was a stronger correlation of proximal and distal reference areas between AI-QCA and IVUS (correlation coefficients of 0.70 and 0.83, respectively). Conclusions: AI-QCA showed a moderate to strong correlation compared with IVUS in analyzing coronary lesions with significant stenosis. The main discrepancy was in the perception of the distal margins by AI-QCA, and the correction of margins improved the correlation coefficients. We believe that this novel tool could provide confidence to treating physicians and help in making optimal clinical decisions. 
%M 37099368 %R 10.2196/45299 %U https://cardio.jmir.org/2023/1/e45299 %U https://doi.org/10.2196/45299 %U http://www.ncbi.nlm.nih.gov/pubmed/37099368 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e41748 %T Artificial Intelligence–Based Ethical Hacking for Health Information Systems: Simulation Study %A He,Ying %A Zamani,Efpraxia %A Yevseyeva,Iryna %A Luo,Cunjin %+ School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom, 44 7493622995, cunjin.luo@essex.ac.uk %K health information system %K HIS %K ethical hacking %K open-source electronic medical record %K OpenEMR %K artificial intelligence %K AI-based hacking %K cyber defense solutions %D 2023 %7 25.4.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Health information systems (HISs) are continuously targeted by hackers, who aim to bring down critical health infrastructure. This study was motivated by recent attacks on health care organizations that have resulted in the compromise of sensitive data held in HISs. Existing research on cybersecurity in the health care domain places an imbalanced focus on protecting medical devices and data. There is a lack of a systematic way to investigate how attackers may breach an HIS and access health care records. Objective: This study aimed to provide new insights into HIS cybersecurity protection. We propose a systematic, novel, and optimized (artificial intelligence–based) ethical hacking method tailored specifically for HISs, and we compared it with the traditional unoptimized ethical hacking method. This allows researchers and practitioners to identify the points and attack pathways of possible penetration attacks on the HIS more efficiently. Methods: In this study, we propose a novel methodological approach to ethical hacking in HISs. We implemented ethical hacking using both optimized and unoptimized methods in an experimental setting. 
Specifically, we set up an HIS simulation environment by implementing the open-source electronic medical record (OpenEMR) system and followed the National Institute of Standards and Technology’s ethical hacking framework to launch the attacks. In the experiment, we launched 50 rounds of attacks using both unoptimized and optimized ethical hacking methods. Results: Ethical hacking was successfully conducted using both optimized and unoptimized methods. The results show that the optimized ethical hacking method outperforms the unoptimized method in terms of average time used, the average success rate of exploits, the number of exploits launched, and the number of successful exploits. We were able to identify the successful attack paths and exploits that are related to remote code execution, cross-site request forgery, improper authentication, a vulnerability in the Oracle Business Intelligence Publisher, an elevation of privilege vulnerability (in MediaTek), and a remote access backdoor (in the web graphical user interface for the Linux Virtual Server). Conclusions: This research demonstrates systematic ethical hacking against an HIS using optimized and unoptimized methods, together with a set of penetration testing tools to identify exploits and combining them to perform ethical hacking. The findings contribute to the HIS literature, ethical hacking methodology, and mainstream artificial intelligence–based ethical hacking methods because they address some key weaknesses of these research fields. These findings also have great significance for the health care sector, as OpenEMR is widely adopted by health care organizations. Our findings offer novel insights for the protection of HISs and allow researchers to conduct further research in the HIS cybersecurity domain. 
%M 37097723 %R 10.2196/41748 %U https://www.jmir.org/2023/1/e41748 %U https://doi.org/10.2196/41748 %U http://www.ncbi.nlm.nih.gov/pubmed/37097723 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e38039 %T What Works Where and How for Uptake and Impact of Artificial Intelligence in Pathology: Review of Theories for a Realist Evaluation %A King,Henry %A Wright,Judy %A Treanor,Darren %A Williams,Bethany %A Randell,Rebecca %+ Faculty of Health Studies, University of Bradford, Richmond Rd, Bradford, BD7 1DP, United Kingdom, 44 1274234144, r.randell@bradford.ac.uk %K artificial intelligence %K AI %K machine learning %K histopathology %K pathology %K implementation %K review %D 2023 %7 24.4.2023 %9 Review %J J Med Internet Res %G English %X Background: There is increasing interest in the use of artificial intelligence (AI) in pathology to increase accuracy and efficiency. To date, studies of clinicians’ perceptions of AI have found only moderate acceptability, suggesting the need for further research regarding how to integrate it into clinical practice. Objective: The aim of the study was to determine contextual factors that may support or constrain the uptake of AI in pathology. Methods: To go beyond a simple listing of barriers and facilitators, we drew on the approach of realist evaluation and undertook a review of the literature to elicit stakeholders’ theories of how, for whom, and in what circumstances AI can provide benefit in pathology. Searches were designed by an information specialist and peer-reviewed by a second information specialist. Searches were run on the arXiv.org repository, MEDLINE, and the Health Management Information Consortium, with additional searches undertaken on a range of websites to identify gray literature. In line with a realist approach, we also made use of relevant theory. 
Included documents were indexed in NVivo 12, using codes to capture different contexts, mechanisms, and outcomes that could affect the introduction of AI in pathology. Coded data were used to produce narrative summaries of each of the identified contexts, mechanisms, and outcomes, which were then translated into theories in the form of context-mechanism-outcome configurations. Results: A total of 101 relevant documents were identified. Our analysis indicates that the benefits that can be achieved will vary according to the size and nature of the pathology department’s workload and the extent to which pathologists work collaboratively; the major perceived benefit for specialist centers is in reducing workload. For uptake of AI, pathologists’ trust is essential. Existing theories suggest that if pathologists are able to “make sense” of AI, engage in the adoption process, receive support in adapting their work processes, and can identify potential benefits to its introduction, it is more likely to be accepted. Conclusions: For uptake of AI in pathology, for all but the most simple quantitative tasks, measures will be required that either increase confidence in the system or provide users with an understanding of the performance of the system. For specialist centers, efforts should focus on reducing workload rather than increasing accuracy. Designers also need to give careful thought to usability and how AI is integrated into pathologists’ workflow. 
%M 37093631 %R 10.2196/38039 %U https://www.jmir.org/2023/1/e38039 %U https://doi.org/10.2196/38039 %U http://www.ncbi.nlm.nih.gov/pubmed/37093631 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e46428 %T Artificial Intelligence Teaching as Part of Medical Education: Qualitative Analysis of Expert Interviews %A Weidener,Lukas %A Fischer,Michael %+ Research Unit for Quality and Ethics in Health Care, UMIT TIROL – Private University for Health Sciences and Health Technology, Eduard-Wallnöfer-Zentrum 1, Hall in Tirol, 6060, Austria, 43 17670491594, lukas.weidener@edu.umit-tirol.at %K AI technology %K artificial intelligence %K clinical context %K expert interviews %K health care %K medical curriculum %K medical education %K medical school %K medical student %K medicine %D 2023 %7 24.4.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: The use of artificial intelligence (AI) in medicine is expected to increase significantly in the upcoming years. Advancements in AI technology have the potential to revolutionize health care, from aiding in the diagnosis of certain diseases to helping with treatment decisions. Current literature suggests the integration of the subject of AI in medicine as part of the medical curriculum to prepare medical students for the opportunities and challenges related to the use of the technology within the clinical context. Objective: We aimed to explore the relevant knowledge and understanding of the subject of AI in medicine and specify curricula teaching content within medical education. Methods: For this research, we conducted 12 guideline-based expert interviews. Experts were defined as individuals who have been engaged in full-time academic research, development, or teaching in the field of AI in medicine for at least 5 years. As part of the data analysis, we recorded, transcribed, and analyzed the interviews using qualitative content analysis. 
We used the software QCAmap and inductive category formation to analyze the data. Results: The qualitative content analysis led to the formation of three main categories (“Knowledge,” “Interpretation,” and “Application”) with a total of 9 associated subcategories. The experts interviewed cited knowledge and an understanding of the fundamentals of AI, statistics, ethics, and privacy and regulation as necessary basic knowledge that should be part of medical education. The analysis also showed that medical students need to be able to interpret as well as critically reflect on the results provided by AI, taking into account the associated risks and data basis. To enable the application of AI in medicine, medical education should promote the acquisition of practical skills, including the need for basic technological skills, as well as the development of confidence in the technology and one’s related competencies. Conclusions: The analyzed expert interviews’ results suggest that medical curricula should include the topic of AI in medicine to develop the knowledge, understanding, and confidence needed to use AI in the clinical context. The results further imply an imminent need for standardization of the definition of AI as the foundation to identify, define, and teach respective content on AI within medical curricula. 
%M 36946094 %R 10.2196/46428 %U https://mededu.jmir.org/2023/1/e46428 %U https://doi.org/10.2196/46428 %U http://www.ncbi.nlm.nih.gov/pubmed/36946094 %0 Journal Article %@ 2373-6658 %I JMIR Publications %V 7 %N %P e48136 %T Impact of ChatGPT on Interdisciplinary Nursing Education and Research %A Miao,Hongyu %A Ahn,Hyochol %+ Florida State University, 98 Varsity Way, Tallahassee, FL, 32306, United States, 1 8506442647, hyochol.ahn@jmir.org %K ChatGPT %K nursing education %K nursing research %K artificial intelligence %K OpenAI %D 2023 %7 24.4.2023 %9 Editorial %J Asian Pac Isl Nurs J %G English %X ChatGPT, a trending artificial intelligence tool developed by OpenAI, was launched in November 2022. The impact of ChatGPT on the nursing and interdisciplinary research ecosystem is profound. %M 37093625 %R 10.2196/48136 %U https://apinj.jmir.org/2023/1/e48136 %U https://doi.org/10.2196/48136 %U http://www.ncbi.nlm.nih.gov/pubmed/37093625 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e45475 %T Digital Guardian Angel Supported by an Artificial Intelligence System to Improve Quality of Life, Well-being, and Health Outcomes of Patients With Cancer (ONCORELIEF): Protocol for a Single Arm Prospective Multicenter Pilot Study %A Reis,Joaquim %A Travado,Luzia %A Scherrer,Alexander %A Kosmidis,Thanos %A Venios,Stefanos %A Laras,Paris Emmanouil %A Oestreicher,Gabrielle %A Moehler,Markus %A Parolini,Margherita %A Passardi,Alessandro %A Meggiolaro,Elena %A Martinelli,Giovanni %A Petracci,Elisabetta %A Zingaretti,Chiara %A Diamantopoulos,Sotiris %A Plakia,Maria %A Vassiliou,Charalampos %A Mousa,Suheib %A Zifrid,Robert %A Sullo,Francesco Giulio %A Gallio,Chiara %+ Institute of Biophysics and Biomedical Engineering, Faculty of Sciences, University of Lisbon, Campo Grande, Lisboa, 1749-016, Portugal, 351 217 500 17, jdcreis@fc.ul.pt %K eHealth %K artificial intelligence %K quality of life and well-being %K supportive cancer care %K mobile phone %K cancer support %K 
artificial intelligence–based recommendations %D 2023 %7 21.4.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: According to Europe’s Beating Cancer Plan, the number of cancer survivors is growing every year and is now estimated at over 12 million in Europe. A main objective of the European Commission is to ensure that cancer survivors can enjoy a high quality of life, underlining the role of digital technology and eHealth apps and tools to achieve this. Objective: The main objective of this study is the development of a user-centered artificial intelligence system to facilitate the input and integration of patient-related biopsychosocial data to improve posttreatment quality of life, well-being, and health outcomes and examine the feasibility of this digitally assisted workflow in a real-life setting in patients with colorectal cancer and acute myeloid leukemia. Methods: A total of 60 patients with colorectal cancer and 30 patients with acute myeloid leukemia will be recruited from 2 clinical centers: Universitätsmedizin der Johannes Gutenberg-Universität Mainz (Mainz, Germany) and IRCCS Istituto Romagnolo per lo Studio dei Tumori “Dino Amadori” (IRST, Italy). Psychosocial data (eg, emotional distress, fatigue, quality of life, subjective well-being, sleep problems, and appetite loss) will be collected by questionnaires via a smartphone app, and physiological data (eg, heart rate, skin temperature, and movement through step count) will be collected by a customizable smart wrist-worn sensor device. Each patient will be assessed every 2 weeks over their 3-month participation in the ONCORELIEF study. Inclusion criteria include patients with the diagnosis of acute myeloid leukemia or colorectal cancer, adult patients aged 18 years and older, life expectancy greater than 12 months, Eastern Cooperative Oncology Group performance status ≤2, and patients who have a smartphone and agree to use it for the purpose of the study. 
Exclusion criteria include patients with a reduced cognitive function (such as dementia) or technological illiteracy and other known active malignant neoplastic diseases (patients with a medical history of treated neoplastic disease are included). Results: The pilot study started on September 1, 2022. As of January 2023, we had enrolled 33 patients with colorectal cancer and 7 patients with acute myeloid leukemia. As of January 2023, we have not yet started the data analysis. We expect to obtain all data in June 2023 and expect the results to be published in the second half of 2023. Conclusions: Web-based and mobile apps use methods from mathematical decision support and artificial intelligence through a closed-loop workflow that connects health professionals and patients. The ONCORELIEF system has the potential to continuously identify, collect, and process data from diverse patient dimensions to offer health care recommendations, support patients with cancer to address their unmet needs, and optimize survivorship care. 
Trial Registration: German Clinical Trials Register (DRKS) 00027808; https://drks.de/search/en/trial/DRKS00027808 International Registered Report Identifier (IRRID): DERR1-10.2196/45475 %M 37083563 %R 10.2196/45475 %U https://www.researchprotocols.org/2023/1/e45475 %U https://doi.org/10.2196/45475 %U http://www.ncbi.nlm.nih.gov/pubmed/37083563 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e43958 %T Exploring Stakeholder Requirements to Enable Research and Development of Artificial Intelligence Algorithms in a Hospital-Based Generic Infrastructure: Results of a Multistep Mixed Methods Study %A Weinert,Lina %A Klass,Maximilian %A Schneider,Gerd %A Heinze,Oliver %+ Section for Translational Health Economics, Department for Conservative Dentistry, Heidelberg University Hospital, Im Neuenheimer Feld 130.3, Heidelberg, 69120, Germany, 49 622156 ext 34367, lina.weinert@med.uni-heidelberg.de %K artificial intelligence %K requirements analysis %K mixed-methods %K data availability %K qualitative research %D 2023 %7 18.4.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Legal, controlled, and regulated access to high-quality data from academic hospitals currently poses a barrier to the development and testing of new artificial intelligence (AI) algorithms. To overcome this barrier, the German Federal Ministry of Health supports the “pAItient” (Protected Artificial Intelligence Innovation Environment for Patient Oriented Digital Health Solutions for developing, testing and evidence-based evaluation of clinical value) project, with the goal to establish an AI Innovation Environment at the Heidelberg University Hospital, Germany. It is designed as a proof-of-concept extension to the preexisting Medical Data Integration Center. Objective: The first part of the pAItient project aims to explore stakeholders’ requirements for developing AI in partnership with an academic hospital and granting AI experts access to anonymized personal health data. 
Methods: We designed a multistep mixed methods approach. First, researchers and employees from stakeholder organizations were invited to participate in semistructured interviews. In the following step, questionnaires were developed based on the participants’ answers and distributed among the stakeholders’ organizations. In addition, patients and physicians were interviewed. Results: The identified requirements covered a wide range and were sometimes conflicting. Relevant patient requirements included adequate provision of necessary information for data use, a clear medical objective of the research and development activities, trustworthiness of the organization collecting the patient data, and the assurance that data would not be reidentifiable. Requirements of AI researchers and developers encompassed contact with clinical users, an acceptable user interface (UI) for shared data platforms, a stable connection to the planned infrastructure, relevant use cases, and assistance in dealing with data privacy regulations. In the next step, a requirements model was developed, which depicts the identified requirements in different layers. This model will be used to communicate stakeholder requirements within the pAItient project consortium. Conclusions: The study led to the identification of necessary requirements for the development, testing, and validation of AI applications within a hospital-based generic infrastructure. A requirements model was developed, which will inform the next steps in the development of an AI innovation environment at our institution. Results from our study replicate previous findings from other contexts and will add to the emerging discussion on the use of routine medical data for the development of AI applications. 
International Registered Report Identifier (IRRID): RR2-10.2196/42208 %M 37071450 %R 10.2196/43958 %U https://formative.jmir.org/2023/1/e43958 %U https://doi.org/10.2196/43958 %U http://www.ncbi.nlm.nih.gov/pubmed/37071450 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 10 %N %P e43960 %T Performance of a Web-Based Reference Database With Natural Language Searching Capabilities: Usability Evaluation of DynaMed and Micromedex With Watson %A Rui,Angela %A Garabedian,Pamela M %A Marceau,Marlika %A Syrowatka,Ania %A Volk,Lynn A %A Edrees,Heba H %A Seger,Diane L %A Amato,Mary G %A Cambre,Jacob %A Dulgarian,Sevan %A Newmark,Lisa P %A Nanji,Karen C %A Schultz,Petra %A Jackson,Gretchen Purcell %A Rozenblum,Ronen %A Bates,David W %+ Division of General Internal Medicine, Brigham and Women's Hospital, 75 Francis St, Boston, MA, 02115, United States, 1 978 397 0082, arui@partners.org %K medication safety %K patient safety %K usability %K searching behavior %K efficiency %K quality of care %K web-based databases %K point-of-care information %K POCI %K point-of-care tools %K artificial intelligence %K machine learning %K clinical decision support %K natural language processing %D 2023 %7 17.4.2023 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Evidence-based point-of-care information (POCI) tools can facilitate patient safety and care by helping clinicians to answer disease state and drug information questions in less time and with less effort. However, these tools may also be visually challenging to navigate or lack the comprehensiveness needed to sufficiently address a medical issue. Objective: This study aimed to collect clinicians’ feedback and directly observe their use of the combined POCI tool DynaMed and Micromedex with Watson, now known as DynaMedex. EBSCO partnered with IBM Watson Health, now known as Merative, to develop the combined tool as a resource for clinicians. 
We aimed to identify areas for refinement based on participant feedback and examine participant perceptions to inform further development. Methods: Participants (N=43) in varying clinical roles and specialties were recruited from Brigham and Women’s Hospital and Massachusetts General Hospital in Boston, Massachusetts, United States, between August 10, 2021, and December 16, 2021, to take part in usability sessions aimed at evaluating the efficiency and effectiveness of, as well as satisfaction with, the DynaMed and Micromedex with Watson tool. Usability testing methods, including think aloud and observations of user behavior, were used to identify challenges regarding the combined tool. Data collection included measurements of time on task; task ease; satisfaction with the answer; posttest feedback on likes, dislikes, and perceived reliability of the tool; and interest in recommending the tool to a colleague. Results: On a 7-point Likert scale, pharmacists rated ease (mean 5.98, SD 1.38) and satisfaction (mean 6.31, SD 1.34) with the combined POCI tool higher than the physicians, nurse practitioner, and physician assistants (ease: mean 5.57, SD 1.64, and satisfaction: mean 5.82, SD 1.60). Pharmacists spent longer (mean 2 minutes, 26 seconds, SD 1 minute, 41 seconds) on average finding an answer to their question than the physicians, nurse practitioner, and physician assistants (mean 1 minute, 40 seconds, SD 1 minute, 23 seconds). Conclusions: Overall, the tool performed well, but this usability evaluation identified multiple opportunities for improvement that would help inexperienced users. 
%M 37067858 %R 10.2196/43960 %U https://humanfactors.jmir.org/2023/1/e43960 %U https://doi.org/10.2196/43960 %U http://www.ncbi.nlm.nih.gov/pubmed/37067858 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44410 %T Development and Validation of a Respiratory-Responsive Vocal Biomarker–Based Tool for Generalizable Detection of Respiratory Impairment: Independent Case-Control Studies in Multiple Respiratory Conditions Including Asthma, Chronic Obstructive Pulmonary Disease, and COVID-19 %A Kaur,Savneet %A Larsen,Erik %A Harper,James %A Purandare,Bharat %A Uluer,Ahmet %A Hasdianda,Mohammad Adrian %A Umale,Nikita Arun %A Killeen,James %A Castillo,Edward %A Jariwala,Sunit %+ Montefiore Medical Center, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY, 10461, United States, 1 3478064503, savneetkaur@hsph.harvard.edu %K vocal biomarkers %K COVID-19 %K respiratory-responsive vocal biomarker %K RRVB %K artificial intelligence %K machine learning %K asthma %K smartphones %K mobile phone %K eHealth %K mobile health %K mHealth %K respiratory symptom %K respiratory %K voice %K vocal %K sound %K speech %D 2023 %7 14.4.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Vocal biomarker–based machine learning approaches have shown promising results in the detection of various health conditions, including respiratory diseases, such as asthma. Objective: This study aimed to determine whether a respiratory-responsive vocal biomarker (RRVB) model platform initially trained on an asthma and healthy volunteer (HV) data set can differentiate patients with active COVID-19 infection from asymptomatic HVs by assessing its sensitivity, specificity, and odds ratio (OR). Methods: A logistic regression model using a weighted sum of voice acoustic features was previously trained and validated on a data set of approximately 1700 patients with a confirmed asthma diagnosis and a similar number of healthy controls. 
The same model has shown generalizability to patients with chronic obstructive pulmonary disease, interstitial lung disease, and cough. In this study, 497 participants (female: n=268, 53.9%; <65 years old: n=467, 94%; Marathi speakers: n=253, 50.9%; English speakers: n=223, 44.9%; Spanish speakers: n=25, 5%) were enrolled across 4 clinical sites in the United States and India and provided voice samples and symptom reports on their personal smartphones. The participants included patients who were symptomatic COVID-19 positive and negative as well as asymptomatic HVs. The RRVB model performance was assessed by comparing it with the clinical diagnosis of COVID-19 confirmed by reverse transcriptase–polymerase chain reaction. Results: The ability of the RRVB model to differentiate patients with respiratory conditions from healthy controls was previously demonstrated on validation data in asthma, chronic obstructive pulmonary disease, interstitial lung disease, and cough, with ORs of 4.3, 9.1, 3.1, and 3.9, respectively. In this study, the same RRVB model applied to COVID-19 performed with a sensitivity of 73.2%, a specificity of 62.9%, and an OR of 4.64 (P<.001). Patients who experienced respiratory symptoms were detected more frequently than those who did not experience respiratory symptoms and completely asymptomatic patients (sensitivity: 78.4% vs 67.4% vs 68%, respectively). Conclusions: The RRVB model has shown good generalizability across respiratory conditions, geographies, and languages. Results using a data set of patients with COVID-19 demonstrate its meaningful potential to serve as a prescreening tool for identifying individuals at risk for COVID-19 infection in combination with temperature and symptom reports. Although not a COVID-19 test, these results suggest that the RRVB model can encourage targeted testing. 
Moreover, the generalizability of this model for detecting respiratory symptoms across different linguistic and geographic contexts suggests a potential path for the development and validation of voice-based tools for broader disease surveillance and monitoring applications in the future. %M 36881540 %R 10.2196/44410 %U https://www.jmir.org/2023/1/e44410 %U https://doi.org/10.2196/44410 %U http://www.ncbi.nlm.nih.gov/pubmed/36881540 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43682 %T Learning From Experience and Finding the Right Balance in the Governance of Artificial Intelligence and Digital Health Technologies %A Gilbert,Stephen %A Anderson,Stuart %A Daumer,Martin %A Li,Phoebe %A Melvin,Tom %A Williams,Robin %+ Else Kröner Fresenius Center for Digital Health, Technische Universität Dresden, 74 Fetscherstraße, Dresden, 01307, Germany, 49 35145819630, stephen.gilbert@tu-dresden.de %K artificial intelligence %K machine learning %K regulation %K algorithm change protocol %K health care %K regulatory framework %K medical tool %K tool %K patient %K intervention %K safety %K performance %K technology %K implementation %D 2023 %7 14.4.2023 %9 Viewpoint %J J Med Internet Res %G English %X Artificial intelligence (AI) and machine learning medical tools have the potential to be transformative in care delivery; however, this change will only be realized if accompanied by effective governance that ensures patient safety and public trust. Recent digital health initiatives have called for tighter governance of digital health. A correct balance must be found between ensuring product safety and performance while also enabling the innovation needed to deliver better approaches for patients and affordable efficient health care for society. This requires innovative, fit-for-purpose approaches to regulation. Digital health technologies, particularly AI-based tools, pose specific challenges to the development and implementation of functional regulation. 
The approaches of regulatory science and “better regulation” have a critical role in developing and evaluating solutions to these problems and ensuring effective implementation. We describe the divergent approaches of the European Union and the United States in the implementation of new regulatory approaches in digital health, and we consider the United Kingdom as a third example, which is in a unique position of developing a new post-Brexit regulatory framework. %M 37058329 %R 10.2196/43682 %U https://www.jmir.org/2023/1/e43682 %U https://doi.org/10.2196/43682 %U http://www.ncbi.nlm.nih.gov/pubmed/37058329 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43815 %T A Risk Prediction Model for Physical Restraints Among Older Chinese Adults in Long-term Care Facilities: Machine Learning Study %A Wang,Jun %A Chen,Hongmei %A Wang,Houwei %A Liu,Weichu %A Peng,Daomei %A Zhao,Qinghua %A Xiao,Mingzhao %+ Department of Urology, The First Affiliated Hospital of Chongqing Medical University, 1 Youyi Road, Yuzhong District, Chongqing, 400016, China, 86 13608399433, xmz.2004@163.com %K physical restraint %K prediction model %K machine learning %K stacking ensemble model %K model %K older adults %K elderly %K risk factor %K learning model %K development %K support %K accuracy %K precision %K cognitive impairment %K utility %K management %D 2023 %7 6.4.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Numerous studies have identified risk factors for physical restraint (PR) use in older adults in long-term care facilities. Nevertheless, there is a lack of predictive tools to identify high-risk individuals. Objective: We aimed to develop machine learning (ML)–based models to predict the risk of PR in older adults. Methods: This study conducted a cross-sectional secondary data analysis based on 1026 older adults from 6 long-term care facilities in Chongqing, China, from July 2019 to November 2019. 
The primary outcome was the use of PR (yes or no), identified by 2 collectors’ direct observation. A total of 15 candidate predictors (older adults’ demographic and clinical factors) that could be commonly and easily collected from clinical practice were used to build 9 independent ML models: Gaussian Naïve Bayesian (GNB), k-nearest neighbor (KNN), decision tree (DT), logistic regression (LR), support vector machine (SVM), random forest (RF), multilayer perceptron (MLP), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM), as well as stacking ensemble ML. Performance was evaluated using accuracy, precision, recall, an F score, a comprehensive evaluation indicator (CEI) weighted by the above indicators, and the area under the receiver operating characteristic curve (AUC). A net benefit approach using the decision curve analysis (DCA) was performed to evaluate the clinical utility of the best model. Models were tested via 10-fold cross-validation. Feature importance was interpreted using Shapley Additive Explanations (SHAP). Results: A total of 1026 older adults (mean 83.5, SD 7.6 years; n=586, 57.1% male older adults) and 265 restrained older adults were included in the study. All ML models performed well, with an AUC above 0.905 and an F score above 0.900. The 2 best independent models were RF (AUC 0.938, 95% CI 0.914-0.947) and SVM (AUC 0.949, 95% CI 0.911-0.953). The DCA demonstrated that the RF model displayed better clinical utility than the other models. The stacking model combining SVM, RF, and MLP performed best, with the highest AUC (0.950) and CEI (0.943) values, and its DCA curve indicated the best clinical utility. The SHAP plots demonstrated that the significant contributors to model performance were related to cognitive impairment, care dependency, mobility decline, physical agitation, and an indwelling tube. Conclusions: The RF and stacking models had high performance and clinical utility. 
ML prediction models for predicting the probability of PR in older adults could offer clinical screening and decision support, which could help medical staff in the early identification and PR management of older adults. %M 37023416 %R 10.2196/43815 %U https://www.jmir.org/2023/1/e43815 %U https://doi.org/10.2196/43815 %U http://www.ncbi.nlm.nih.gov/pubmed/37023416 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43386 %T A “Do No Harm” Novel Safety Checklist and Research Approach to Determine Whether to Launch an Artificial Intelligence–Based Medical Technology: Introducing the Biological-Psychological, Economic, and Social (BPES) Framework %A Khan,Waqas Ullah %A Seto,Emily %+ Health Informatics, Institute of Health Policy, Management and Evaluation, University of Toronto, 155 College Street, Toronto, ON, M6H 1E1, Canada, 1 6474031872, waqas.khan@alum.utoronto.ca %K artificial intelligence %K AI %K safety checklist %K Do No Harm %K biological-psychological factors %K economic factors %K social factors %K AI medical hardware devices %K AI medical mobile apps %K AI medical software programs %D 2023 %7 5.4.2023 %9 Viewpoint %J J Med Internet Res %G English %X Given the impact artificial intelligence (AI)–based medical technologies (hardware devices, software programs, and mobile apps) can have on society, debates regarding the principles behind their development and deployment are emerging. Using the biopsychosocial model applied in psychiatry and other fields of medicine as our foundation, we propose a novel 3-step framework to guide industry developers of AI-based medical tools as well as health care regulatory agencies on how to decide if a product should be launched—a “Go or No-Go” approach. 
More specifically, our novel framework places stakeholders’ (patients, health care professionals, industry, and government institutions) safety at its core by asking developers to demonstrate the biological-psychological (impact on physical and mental health), economic, and social value of their AI tool before it is launched. We also introduce a novel cost-effective, time-sensitive, and safety-oriented mixed quantitative and qualitative clinical phased trial approach to help industry and government health care regulatory agencies test and deliberate on whether to launch these AI-based medical technologies. To our knowledge, our biological-psychological, economic, and social (BPES) framework and mixed method phased trial approach are the first to place the Hippocratic Oath of “Do No Harm” at the center of developers’, implementers’, regulators’, and users’ mindsets when determining whether an AI-based medical technology is safe to launch. Moreover, as the welfare of AI users and developers becomes a greater concern, our framework’s novel safety feature will allow it to complement existing and future AI reporting guidelines. 
%M 37018019 %R 10.2196/43386 %U https://www.jmir.org/2023/1/e43386 %U https://doi.org/10.2196/43386 %U http://www.ncbi.nlm.nih.gov/pubmed/37018019 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e40337 %T Surveying Public Perceptions of Artificial Intelligence in Health Care in the United States: Systematic Review %A Beets,Becca %A Newman,Todd P %A Howell,Emily L %A Bao,Luye %A Yang,Shiyu %+ Department of Life Sciences Communication, University of Wisconsin–Madison, 1545 Observatory Dr, Madison, WI, 53706, United States, 1 608 262 1464, becca.beets@wisc.edu %K artificial intelligence %K AI %K public opinion %K public perception %K health care %D 2023 %7 4.4.2023 %9 Review %J J Med Internet Res %G English %X Background: This paper reviews nationally representative public opinion surveys on artificial intelligence (AI) in the United States, with a focus on areas related to health care. The potential health applications of AI continue to gain attention owing to their promise as well as challenges. For AI to fulfill its potential, it must not only be adopted by physicians and health providers but also by patients and other members of the public. Objective: This study reviews the existing survey research on the United States’ public attitudes toward AI in health care and reveals the challenges and opportunities for more effective and inclusive engagement on the use of AI in health settings. Methods: We conducted a systematic review of public opinion surveys, reports, and peer-reviewed journal articles published on Web of Science, PubMed, and Roper iPoll between January 2010 and January 2022. We include studies that are nationally representative US public opinion surveys and include at least one or more questions about attitudes toward AI in health care contexts. Two members of the research team independently screened the included studies. The reviewers screened study titles, abstracts, and methods for Web of Science and PubMed search results. 
For the Roper iPoll search results, individual survey items were assessed for relevance to the AI health focus, and survey details were screened to determine a nationally representative US sample. We reported the descriptive statistics available for the relevant survey questions. In addition, we performed secondary analyses on 4 data sets to further explore the findings on attitudes across different demographic groups. Results: This review includes 11 nationally representative surveys. The search identified 175 records, 39 of which were assessed for inclusion. Surveys include questions related to familiarity and experience with AI; applications, benefits, and risks of AI in health care settings; the use of AI in disease diagnosis, treatment, and robotic caregiving; and related issues of data privacy and surveillance. Although most Americans have heard of AI, they are less aware of its specific health applications. Americans anticipate that medicine is likely to benefit from advances in AI; however, the anticipated benefits vary depending on the type of application. Specific application goals, such as disease prediction, diagnosis, and treatment, matter for the attitudes toward AI in health care among Americans. Most Americans reported wanting control over their personal health data. The willingness to share personal health information largely depends on the institutional actor collecting the data and the intended use. Conclusions: Americans in general report seeing health care as an area in which AI applications could be particularly beneficial. However, they have substantial levels of concern regarding specific applications, especially those in which AI is involved in decision-making and regarding the privacy of health information. 
%M 37014676 %R 10.2196/40337 %U https://www.jmir.org/2023/1/e40337 %U https://doi.org/10.2196/40337 %U http://www.ncbi.nlm.nih.gov/pubmed/37014676 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 10 %N %P e43135 %T Evaluating User Experience With a Chatbot Designed as a Public Health Response to the COVID-19 Pandemic in Brazil: Mixed Methods Study %A Chagas,Bruno Azevedo %A Pagano,Adriana Silvina %A Prates,Raquel Oliveira %A Praes,Elisa Cordeiro %A Ferreguetti,Kícila %A Vaz,Helena %A Reis,Zilma Silveira Nogueira %A Ribeiro,Leonardo Bonisson %A Ribeiro,Antonio Luiz Pinho %A Pedroso,Thais Marques %A Beleigoli,Alline %A Oliveira,Clara Rodrigues Alves %A Marcolino,Milena Soriano %+ Telehealth Center, University Hospital, Universidade Federal de Minas Gerais, Avenida Professor Alfredo Balena, 110 1o andar Sala 107 Ala Sul, Belo Horizonte, 30130-100, Brazil, 55 31 3307 9201, milenamarc@gmail.com %K user experience %K chatbots %K telehealth %K COVID-19 %K human-computer interaction %K HCI %K empirical studies in human-computer interaction %K empirical studies in HCI %K health care information systems %D 2023 %7 3.4.2023 %9 Original Paper %J JMIR Hum Factors %G English %X Background: The potential of chatbots for screening and monitoring COVID-19 was envisioned since the outbreak of the disease. Chatbots can help disseminate up-to-date and trustworthy information, promote healthy social behavior, and support the provision of health care services safely and at scale. In this scenario and in view of its far-reaching postpandemic impact, it is important to evaluate user experience with this kind of application. Objective: We aimed to evaluate the quality of user experience with a COVID-19 chatbot designed by a large telehealth service in Brazil, focusing on the usability of real users and the exploration of strengths and shortcomings of the chatbot, as revealed in reports by participants in simulated scenarios. 
Methods: We examined a chatbot developed by a multidisciplinary team and used it as a component within the workflow of a local public health care service. The chatbot had 2 core functionalities: assisting web-based screening of COVID-19 symptom severity and providing evidence-based information to the population. From October 2020 to January 2021, we adopted a mixed methods approach and performed a 2-fold evaluation of user experience with our chatbot: a posttask usability Likert-scale survey presented to all users after concluding their interaction with the bot and an interview with volunteer participants who engaged in a simulated interaction with the bot guided by the interviewer. Results: Usability assessment with 63 users revealed very good scores for chatbot usefulness (4.57), likelihood of being recommended (4.48), ease of use (4.44), and user satisfaction (4.38). Interviews with 15 volunteers provided insights into the strengths and shortcomings of our bot. Comments on the positive aspects and problems reported by users were analyzed in terms of recurrent themes. We identified 6 positive aspects and 15 issues organized in 2 categories: usability of the chatbot and health support offered by it, the former referring to how users can interact with the chatbot and the latter referring to the chatbot’s goal of supporting people during the pandemic through the screening process and informative content. We found 6 themes accounting for what people liked most about our chatbot and why they found it useful—3 themes pertaining to the usability domain and 3 themes regarding health support. Our findings also identified 15 types of problems producing a negative impact on users—10 of them related to the usability of the chatbot and 5 related to the health support it provides. 
Conclusions: Our results indicate that users had an overall positive experience with the chatbot and found the health support relevant. Nonetheless, qualitative evaluation of the chatbot indicated challenges and directions to be pursued in improving not only our COVID-19 chatbot but also health chatbots in general. %M 36634267 %R 10.2196/43135 %U https://humanfactors.jmir.org/2023/1/e43135 %U https://doi.org/10.2196/43135 %U http://www.ncbi.nlm.nih.gov/pubmed/36634267 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44248 %T Artificial Intelligence for the Prediction and Early Diagnosis of Pancreatic Cancer: Scoping Review %A Jan,Zainab %A El Assadi,Farah %A Abd-alrazaq,Alaa %A Jithesh,Puthen Veettil %+ College of Health & Life Sciences, Hamad Bin Khalifa University, Penrose House, Education City, Doha, 34110, Qatar, 974 44547438, jveettil@hbku.edu.qa %K artificial Intelligence %K pancreatic cancer %K diagnosis %K diagnostic %K prediction %K machine learning %K deep learning %K scoping %K review method %K predict %K cancer %K oncology %K pancreatic %K algorithm %D 2023 %7 31.3.2023 %9 Review %J J Med Internet Res %G English %X Background: Pancreatic cancer is the 12th most common cancer worldwide, with an overall survival rate of 4.9%. Early diagnosis of pancreatic cancer is essential for timely treatment and survival. Artificial intelligence (AI) provides advanced models and algorithms for better diagnosis of pancreatic cancer. Objective: This study aims to explore AI models used for the prediction and early diagnosis of pancreatic cancers as reported in the literature. Methods: A scoping review was conducted and reported in line with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. PubMed, Google Scholar, Science Direct, BioRXiv, and MedRxiv were explored to identify relevant articles. Study selection and data extraction were independently conducted by 2 reviewers. 
Data extracted from the included studies were synthesized narratively. Results: Of the 1185 publications, 30 studies were included in the scoping review. The included articles reported the use of AI for 6 different purposes. Among these articles, AI techniques were mostly used for the diagnosis of pancreatic cancer (14/30, 47%). Radiological images (14/30, 47%) were the most frequently used data in the included articles. Most of the included articles used data sets with a size of <1000 samples (11/30, 37%). Deep learning models were the most prominent branch of AI used for pancreatic cancer diagnosis in the studies, and the convolutional neural network was the most used algorithm (18/30, 60%). Six validation approaches were used in the included studies, of which the most frequently used approaches were k-fold cross-validation (10/30, 33%) and external validation (10/30, 33%). A higher level of accuracy (99%) was found in studies that used support vector machine, decision tree, and k-means clustering algorithms. Conclusions: This review presents an overview of studies based on AI models and algorithms used for the prediction and diagnosis of pancreatic cancer. AI is expected to play a vital role in advancing pancreatic cancer prediction and diagnosis. Further research is required to provide data that support clinical decisions in health care. 
%M 37000507 %R 10.2196/44248 %U https://www.jmir.org/2023/1/e44248 %U https://doi.org/10.2196/44248 %U http://www.ncbi.nlm.nih.gov/pubmed/37000507 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e39972 %T Machine Learning Approaches for Predicting Psoriatic Arthritis Risk Using Electronic Medical Records: Population-Based Study %A Lee,Leon Tsung-Ju %A Yang,Hsuan-Chia %A Nguyen,Phung Anh %A Muhtar,Muhammad Solihuddin %A Li,Yu-Chuan Jack %+ Department of Dermatology, Taipei Municipal Wanfang Hospital, Taipei Medical University, No 111, Section 3, Hsing-Long Rd, Taipei, 116, Taiwan, 886 02 2930 7930, jack@tmu.edu.tw %K convolutional neural network %K deep learning, machine learning %K prediction model %K psoriasis %K psoriatic arthritis %K temporal phenomic map %K electronic medical records %D 2023 %7 28.3.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Psoriasis (PsO) is a chronic, systemic, immune-mediated disease with multiorgan involvement. Psoriatic arthritis (PsA) is an inflammatory arthritis that is present in 6%-42% of patients with PsO. Approximately 15% of patients with PsO have undiagnosed PsA. Predicting patients with a risk of PsA is crucial for providing them with early examination and treatment that can prevent irreversible disease progression and function loss. Objective: The aim of this study was to develop and validate a prediction model for PsA based on chronological large-scale and multidimensional electronic medical records using a machine learning algorithm. Methods: This case-control study used Taiwan’s National Health Insurance Research Database from January 1, 1999, to December 31, 2013. The original data set was split into training and holdout data sets in an 80:20 ratio. A convolutional neural network was used to develop a prediction model. 
This model used 2.5-year diagnostic and medical records (inpatient and outpatient) with temporal-sequential information to predict the risk of PsA for a given patient within the next 6 months. The model was developed and cross-validated using the training data and was tested using the holdout data. An occlusion sensitivity analysis was performed to identify the important features of the model. Results: The prediction model included a total of 443 patients with PsA with earlier diagnosis of PsO and 1772 patients with PsO without PsA for the control group. The 6-month PsA risk prediction model that uses sequential diagnostic and drug prescription information as a temporal phenomic map yielded an area under the receiver operating characteristic curve of 0.70 (95% CI 0.559-0.833), a mean sensitivity of 0.80 (SD 0.11), a mean specificity of 0.60 (SD 0.04), and a mean negative predictive value of 0.93 (SD 0.04). Conclusions: The findings of this study suggest that the risk prediction model can identify patients with PsO at a high risk of PsA. This model may help health care professionals to prioritize treatment for target high-risk populations and prevent irreversible disease progression and functional loss. 
%M 36976633 %R 10.2196/39972 %U https://www.jmir.org/2023/1/e39972 %U https://doi.org/10.2196/39972 %U http://www.ncbi.nlm.nih.gov/pubmed/36976633 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 10 %N %P e44325 %T Predicting Generalized Anxiety Disorder From Impromptu Speech Transcripts Using Context-Aware Transformer-Based Neural Networks: Model Evaluation Study %A Teferra,Bazen Gashaw %A Rose,Jonathan %+ The Edward S Rogers Sr Department of Electrical and Computer Engineering, University of Toronto, 10 King’s College Road, Toronto, ON, M5S3G4, Canada, 1 4169786992, bazen.teferra@mail.utoronto.ca %K mental health %K generalized anxiety disorder %K impromptu speech %K linguistic features %K anxiety prediction %K neural networks %K natural language processing %K transformer models %K mobile phone %D 2023 %7 28.3.2023 %9 Original Paper %J JMIR Ment Health %G English %X Background: The ability to automatically detect anxiety disorders from speech could be useful as a screening tool for an anxiety disorder. Prior studies have shown that individual words in textual transcripts of speech have an association with anxiety severity. Transformer-based neural networks are models that have been recently shown to have powerful predictive capabilities based on the context of more than one input word. Transformers detect linguistic patterns and can be separately trained to make specific predictions based on these patterns. Objective: This study aimed to determine whether a transformer-based language model can be used to screen for generalized anxiety disorder from impromptu speech transcripts. Methods: A total of 2000 participants provided an impromptu speech sample in response to a modified version of the Trier Social Stress Test (TSST). They also completed the Generalized Anxiety Disorder 7-item (GAD-7) scale. 
A transformer-based neural network model (pretrained on large textual corpora) was fine-tuned on the speech transcripts and the GAD-7 to predict whether a participant was above or below a screening threshold of the GAD-7. We reported the area under the receiver operating characteristic curve (AUROC) on the test data and compared the results with a baseline logistic regression model using the Linguistic Inquiry and Word Count (LIWC) features as input. Using the integrated gradient method to determine specific words that strongly affect the predictions, we inferred specific linguistic patterns that influence the predictions. Results: The baseline LIWC-based logistic regression model had an AUROC value of 0.58. The fine-tuned transformer model achieved an AUROC value of 0.64. Specific words that were often implicated in the predictions were also dependent on the context. For example, the first-person singular pronoun “I” influenced toward an anxious prediction 88% of the time and a nonanxious prediction 12% of the time, depending on the context. Silent pauses in speech, also often implicated in predictions, influenced toward an anxious prediction 20% of the time and a nonanxious prediction 80% of the time. Conclusions: There is evidence that a transformer-based neural network model has increased predictive power compared with the single word–based LIWC model. We also showed that the use of specific words in a specific context—a linguistic pattern—is part of the reason for the better prediction. This suggests that such transformer-based models could play a useful role in anxiety screening systems. 
%M 36976636 %R 10.2196/44325 %U https://mental.jmir.org/2023/1/e44325 %U https://doi.org/10.2196/44325 %U http://www.ncbi.nlm.nih.gov/pubmed/36976636 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e42683 %T Predicting Social Determinants of Health in Patient Navigation: Case Study %A Iacobelli,Francisco %A Yang,Anna %A Tom,Laura %A Leung,Ivy S %A Crissman,John %A Salgado,Rufino %A Simon,Melissa %+ Department of Computer Science, Northeastern Illinois University, 5500 N. St. Louis Ave., Chicago, IL, 60625, United States, 1 7734424728, fdiacobe@neiu.edu %K patient navigation %K machine learning %K social determinants of health %K health care disparities %K health equity %K case study %D 2023 %7 28.3.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Patient navigation (PN) programs have demonstrated efficacy in improving health outcomes for marginalized populations across a range of clinical contexts by addressing barriers to health care, including social determinants of health (SDoHs). However, it can be challenging for navigators to identify SDoHs by asking patients directly because of many factors, including patients’ reluctance to disclose information, communication barriers, and the variable resources and experience levels of patient navigators. Navigators could benefit from strategies that augment their ability to gather SDoH data. Machine learning can be leveraged as one of these strategies to identify SDoH-related barriers. This could further improve health outcomes, particularly in underserved populations. Objective: In this formative study, we explored novel machine learning–based approaches to predict SDoHs in 2 Chicago area PN studies. In the first approach, we applied machine learning to data that include comments and interaction details between patients and navigators, whereas the second approach augmented patients’ demographic information. 
This paper presents the results of these experiments and provides recommendations for data collection and the application of machine learning techniques more generally to the problem of predicting SDoHs. Methods: We conducted 2 experiments to explore the feasibility of using machine learning to predict patients’ SDoHs using data collected from PN research. The machine learning algorithms were trained on data collected from 2 Chicago area PN studies. In the first experiment, we compared several machine learning algorithms (logistic regression, random forest, support vector machine, artificial neural network, and Gaussian naive Bayes) to predict SDoHs from both patient demographics and navigators’ encounter data over time. In the second experiment, we used multiclass classification with augmented information, such as transportation time to a hospital, to predict multiple SDoHs for each patient. Results: In the first experiment, the random forest classifier achieved the highest accuracy among the classifiers tested. The overall accuracy for predicting SDoHs was 71.3%. In the second experiment, multiclass classification effectively predicted a few patients’ SDoHs based purely on demographic and augmented data. The best accuracy of these predictions overall was 73%. However, both experiments yielded high variability in individual SDoH predictions and revealed salient correlations among SDoHs. Conclusions: To our knowledge, this study is the first approach to applying PN encounter data and multiclass learning algorithms to predict SDoHs. The experiments discussed yielded valuable lessons, including awareness of model limitations and bias, the need to plan for the standardization of data sources and measurement, and the need to identify and anticipate the intersectionality and clustering of SDoHs.
Although our focus was on predicting patients’ SDoHs, machine learning can have a broad range of applications in the field of PN, from tailoring intervention delivery (eg, supporting PN decision-making) to informing resource allocation, measurement, and PN supervision. %M 36976634 %R 10.2196/42683 %U https://formative.jmir.org/2023/1/e42683 %U https://doi.org/10.2196/42683 %U http://www.ncbi.nlm.nih.gov/pubmed/36976634 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45721 %T Prediction Models for Sleep Quality Among College Students During the COVID-19 Outbreak: Cross-sectional Study Based on the Internet New Media %A Zheng,Wanyu %A Chen,Qingquan %A Yao,Ling %A Zhuang,Jiajing %A Huang,Jiewei %A Hu,Yiming %A Yu,Shaoyang %A Chen,Tebin %A Wei,Nan %A Zeng,Yifu %A Zhang,Yixiang %A Fan,Chunmei %A Wang,Youjuan %+ The Second Affiliated Hospital of Fujian Medical University, No. 34, Zhongshan North Road, Licheng District, Quanzhou City, Fujian Province, P. R. China, Quanzhou, 362018, China, 86 13055603250, youjuan@fyey4.wecom.work %K artificial neural network %K college students %K COVID-19 %K internet new media %K logistic regression %K machine learning %K sleep quality %D 2023 %7 24.3.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: COVID-19 has been reported to affect the sleep quality of Chinese residents; however, the epidemic’s effects on the sleep quality of college students during closed-loop management remain unclear, and a screening tool is lacking. Objective: This study aimed to understand the sleep quality of college students in Fujian Province during the epidemic and determine sensitive variables, in order to develop an efficient prediction model for the early screening of sleep problems in college students. Methods: From April 5 to 16, 2022, a cross-sectional internet-based survey was conducted.
The Pittsburgh Sleep Quality Index (PSQI) scale, a self-designed general data questionnaire, and the sleep quality influencing factor questionnaire were used to understand the sleep quality of respondents in the previous month. A chi-square test and a multivariate unconditioned logistic regression analysis were performed, and the influencing factors obtained were used to develop the prediction models. The data were divided into a training-testing set (n=14,451, 70%) and an independent validation set (n=6194, 30%) by stratified sampling. Four models using logistic regression, an artificial neural network, random forest, and naive Bayes were developed and validated. Results: In total, 20,645 subjects were included in this survey, with a mean global PSQI score of 6.02 (SD 3.112). The sleep disturbance rate was 28.9% (n=5972, defined as a global PSQI score >7 points). A total of 11 variables related to sleep quality were taken as parameters of the prediction models, including age, gender, residence, specialty, respiratory history, coffee consumption, staying up late, spending long hours on the internet, sudden changes, fear of infection, and impatience with closed-loop management. Among the generated models, the artificial neural network model proved to be the best, with an area under the curve, accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of 0.713, 73.52%, 25.51%, 92.58%, 57.71%, and 75.79%, respectively. It is noteworthy that the logistic regression, random forest, and naive Bayes models achieved high specificities of 94.41%, 94.77%, and 86.40%, respectively. Conclusions: The COVID-19 containment measures affected the sleep quality of college students on multiple levels, indicating that targeted university management and social support are needed. The artificial neural network model showed excellent predictive efficiency and supports implementing measures earlier in order to improve present conditions.
%M 36961495 %R 10.2196/45721 %U https://www.jmir.org/2023/1/e45721 %U https://doi.org/10.2196/45721 %U http://www.ncbi.nlm.nih.gov/pubmed/36961495 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43251 %T Human-Centered Design to Address Biases in Artificial Intelligence %A Chen,You %A Clayton,Ellen Wright %A Novak,Laurie Lovett %A Anders,Shilo %A Malin,Bradley %+ Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave, Nashville, TN, 37203, United States, 1 6153431939, you.chen@vanderbilt.edu %K artificial intelligence %K human-centered AI %K biases %K AI %K care %K biomedical %K research %K application %K human-centered %K development %K design %K patient %K health %K benefits %D 2023 %7 24.3.2023 %9 Viewpoint %J J Med Internet Res %G English %X The potential of artificial intelligence (AI) to reduce health care disparities and inequities is recognized, but it can also exacerbate these issues if not implemented in an equitable manner. This perspective identifies potential biases in each stage of the AI life cycle, including data collection, annotation, machine learning model development, evaluation, deployment, operationalization, monitoring, and feedback integration. To mitigate these biases, we suggest involving a diverse group of stakeholders, using human-centered AI principles. Human-centered AI can help ensure that AI systems are designed and used in a way that benefits patients and society, which can reduce health disparities and inequities. By recognizing and addressing biases at each stage of the AI life cycle, AI can achieve its potential in health care. 
%M 36961506 %R 10.2196/43251 %U https://www.jmir.org/2023/1/e43251 %U https://doi.org/10.2196/43251 %U http://www.ncbi.nlm.nih.gov/pubmed/36961506 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45456 %T Acoustic Analysis of Speech for Screening for Suicide Risk: Machine Learning Classifiers for Between- and Within-Person Evaluation of Suicidality %A Min,Sooyeon %A Shin,Daun %A Rhee,Sang Jin %A Park,C Hyung Keun %A Yang,Jeong Hun %A Song,Yoojin %A Kim,Min Ji %A Kim,Kyungdo %A Cho,Won Ik %A Kwon,Oh Chul %A Ahn,Yong Min %A Lee,Hyunju %+ Department of Neuropsychiatry, Seoul National University Hospital, 101 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea, 82 2 2072 2458, wandy04@naver.com %K suicide %K voice analysis %K mood disorder %K artificial intelligence %K screening %K prediction %K digital health tool %D 2023 %7 23.3.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Assessing a patient’s suicide risk is challenging for health professionals because it depends on voluntary disclosure by the patient, and resources are often limited. The application of novel machine learning approaches to determine suicide risk has clinical utility. Objective: This study aimed to investigate cross-sectional and longitudinal approaches to assess suicidality based on acoustic voice features of psychiatric patients using artificial intelligence. Methods: We collected 348 voice recordings during clinical interviews of 104 patients diagnosed with mood disorders at baseline and 2, 4, 8, and 12 months after recruitment. Suicidality was assessed using the Beck Scale for Suicidal Ideation and suicidal behavior using the Columbia Suicide Severity Rating Scale. The acoustic features of the voice, including temporal, formant, and spectral features, were extracted from the recordings.
A between-person classification model that examines the vocal characteristics of individuals cross-sectionally to detect individuals at high risk for suicide and a within-person classification model that detects considerable worsening of suicidality based on changes in acoustic features within an individual were developed and compared. Internal validation was performed using 10-fold cross-validation of audio data from baseline to 2 months, and external validation was performed using data from 2 to 4 months. Results: A combined set of 12 acoustic features and 3 demographic variables (age, sex, and past suicide attempts) was included in the single-layer artificial neural network for the between-person classification model. Furthermore, 13 acoustic features were included in the extreme gradient boosting machine learning algorithm for the within-person model. The between-person classifier was able to detect high suicidality with 69% accuracy (sensitivity 74%, specificity 62%, area under the receiver operating characteristic curve 0.62), whereas the within-person model was able to predict worsening suicidality over 2 months with 79% accuracy (sensitivity 68%, specificity 84%, area under the receiver operating characteristic curve 0.67). The second model showed 62% accuracy in predicting increased suicidality in the external set. Conclusions: Within-person analysis using changes in acoustic features within an individual is a promising approach to detect increased suicidality. Automated analysis of voice can be used to support the real-time assessment of suicide risk in primary care or telemedicine.
%M 36951913 %R 10.2196/45456 %U https://www.jmir.org/2023/1/e45456 %U https://doi.org/10.2196/45456 %U http://www.ncbi.nlm.nih.gov/pubmed/36951913 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e45631 %T Understanding Prospective Physicians’ Intention to Use Artificial Intelligence in Their Future Medical Practice: Configurational Analysis %A Wagner,Gerit %A Raymond,Louis %A Paré,Guy %+ Faculty Information Systems and Applied Computer Sciences, University of Bamberg, An der Weberei 5, Bamberg, 96047, Germany, 49 0951863 ext 27834, gerit.wagner@uni-bamberg.de %K artificial intelligence %K medical education %K attitudes and beliefs %K knowledge and experience %K behavioral intentions %K fuzzy-set qualitative comparative analysis %K fsQCA %D 2023 %7 22.3.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: Prospective physicians are expected to find artificial intelligence (AI) to be a key technology in their future practice. This transformative change has caught the attention of scientists, educators, and policy makers alike, with substantive efforts dedicated to the selection and delivery of AI topics and competencies in the medical curriculum. Less is known about the behavioral perspective or the necessary and sufficient preconditions for medical students’ intention to use AI in the first place. Objective: Our study focused on medical students’ knowledge, experience, attitude, and beliefs related to AI and aimed to understand whether they are necessary conditions and form sufficient configurations of conditions associated with behavioral intentions to use AI in their future medical practice. Methods: We administered a 2-staged questionnaire operationalizing the variables of interest (ie, knowledge, experience, attitude, and beliefs related to AI, as well as intention to use AI) and recorded 184 responses at t0 (February 2020, before the COVID-19 pandemic) and 138 responses at t1 (January 2021, during the COVID-19 pandemic). 
Following established guidelines, we applied necessary condition analysis and fuzzy-set qualitative comparative analysis to analyze the data. Results: Findings from the fuzzy-set qualitative comparative analysis show that the intention to use AI is only observed when students have a strong belief in the role of AI (individually necessary condition); certain AI profiles, that is, combinations of knowledge and experience, attitudes and beliefs, and academic level and gender, are always associated with high intentions to use AI (equifinal and sufficient configurations); and profiles associated with nonhigh intentions cannot be inferred from profiles associated with high intentions (causal asymmetry). Conclusions: Our work contributes to prior knowledge by showing that a strong belief in the role of AI in the future of medical professions is a necessary condition for behavioral intentions to use AI. Moreover, we suggest that the preparation of medical students should go beyond teaching AI competencies and that educators need to account for the different AI profiles associated with high or nonhigh intentions to adopt AI. 
%M 36947121 %R 10.2196/45631 %U https://mededu.jmir.org/2023/1/e45631 %U https://doi.org/10.2196/45631 %U http://www.ncbi.nlm.nih.gov/pubmed/36947121 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e43847 %T A Standardized Clinical Data Harmonization Pipeline for Scalable AI Application Deployment (FHIR-DHP): Validation and Usability Study %A Williams,Elena %A Kienast,Manuel %A Medawar,Evelyn %A Reinelt,Janis %A Merola,Alberto %A Klopfenstein,Sophie Anne Ines %A Flint,Anne Rike %A Heeren,Patrick %A Poncette,Akira-Sebastian %A Balzer,Felix %A Beimes,Julian %A von Bünau,Paul %A Chromik,Jonas %A Arnrich,Bert %A Scherf,Nico %A Niehaus,Sebastian %+ AICURA Medical GmbH, Bessemerstr 22, Berlin, 12103, Germany, 49 173 9449677, evelyn.medawar@aicura-medical.com %K data interoperability %K fast healthcare interoperability resources %K FHIR %K data standardization pipeline %K medical information mart for intensive care %K MIMIC IV %K artificial intelligence %K AI application %K AI %K deployment %K data %K usability %K care unit %K diagnosis %K cooperation %K patient care %K care %K medical research %D 2023 %7 21.3.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: Increasing digitalization in the medical domain gives rise to large amounts of health care data, which has the potential to expand clinical knowledge and transform patient care if leveraged through artificial intelligence (AI). Yet, big data and AI oftentimes cannot unlock their full potential at scale, owing to nonstandardized data formats, lack of technical and semantic data interoperability, and limited cooperation between stakeholders in the health care system. Despite the existence of standardized data formats for the medical domain, such as Fast Healthcare Interoperability Resources (FHIR), their prevalence and usability for AI remain limited. 
Objective: In this paper, we developed a data harmonization pipeline (DHP) for clinical data sets relying on the common FHIR data standard. Methods: We validated the performance and usability of our FHIR-DHP with data from the Medical Information Mart for Intensive Care IV database. Results: We present the FHIR-DHP workflow with respect to the transformation of “raw” hospital records into a harmonized, AI-friendly data representation. The pipeline consists of the following 5 key preprocessing steps: querying of data from the hospital database, FHIR mapping, syntactic validation, transfer of harmonized data into the patient-model database, and export of data in an AI-friendly format for further medical applications. A detailed example of FHIR-DHP execution was presented for clinical diagnosis records. Conclusions: Our approach enables the scalable and needs-driven data modeling of large and heterogeneous clinical data sets. The FHIR-DHP is a pivotal step toward increasing cooperation, interoperability, and quality of patient care in the clinical routine and for medical research.
%M 36943344 %R 10.2196/43847 %U https://medinform.jmir.org/2023/1/e43847 %U https://doi.org/10.2196/43847 %U http://www.ncbi.nlm.nih.gov/pubmed/36943344 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e42639 %T Computerization of the Work of General Practitioners: Mixed Methods Survey of Final-Year Medical Students in Ireland %A Blease,Charlotte %A Kharko,Anna %A Bernstein,Michael %A Bradley,Colin %A Houston,Muiris %A Walsh,Ian %A D Mandl,Kenneth %+ General Medicine and Primary Care, Beth Israel Deaconess Medical Center, 330 Brookline Ave, Boston, MA, 02215, United States, 1 6173201281, charlotteblease@gmail.com %K medical students %K medical education %K general practitioners %K artificial intelligence %K machine learning %K digital health %K technology %K tool %K medical professional %K biomedical %K design %K survey %K COVID-19 %D 2023 %7 20.3.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: The potential for digital health technologies, including machine learning (ML)–enabled tools, to disrupt the medical profession is the subject of ongoing debate within biomedical informatics. Objective: We aimed to describe the opinions of final-year medical students in Ireland regarding the potential of future technology to replace or work alongside general practitioners (GPs) in performing key tasks. Methods: Between March 2019 and April 2020, using a convenience sample, we conducted a mixed methods paper-based survey of final-year medical students. The survey was administered at 4 out of 7 medical schools in Ireland across each of the 4 provinces in the country. Quantitative data were analyzed using descriptive statistics and nonparametric tests. We used thematic content analysis to investigate free-text responses. Results: In total, 43.1% (252/585) of the final-year students at 3 medical schools responded, and data collection at 1 medical school was terminated due to disruptions associated with the COVID-19 pandemic. 
With regard to forecasting the potential impact of artificial intelligence (AI)/ML on primary care 25 years from now, around half (127/246, 51.6%) of all surveyed students believed the work of GPs would change minimally or not at all. Notably, students who did not intend to enter primary care predicted that AI/ML would have a great impact on the work of GPs. Conclusions: We caution that without a firm curricular foundation on advances in AI/ML, students may rely on extreme perspectives, ranging from self-preserving optimism biases that downplay the impact of advances in technology on primary care, on the one hand, to technohype on the other. Ultimately, these biases may lead to negative consequences in health care. Improvements in medical education could help prepare tomorrow’s doctors to optimize and lead the ethical and evidence-based implementation of AI/ML-enabled tools in medicine for enhancing the care of tomorrow’s patients. %M 36939809 %R 10.2196/42639 %U https://mededu.jmir.org/2023/1/e42639 %U https://doi.org/10.2196/42639 %U http://www.ncbi.nlm.nih.gov/pubmed/36939809 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e41516 %T Calibrating a Transformer-Based Model’s Confidence on Community-Engaged Research Studies: Decision Support Evaluation Study %A Ferrell,Brian %A Raskin,Sarah E %A Zimmerman,Emily B %+ Virginia Commonwealth University, 907 Floyd Ave., Richmond, VA, 23284, United States, 1 8045503426, ferrellbj@vcu.edu %K explainable artificial intelligence %K XAI %K Bidirectional Encoder Representations From Transformers %K BERT %K transformer-based models %K text classification %K community engagement %K community-engaged research %K deep learning %K decision support %K trust %K confidence %D 2023 %7 20.3.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Deep learning offers great benefits in classification tasks such as medical imaging diagnostics or stock trading, especially when compared with human-level performance, and can
be a viable option for classifying distinct levels within community-engaged research (CEnR). CEnR is a collaborative approach between academics and community partners with the aim of conducting research that is relevant to community needs while incorporating diverse forms of expertise. In the field of deep learning and artificial intelligence (AI), training multiple models to obtain the highest validation accuracy is common practice; however, a model can overfit to that specific data set and not generalize well to a real-world population, which creates issues of bias and potentially dangerous algorithmic decisions. Consequently, if we plan on automating human decision-making, there is a need to create techniques and exhaustive evaluation processes for these powerful, unexplainable models to ensure that we do not adopt and blindly trust poor AI models to make real-world decisions. Objective: We aimed to conduct an evaluation study to see whether our most accurate transformer-based models derived from previous studies could emulate our own classification spectrum for tracking CEnR studies as well as whether the use of calibrated confidence scores was meaningful. Methods: We compared the results from 3 domain experts, who classified a sample of 45 studies derived from our university’s institutional review board database, with those from 3 previously trained transformer-based models, as well as investigated whether calibrated confidence scores can be a viable technique for using AI in a support role for complex decision-making systems. Results: Our findings reveal that certain models exhibit an overestimation of their performance through high confidence scores, despite not achieving the highest validation accuracy. Conclusions: Future studies should be conducted with larger sample sizes to generalize the results more effectively.
Although our study addresses the concerns of bias and overfitting in deep learning models, there is a need to further explore methods that allow domain experts to trust our models more. A calibrated confidence score can be a misleading metric when determining an AI model’s level of competency. %M 36939830 %R 10.2196/41516 %U https://formative.jmir.org/2023/1/e41516 %U https://doi.org/10.2196/41516 %U http://www.ncbi.nlm.nih.gov/pubmed/36939830 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44932 %T Artificial Intelligence–Based Psoriasis Severity Assessment: Real-world Study and Application %A Huang,Kai %A Wu,Xian %A Li,Yixin %A Lv,Chengzhi %A Yan,Yangtian %A Wu,Zhe %A Zhang,Mi %A Huang,Weihong %A Jiang,Zixi %A Hu,Kun %A Li,Mingjia %A Su,Juan %A Zhu,Wu %A Li,Fangfang %A Chen,Mingliang %A Chen,Jing %A Li,Yongjian %A Zeng,Mei %A Zhu,Jianjian %A Cao,Duling %A Huang,Xing %A Huang,Lei %A Hu,Xing %A Chen,Zeyu %A Kang,Jian %A Yuan,Lei %A Huang,Chengji %A Guo,Rui %A Navarini,Alexander %A Kuang,Yehong %A Chen,Xiang %A Zhao,Shuang %+ Department of Dermatology, Xiangya Hospital, Central South University, 87 Xiangya Road, Kaifu District, Changsha, Hunan, 410005, China, 86 13808485224, shuangxy@csu.edu.cn %K artificial intelligence %K psoriasis severity assessment %K Psoriasis Area and Severity Index %K PASI %K deep learning system %K mobile app %K psoriasis %K inflammation %K dermatology %K tools %K management %K model %K design %K users %K chronic disease %D 2023 %7 16.3.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Psoriasis is one of the most frequent inflammatory skin conditions and could be treated via tele-dermatology, provided that the current lack of reliable tools for objective severity assessments is overcome.
Although the Psoriasis Area and Severity Index (PASI) is currently the most widely accepted metric for measuring psoriasis severity, it has a high degree of subjectivity and is rarely used in real practice. Objective: This study aimed to develop an image–artificial intelligence (AI)–based validated system for severity assessment with the explicit intention of facilitating long-term management of patients with psoriasis. Methods: A deep learning system was trained to estimate the PASI score by using 14,096 images from 2367 patients with psoriasis. We used 1962 patients from January 2015 to April 2021 to train the model and the other 405 patients from May 2021 to July 2021 to validate it. A multiview feature enhancement block was designed to combine vision features from different perspectives to better simulate the visual diagnostic method in clinical practice. A classification head along with a regression head was simultaneously applied to generate PASI scores, and an extra cross-teacher head after these 2 heads was designed to revise their output. The mean absolute error (MAE) was used as the metric to evaluate the accuracy of the predicted PASI score; minimizing the MAE brings the model’s predictions closer to the target values. Then, the proposed model was compared with 43 experienced dermatologists. Finally, the proposed model was deployed into an app named SkinTeller on the WeChat platform. Results: The proposed image-AI–based PASI-estimating model outperformed the average performance of 43 experienced dermatologists with a 33.2% performance gain in the overall PASI score. In the ablation experiment, the model achieved its smallest MAE of 2.05 with 3 input images. In other words, for the task of psoriasis severity assessment, the severity score predicted by our model was close to the PASI score diagnosed by experienced dermatologists.
The SkinTeller app has been used 3369 times for PASI scoring in 1497 patients from 18 hospitals, and its excellent performance was confirmed by a feedback survey of 43 dermatologist users. Conclusions: An image-AI–based psoriasis severity assessment model has been proposed to automatically calculate PASI scores in an efficient, objective, and accurate manner. The SkinTeller app may be a promising alternative for dermatologists’ accurate assessment in the real world and chronic disease self-management in patients with psoriasis. %M 36927843 %R 10.2196/44932 %U https://www.jmir.org/2023/1/e44932 %U https://doi.org/10.2196/44932 %U http://www.ncbi.nlm.nih.gov/pubmed/36927843 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43110 %T What Does DALL-E 2 Know About Radiology? %A Adams,Lisa C %A Busch,Felix %A Truhn,Daniel %A Makowski,Marcus R %A Aerts,Hugo J W L %A Bressem,Keno K %+ Department of Radiology, Charité Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Hindenburgdamm 30, Berlin, 12203, Germany, 49 30 450 527792, keno-kyrill.bressem@charite.de %K DALL-E %K creating images from text %K image creation %K image generation %K transformer language model %K machine learning %K generative model %K radiology %K x-ray %K artificial intelligence %K medical imaging %K text-to-image %K diagnostic imaging %D 2023 %7 16.3.2023 %9 Viewpoint %J J Med Internet Res %G English %X Generative models, such as DALL-E 2 (OpenAI), could represent promising future tools for image generation, augmentation, and manipulation for artificial intelligence research in radiology, provided that these models have sufficient medical domain knowledge. 
Herein, we show that DALL-E 2 has learned relevant representations of x-ray images, with promising capabilities in terms of zero-shot text-to-image generation of new images, the continuation of an image beyond its original boundaries, and the removal of elements; however, its capabilities for the generation of images with pathological abnormalities (eg, tumors, fractures, and inflammation) or computed tomography, magnetic resonance imaging, or ultrasound images are still limited. The use of generative models for augmenting and generating radiological data thus seems feasible, even if the further fine-tuning and adaptation of these models to their respective domains are required first. %M 36927634 %R 10.2196/43110 %U https://www.jmir.org/2023/1/e43110 %U https://doi.org/10.2196/43110 %U http://www.ncbi.nlm.nih.gov/pubmed/36927634 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42435 %T Development and Assessment of Assisted Diagnosis Models Using Machine Learning for Identifying Elderly Patients With Malnutrition: Cohort Study %A Wang,Xue %A Yang,Fengchun %A Zhu,Mingwei %A Cui,Hongyuan %A Wei,Junmin %A Li,Jiao %A Chen,Wei %+ Department of Clinical Nutrition, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No1 Shuaifuyuan Wangfujing Dongcheng District, Beijing, 100730, China, 86 69154095, txchenwei@sina.com %K disease-related malnutrition %K global leadership initiative on malnutrition %K GLIM %K older inpatients %K machine learning %K Shapley additive explanation %K SHAP %K malnutrition %K nutrition %K older adult %K elder %K XGBoost %K model %K diagnose %K diagnosis %K diagnostic %K visualization %K risk %K algorithm %D 2023 %7 14.3.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Older patients are at an increased risk of malnutrition due to many factors, and malnutrition is associated with poor clinical outcomes.
Objective: This study aims to develop an assisted diagnosis model using machine learning (ML) for identifying older patients with malnutrition and providing a focus for individualized treatment. Methods: We reanalyzed a multicenter, observational cohort study including 2660 older patients. Baseline malnutrition was defined using the Global Leadership Initiative on Malnutrition (GLIM) criteria, and the study population was randomly divided into a derivation group (2128/2660, 80%) and a validation group (532/2660, 20%). We applied 5 ML algorithms and further explored the relationship between features and the risk of malnutrition by using the Shapley additive explanations visualization method. Results: The proposed ML models were capable of identifying older patients with malnutrition. In the external validation cohort, the top 3 models by the area under the receiver operating characteristic curve were light gradient boosting machine (92.1%), extreme gradient boosting (91.9%), and the random forest model (91.5%). Additionally, the feature importance analysis revealed that BMI, weight loss, and calf circumference were the strongest predictors of GLIM-defined malnutrition. A BMI below 21 kg/m2 was associated with a higher risk of GLIM-defined malnutrition in older people. Conclusions: We developed ML models for assisting in the diagnosis of malnutrition based on the GLIM criteria. The cutoff values of laboratory tests generated by Shapley additive explanations could provide references for the identification of malnutrition.
Trial Registration: Chinese Clinical Trial Registry ChiCTR-EPC-14005253; https://www.chictr.org.cn/showproj.aspx?proj=9542 %M 36917167 %R 10.2196/42435 %U https://www.jmir.org/2023/1/e42435 %U https://doi.org/10.2196/42435 %U http://www.ncbi.nlm.nih.gov/pubmed/36917167 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e40259 %T The Effectiveness of Wearable Devices Using Artificial Intelligence for Blood Glucose Level Forecasting or Prediction: Systematic Review %A Ahmed,Arfan %A Aziz,Sarah %A Abd-alrazaq,Alaa %A Farooq,Faisal %A Househ,Mowafa %A Sheikh,Javaid %+ AI Center for Precision Health, Weill Cornell Medicine-Qatar, Education City, PO Box 24144, Doha, Qatar, 974 44928826, ara4013@qatar-med.cornell.edu %K diabetes %K artificial intelligence %K wearable devices %K machine learning %K blood glucose %K forecasting %K prediction %D 2023 %7 14.3.2023 %9 Review %J J Med Internet Res %G English %X Background: In 2021 alone, diabetes mellitus, a metabolic disorder primarily characterized by abnormally high blood glucose (BG) levels, affected 537 million people globally, and over 6 million deaths were reported. The use of noninvasive technologies, such as wearable devices (WDs), to regulate and monitor BG in people with diabetes is a relatively new concept and still in its infancy. Noninvasive WDs coupled with machine learning (ML) techniques have the potential to extract meaningful information from the gathered data and provide clinically meaningful advanced analytics for the purpose of forecasting or prediction. Objective: The purpose of this study is to provide a systematic review, complete with a quality assessment, of the effectiveness of using artificial intelligence (AI) in WDs to forecast or predict BG levels in people with diabetes. Methods: We searched 7 of the most popular bibliographic databases. Two reviewers performed study selection and data extraction independently before cross-checking the extracted data.
A narrative approach was used to synthesize the data. Quality assessment was performed using an adapted version of the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. Results: From the initial 3872 studies, the features from 12 studies were reported after filtering according to our predefined inclusion criteria. The reference standard in almost all studies (n=11, 92%) was classified as low risk, as all ground truths were easily replicable. Since the data input to AI technology was highly standardized and there was no effect of flow or time frame on the final output, both factors were categorized in a low-risk group (n=11, 92%). It was observed that classical ML approaches were deployed by half of the studies, the most popular being ensemble-boosted trees (random forest). The most common evaluation metric used was the Clarke error grid (n=7, 58%), followed by root mean square error (n=5, 42%). The wide usage of photoplethysmogram and near-infrared sensors was observed on wrist-worn devices. Conclusions: This review has provided the most extensive work to date summarizing WDs that use ML for diabetes-related BG level forecasting or prediction. Although current studies are few, their general quality was high, as revealed by the QUADAS-2 assessment tool. Further validation is needed for commercially available devices, but we envisage that WDs in general have the potential to remove the need for invasive devices completely for glucose monitoring in the not-too-distant future. 
Trial Registration: PROSPERO CRD42022303175; https://tinyurl.com/3n9jaayc %M 36917147 %R 10.2196/40259 %U https://www.jmir.org/2023/1/e40259 %U https://doi.org/10.2196/40259 %U http://www.ncbi.nlm.nih.gov/pubmed/36917147 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e41430 %T Persuading Patients Using Rhetoric to Improve Artificial Intelligence Adoption: Experimental Study %A Sebastian,Glorin %A George,Amrita %A Jackson Jr,George %+ Georgia State University, Room 1713, 55 Park Place, Atlanta, GA, 30303, United States, 1 4048344213, ageorge12@gsu.edu %K communication strategies %K artificial intelligence adoption %K AI adoption %K privacy concerns %K trust %K technology acceptance %K health IT %D 2023 %7 13.3.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) can transform health care processes with its increasing ability to translate complex structured and unstructured data into actionable clinical decisions. Although it has been established that AI is much more efficient than a clinician, the adoption rate has been slower in health care. Prior studies have pointed out that the lack of trust in AI, privacy concerns, degrees of customer innovativeness, and perceived novelty value influence AI adoption. With the promotion of AI products to patients, the role of rhetoric in influencing these factors has received scant attention. Objective: The main objective of this study was to examine whether communication strategies (ethos, pathos, and logos) are more successful in overcoming factors that hinder AI product adoption among patients. Methods: We conducted experiments in which we manipulated the communication strategy (ethos, pathos, and logos) in promotional ads for an AI product. We collected responses from 150 participants using Amazon Mechanical Turk. Participants were randomly exposed to a specific rhetoric-based advertisement during the experiments. 
Results: Our results indicate that using communication strategies to promote an AI product affects users’ trust, customer innovativeness, and perceived novelty value, leading to improved product adoption. Pathos-laden promotions improve AI product adoption by nudging users’ trust (n=52; β=.532; P<.001) and perceived novelty value of the product (n=52; β=.517; P=.001). Similarly, ethos-laden promotions improve AI product adoption by nudging customer innovativeness (n=50; β=.465; P<.001). In addition, logos-laden promotions improve AI product adoption by alleviating trust issues (n=48; β=.657; P<.001). Conclusions: Promoting AI products to patients using rhetoric-based advertisements can help overcome factors that hinder AI adoption by assuaging user concerns about using a new AI agent in their care process. %M 36912869 %R 10.2196/41430 %U https://www.jmir.org/2023/1/e41430 %U https://doi.org/10.2196/41430 %U http://www.ncbi.nlm.nih.gov/pubmed/36912869 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e44650 %T Clinical Validation of an Artificial Intelligence–Based Tool for Automatic Estimation of Left Ventricular Ejection Fraction and Strain in Echocardiography: Protocol for a Two-Phase Prospective Cohort Study %A Hadjidimitriou,Stelios %A Pagourelias,Efstathios %A Apostolidis,Georgios %A Dimaridis,Ioannis %A Charisis,Vasileios %A Bakogiannis,Constantinos %A Hadjileontiadis,Leontios %A Vassilikos,Vassilios %+ Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, University Campus, Faculty of Engineering, Building D, 6th Fl., Thessaloniki, GR-54124, Greece, 30 2310996319, stelios.hadjidimitriou@gmail.com %K artificial intelligence %K clinical validation %K computer-aided diagnosis %K echocardiography %K ejection fraction %K global longitudinal strain %K left ventricle %K prospective cohort design %K ultrasound %D 2023 %7 13.3.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Echocardiography (ECHO) is a 
type of ultrasonographic procedure for examining the cardiac function and morphology, with functional parameters of the left ventricle (LV), such as the ejection fraction (EF) and global longitudinal strain (GLS), being important indicators. Estimation of LV-EF and LV-GLS is performed either manually or semiautomatically by cardiologists and requires a nonnegligible amount of time, while estimation accuracy depends on scan quality and the clinician’s experience in ECHO, leading to considerable measurement variability. Objective: The aim of this study is to externally validate the clinical performance of a trained artificial intelligence (AI)–based tool that automatically estimates LV-EF and LV-GLS from transthoracic ECHO scans and to produce preliminary evidence regarding its utility. Methods: This is a prospective cohort study conducted in 2 phases. ECHO scans will be collected from 120 participants referred for ECHO examination based on routine clinical practice in the Hippokration General Hospital, Thessaloniki, Greece. During the first phase, 60 scans will be processed by 15 cardiologists of different experience levels and the AI-based tool to determine whether the latter is noninferior in LV-EF and LV-GLS estimation accuracy (primary outcomes) compared to cardiologists. Secondary outcomes include the time required for estimation and Bland-Altman plots and intraclass correlation coefficients to assess measurement reliability for both the AI and cardiologists. In the second phase, the rest of the scans will be examined by the same cardiologists with and without the AI-based tool to primarily evaluate whether the combination of the cardiologist and the tool is superior in terms of correctness of LV function diagnosis (normal or abnormal) to the cardiologist’s routine examination practice, accounting for the cardiologist’s level of ECHO experience. Secondary outcomes include time to diagnosis and the system usability scale score. 
Reference LV-EF and LV-GLS measurements and LV function diagnoses will be provided by a panel of 3 expert cardiologists. Results: Recruitment started in September 2022, and data collection is ongoing. The results of the first phase are expected to be available by summer 2023, while the study will conclude in May 2024, with the end of the second phase. Conclusions: This study will provide external evidence regarding the clinical performance and utility of the AI-based tool based on prospectively collected ECHO scans in the routine clinical setting, thus reflecting real-world clinical scenarios. The study protocol may be useful to investigators conducting similar research. International Registered Report Identifier (IRRID): DERR1-10.2196/44650 %M 36912875 %R 10.2196/44650 %U https://www.researchprotocols.org/2023/1/e44650 %U https://doi.org/10.2196/44650 %U http://www.ncbi.nlm.nih.gov/pubmed/36912875 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e41264 %T Developing an Inpatient Electronic Medical Record Phenotype for Hospital-Acquired Pressure Injuries: Case Study Using Natural Language Processing Models %A Nurmambetova,Elvira %A Pan,Jie %A Zhang,Zilong %A Wu,Guosong %A Lee,Seungwon %A Southern,Danielle A %A Martin,Elliot A %A Ho,Chester %A Xu,Yuan %A Eastwood,Cathy A %+ Centre for Health Informatics, Cumming School of Medicine, University of Calgary, 3330 Hospital Dr NW, Calgary, AB, T2N 4N1, Canada, 1 4032206843, elvira.nurmambetova@ucalgary.ca %K pressure injury %K natural language processing %K NLP %K algorithm %K phenotype algorithm %K phenotyping algorithm %K machine learning %K electronic medical record %K EMR %K pressure sore %K pressure wound %K pressure ulcer %K pressure injuries %K detect %D 2023 %7 8.3.2023 %9 Original Paper %J JMIR AI %G English %X Background: Surveillance of hospital-acquired pressure injuries (HAPI) is often suboptimal when relying on administrative health data, as International Classification of Diseases (ICD) codes 
are known to have long delays and are undercoded. We leveraged natural language processing (NLP) applications on free-text notes, particularly the inpatient nursing notes, from electronic medical records (EMRs), to more accurately and promptly identify HAPIs. Objective: This study aimed to show that EMR-based phenotyping algorithms are better suited to detecting HAPIs than ICD-10-CA algorithms alone, while the clinical logs are recorded with higher accuracy via NLP using nursing notes. Methods: Patients with HAPIs were identified from head-to-toe skin assessments in a local tertiary acute care hospital during a clinical trial that took place from 2015 to 2018 in Calgary, Alberta, Canada. Clinical notes documented during the trial were extracted from the EMR database after the linkage with the discharge abstract database. Different combinations of several types of clinical notes were processed by sequential forward selection during the model development. Text classification algorithms for HAPI detection were developed using random forest (RF), extreme gradient boosting (XGBoost), and deep learning models. The classification threshold was tuned to enable the model to achieve similar specificity to an ICD-based phenotyping study. Each model’s performance was assessed, and comparisons were made between the metrics, including sensitivity, positive predictive value, negative predictive value, and F1-score. Results: Data from 280 eligible patients were used in this study, among whom 97 patients had HAPIs during the trial. RF was the best-performing model, with a sensitivity of 0.464 (95% CI 0.365-0.563), specificity of 0.984 (95% CI 0.965-1.000), and F1-score of 0.612 (95% CI 0.473-0.751). The machine learning (ML) model reached higher sensitivity without sacrificing much specificity compared to the previously reported performance of ICD-based algorithms. 
Conclusions: The EMR-based NLP phenotyping algorithms demonstrated improved performance in HAPI case detection over ICD-10-CA codes alone. Daily generated nursing notes in EMRs are a valuable data resource for ML models to accurately detect adverse events. The study contributes to enhancing automated health care quality and safety surveillance. %M 38875552 %R 10.2196/41264 %U https://ai.jmir.org/2023/1/e41264 %U https://doi.org/10.2196/41264 %U http://www.ncbi.nlm.nih.gov/pubmed/38875552 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42789 %T Evaluating the Ability of Open-Source Artificial Intelligence to Predict Accepting-Journal Impact Factor and Eigenfactor Score Using Academic Article Abstracts: Cross-sectional Machine Learning Analysis %A Macri,Carmelo %A Bacchi,Stephen %A Teoh,Sheng Chieh %A Lim,Wan Yin %A Lam,Lydia %A Patel,Sandy %A Slee,Mark %A Casson,Robert %A Chan,WengOnn %+ Discipline of Ophthalmology and Visual Sciences, The University of Adelaide, North Terrace, Adelaide, 5000, Australia, 61 468951763, carmelo.macri@adelaide.edu.au %K journal impact factor %K artificial intelligence %K ophthalmology %K radiology %K neurology %K eye %K neuroscience %K impact factor %K research quality %K journal recommender %K publish %K open source %K predict %K machine learning %K academic journal %K scientometric %K scholarly literature %D 2023 %7 7.3.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Strategies to improve the selection of appropriate target journals may reduce delays in disseminating research results. Machine learning is increasingly used in content-based recommender algorithms to guide journal submissions for academic articles. Objective: We sought to evaluate the performance of open-source artificial intelligence to predict the impact factor or Eigenfactor score tertile using academic article abstracts. 
Methods: PubMed-indexed articles published between 2016 and 2021 were identified with the Medical Subject Headings (MeSH) terms “ophthalmology,” “radiology,” and “neurology.” Journals, titles, abstracts, author lists, and MeSH terms were collected. Journal impact factor and Eigenfactor scores were sourced from the 2020 Clarivate Journal Citation Report. The journals included in the study were allocated percentile ranks based on impact factor and Eigenfactor scores, compared with other journals that released publications in the same year. All abstracts were preprocessed, which included the removal of the abstract structure, and combined with titles, authors, and MeSH terms as a single input. The input data underwent preprocessing with the inbuilt ktrain Bidirectional Encoder Representations from Transformers (BERT) preprocessing library before analysis with BERT. Before use for logistic regression and XGBoost models, the input data underwent punctuation removal, negation detection, stemming, and conversion into a term frequency-inverse document frequency array. Following this preprocessing, data were randomly split into training and testing data sets with a 3:1 train:test ratio. Models were developed to predict whether a given article would be published in a first, second, or third tertile journal (0-33rd centile, 34th-66th centile, or 67th-100th centile), as ranked either by impact factor or Eigenfactor score. BERT, XGBoost, and logistic regression models were developed on the training data set before evaluation on the hold-out test data set. The primary outcome was overall classification accuracy for the best-performing model in the prediction of accepting journal impact factor tertile. Results: There were 10,813 articles from 382 unique journals. The median impact factor and Eigenfactor score were 2.117 (IQR 1.102-2.622) and 0.00247 (IQR 0.00105-0.03), respectively. 
The BERT model achieved the highest impact factor tertile classification accuracy of 75.0%, followed by an accuracy of 71.6% for XGBoost and 65.4% for logistic regression. Similarly, BERT achieved the highest Eigenfactor score tertile classification accuracy of 73.6%, followed by an accuracy of 71.8% for XGBoost and 65.3% for logistic regression. Conclusions: Open-source artificial intelligence can predict the impact factor and Eigenfactor score of accepting peer-reviewed journals. Further studies are required to examine the effect on publication success and the time-to-publication of such recommender systems. %M 36881455 %R 10.2196/42789 %U https://www.jmir.org/2023/1/e42789 %U https://doi.org/10.2196/42789 %U http://www.ncbi.nlm.nih.gov/pubmed/36881455 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e44875 %T Investigating the Secondary Use of Clinical Research Data: Protocol for a Mixed Methods Study %A Waithira,Naomi %A Kestelyn,Evelyne %A Chotthanawathit,Keitcheya %A Osterrieder,Anne %A Mukaka,Mavuto %A Lang,Trudie %A Cheah,Phaik Yeong %+ Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, 420/6 Rajvithi, Bangkok, 10400, Thailand, 66 22036333, naomi@tropmedres.ac %K data reuse %K data sharing %K secondary data use %K clinical trials data %K artificial intelligence %K machine learning %K individual patient data %K clinical research %K barriers %K online survey %K mixed methods %K low- and middle-income country %D 2023 %7 6.3.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: The increasing emphasis to share patient data from clinical research has resulted in substantial investments in data repositories and infrastructure. However, it is unclear how shared data are used and whether anticipated benefits are being realized. 
Objective: The purpose of our study is to examine the current utilization of shared clinical research data sets and assess the effects on both scientific research and public health outcomes. Additionally, the study seeks to identify the factors that hinder or facilitate the ethical and efficient use of existing data based on the perspectives of data users. Methods: The study will utilize a mixed methods design, incorporating a cross-sectional survey and in-depth interviews. The survey will involve at least 400 clinical researchers, while the in-depth interviews will include 20 to 40 participants who have utilized data from repositories or institutional data access committees. The survey will target a global sample, while the in-depth interviews will focus on individuals who have used data collected from low- and middle-income countries. Quantitative data will be summarized by using descriptive statistics, while multivariable analyses will be used to assess the relationships between variables. Qualitative data will be analyzed through thematic analysis, and the findings will be reported in accordance with the COREQ (Consolidated Criteria for Reporting Qualitative Research) guidelines. The study received ethical approval from the Oxford Tropical Research Ethics Committee in 2020 (reference number: 568-20). Results: The results of the analysis, including both quantitative data and qualitative data, will be available in 2023. Conclusions: The outcomes of our study will offer crucial understanding into the current status of data reuse in clinical research, serving as a basis for guiding future endeavors to enhance the utilization of shared data for the betterment of public health outcomes and for scientific progress. 
Trial Registration: Thai Clinical Trials Registry TCTR20210301006; https://tinyurl.com/2p9atzhr International Registered Report Identifier (IRRID): DERR1-10.2196/44875 %M 36877564 %R 10.2196/44875 %U https://www.researchprotocols.org/2023/1/e44875 %U https://doi.org/10.2196/44875 %U http://www.ncbi.nlm.nih.gov/pubmed/36877564 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e40973 %T Preparing for an Artificial Intelligence–Enabled Future: Patient Perspectives on Engagement and Health Care Professional Training for Adopting Artificial Intelligence Technologies in Health Care Settings %A Jeyakumar,Tharshini %A Younus,Sarah %A Zhang,Melody %A Clare,Megan %A Charow,Rebecca %A Karsan,Inaara %A Dhalla,Azra %A Al-Mouaswas,Dalia %A Scandiffio,Jillian %A Aling,Justin %A Salhia,Mohammad %A Lalani,Nadim %A Overholt,Scott %A Wiljer,David %+ University Health Network, 190 Elizabeth Street R. Fraser Elliott Building RFE 3S-441, Toronto, ON, M5G 2C4, Canada, 1 416 340 4800 ext 6322, David.wiljer@uhn.ca %K artificial intelligence %K patient %K education %K attitude %K health data %K adoption %K health equity %K patient engagement %D 2023 %7 2.3.2023 %9 Original Paper %J JMIR AI %G English %X Background: As new technologies emerge, there is a significant shift in the way care is delivered on a global scale. Artificial intelligence (AI) technologies have been rapidly and inexorably used to optimize patient outcomes, reduce health system costs, improve workflow efficiency, and enhance population health. Despite the widespread adoption of AI technologies, the literature on patient engagement and their perspectives on how AI will affect clinical care is scarce. Minimal patient engagement can limit the optimization of these novel technologies and contribute to suboptimal use in care settings. 
Objective: We aimed to explore patients’ views on what skills they believe health care professionals should have in preparation for this AI-enabled future and how we can better engage patients when adopting and deploying AI technologies in health care settings. Methods: Semistructured interviews were conducted from August 2020 to December 2021 with 12 individuals who were a patient in any Canadian health care setting. Interviews were conducted until thematic saturation occurred. A thematic analysis approach outlined by Braun and Clarke was used to inductively analyze the data and identify overarching themes. Results: Among the 12 patients interviewed, 8 (67%) were from urban settings and 4 (33%) were from rural settings. A majority of the participants were very comfortable with technology (n=6, 50%) and somewhat familiar with AI (n=7, 58%). In total, 3 themes emerged: cultivating patients’ trust, fostering patient engagement, and establishing data governance and validation of AI technologies. Conclusions: With the rapid surge of AI solutions, there is a critical need to understand patient values in advancing the quality of care and contributing to an equitable health system. Our study demonstrated that health care professionals play a synergetic role in the future of AI and digital technologies. Patient engagement is vital in addressing underlying health inequities and fostering an optimal care experience. Future research is warranted to understand and capture the diverse perspectives of patients with various racial, ethnic, and socioeconomic backgrounds. 
%M 38875561 %R 10.2196/40973 %U https://ai.jmir.org/2023/1/e40973 %U https://doi.org/10.2196/40973 %U http://www.ncbi.nlm.nih.gov/pubmed/38875561 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 10 %N %P e40899 %T Trends in Language Use During the COVID-19 Pandemic and Relationship Between Language Use and Mental Health: Text Analysis Based on Free Responses From a Longitudinal Study %A Weger,Rachel %A Lossio-Ventura,Juan Antonio %A Rose-McCandlish,Margaret %A Shaw,Jacob S %A Sinclair,Stephen %A Pereira,Francisco %A Chung,Joyce Y %A Atlas,Lauren Yvette %+ National Center for Complementary and Integrative Health, National Institutes of Health, 10 Center Drive, Rm 4-1741, Bethesda, MD, 20892, United States, 1 301 827 0214, lauren.atlas@nih.gov %K COVID-19 %K mental health %K natural language processing %K sentiment analysis %K free response %K qualitative %K text analysis %K mental illness %K text %K mental state %K language %K pandemic %K age %K education %D 2023 %7 1.3.2023 %9 Original Paper %J JMIR Ment Health %G English %X Background: The COVID-19 pandemic and its associated restrictions have been a major stressor that has exacerbated mental health worldwide. Qualitative data play a unique role in documenting mental states through both language features and content. Text analysis methods can provide insights into the associations between language use and mental health and reveal relevant themes that emerge organically in open-ended responses. 
Objective: The aim of this web-based longitudinal study on mental health during the early COVID-19 pandemic was to use text analysis methods to analyze free responses to the question, “Is there anything else you would like to tell us that might be important that we did not ask about?” Our goals were to determine whether individuals who responded to the item differed from nonresponders, to determine whether there were associations between language use and psychological status, and to characterize the content of responses and how responses changed over time. Methods: A total of 3655 individuals enrolled in the study were asked to complete self-reported measures of mental health and COVID-19 pandemic–related questions every 2 weeks for 6 months. Of these 3655 participants, 2497 (68.32%) provided at least 1 free response (9741 total responses). We used various text analysis methods to measure the links between language use and mental health and to characterize response themes over the first year of the pandemic. Results: Response likelihood was influenced by demographic factors and health status: those who were male, Asian, Black, or Hispanic were less likely to respond, and the odds of responding increased with age and education as well as with a history of physical health conditions. Although mental health treatment history did not influence the overall likelihood of responding, it was associated with more negative sentiment, negative word use, and higher use of first-person singular pronouns. Responses were dynamically influenced by psychological status such that distress and loneliness were positively associated with an individual’s likelihood to respond at a given time point and were associated with more negativity. Finally, the responses were negative in valence overall and exhibited fluctuations linked with external events. 
The responses covered a variety of topics, with the most common being mental health and emotion, social or physical distancing, and policy and government. Conclusions: Our results identify trends in language use during the first year of the pandemic and suggest that both the content of responses and overall sentiments are linked to mental health. %M 36525362 %R 10.2196/40899 %U https://mental.jmir.org/2023/1/e40899 %U https://doi.org/10.2196/40899 %U http://www.ncbi.nlm.nih.gov/pubmed/36525362 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 6 %N %P e45529 %T Analyzing the Predictability of an Artificial Intelligence App (Tibot) in the Diagnosis of Dermatological Conditions: A Cross-sectional Study %A Marri,Shiva Shankar %A Inamadar,Arun C %A Janagond,Ajit B %A Albadri,Warood %+ Department of Dermatology, Venereology and Leprosy, Shri B M Patil Medical College, Hospital and Research Centre, BLDE (Deemed to be University), Bangaramma Sajjan Campus, Vijayapur, Karnataka, 586103, India, 91 9448102920, aruninamadar@gmail.com %K artificial intelligence %K AI-assisted diagnosis %K machine learning %K neural network %K deep learning %K dermatology %K mobile %K application %K app %D 2023 %7 1.3.2023 %9 Original Paper %J JMIR Dermatol %G English %X Background: Artificial intelligence (AI) aims to create programs that reproduce human cognition and processes involved in interpreting complex data. Dermatology relies on morphological features and is ideal for applying AI image recognition for assisted diagnosis. Tibot is an AI app that analyzes skin conditions and works on the principle of a convolutional neural network. Appropriate research analyzing the accuracy of such apps is necessary. Objective: This study aims to analyze the predictability of the Tibot AI app in the identification of dermatological diseases as compared to a dermatologist. Methods: This is a cross-sectional study. 
After taking informed consent, photographs of lesions of patients with different skin conditions were uploaded to the app. In every condition, the AI predicted three diagnoses based on probability, and these were compared with the diagnosis made by a dermatologist. The ability of the AI app to predict the actual diagnosis in the top one and top three anticipated diagnoses (prediction accuracy) was used to evaluate the app’s effectiveness. Sensitivity, specificity, and positive predictive value were also used to assess the app’s performance. The chi-square test was used to compare categorical variables. P<.05 was considered statistically significant. Results: A total of 600 patients were included. Clinical conditions included alopecia, acne, eczema, immunological disorders, pigmentary disorders, psoriasis, infestation, tumors, and infections. In the anticipated top three diagnoses, the app’s mean prediction accuracy was 96.1% (95% CI 94.3%-97.5%), while for the exact diagnosis, it was 80.6% (95% CI 77.2%-83.7%). The prediction accuracy (top one) for alopecia, acne, pigmentary disorders, and fungal infections was 97.7%, 91.7%, 88.5%, and 82.9%, respectively. Prediction accuracy (top three) for alopecia, eczema, and tumors was 100%. The sensitivity and specificity of the app were 97% (95% CI 95%-98%) and 98% (95% CI 98%-99%), respectively. There was a statistically significant association between clinical and AI-predicted diagnoses in all conditions (P<.001). Conclusions: The AI app has shown promising results in diagnosing various dermatological conditions, and there is great potential for practical applicability. 
%M 37632978 %R 10.2196/45529 %U https://derma.jmir.org/2023/1/e45529 %U https://doi.org/10.2196/45529 %U http://www.ncbi.nlm.nih.gov/pubmed/37632978 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e42324 %T Classifying COVID-19 Patients From Chest X-ray Images Using Hybrid Machine Learning Techniques: Development and Evaluation %A Phumkuea,Thanakorn %A Wongsirichot,Thakerng %A Damkliang,Kasikrit %A Navasakulpong,Asma %+ Division of Computational Science, Faculty of Science, Prince of Songkla University, 15 Kanjanavanich Road, Hat Yai, Songkhla, 90110, Thailand, 66 846414784, thakerng.w@psu.ac.th %K COVID-19 %K machine learning %K medical informatics %K coronavirus %K diagnosis %K model %K detection %K healthy %K unhealthy %K public %K usage %K data %K database %K accuracy %K development %K x-ray %K imaging %D 2023 %7 28.2.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: The COVID-19 pandemic has raised global concern, with moderate to severe cases displaying lung inflammation and respiratory failure. Chest x-ray (CXR) imaging is crucial for diagnosis and is usually interpreted by experienced medical specialists. Machine learning has been applied with acceptable accuracy, but computational efficiency has received less attention. Objective: We introduced a novel hybrid machine learning model to accurately classify COVID-19, non-COVID-19, and healthy patients from CXR images with reduced computational time and promising results. Our proposed model was thoroughly evaluated and compared with existing models. Methods: A retrospective study was conducted to analyze 5 public data sets containing 4200 CXR images using machine learning techniques including decision trees, support vector machines, and neural networks. The images were preprocessed to undergo image segmentation, enhancement, and feature extraction. 
The best performing machine learning technique was selected and combined into a multilayer hybrid classification model for COVID-19 (MLHC-COVID-19). The model consisted of 2 layers. The first layer was designed to differentiate healthy individuals from infected patients, while the second layer aimed to classify COVID-19 and non-COVID-19 patients. Results: The MLHC-COVID-19 model was trained and evaluated on unseen COVID-19 CXR images, achieving reasonably high accuracy and F measures of 0.962 and 0.962, respectively. These results show the effectiveness of the MLHC-COVID-19 in classifying COVID-19 CXR images, with improved accuracy and a reduction in interpretation time. The model was also embedded into a web-based MLHC-COVID-19 computer-aided diagnosis system, which was made publicly available. Conclusions: The study found that the MLHC-COVID-19 model effectively differentiated CXR images of COVID-19 patients from those of healthy and non-COVID-19 individuals. It outperformed other state-of-the-art deep learning techniques and showed promising results. These results suggest that the MLHC-COVID-19 model could have been instrumental in early detection and diagnosis of COVID-19 patients, thus playing a significant role in controlling and managing the pandemic. Although the pandemic has slowed down, this model can be adapted and utilized for future similar situations. The model was also integrated into a publicly accessible web-based computer-aided diagnosis system. 
%M 36780315 %R 10.2196/42324 %U https://formative.jmir.org/2023/1/e42324 %U https://doi.org/10.2196/42324 %U http://www.ncbi.nlm.nih.gov/pubmed/36780315 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e39077 %T German Medical Named Entity Recognition Model and Data Set Creation Using Machine Translation and Word Alignment: Algorithm Development and Validation %A Frei,Johann %A Kramer,Frank %+ IT Infrastructure for Translational Medical Research, University of Augsburg, Alter Postweg 101, Augsburg, 86159, Germany, 49 17691464136, johann.frei@informatik.uni-augsburg.de %K natural language processing %K named entity recognition %K information extraction %D 2023 %7 28.2.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Data mining in the field of medical data analysis often needs to rely solely on the processing of unstructured data to retrieve relevant data. For German natural language processing, few open medical neural named entity recognition (NER) models have been published before this work. A major issue can be attributed to the lack of German training data. Objective: We developed a synthetic data set and a novel German medical NER model for public access to demonstrate the feasibility of our approach. In order to bypass legal restrictions due to potential data leaks through model analysis, we did not make use of internal, proprietary data sets, which is a frequent veto factor for data set publication. Methods: The underlying German data set was retrieved by translation and word alignment of a public English data set. The data set served as a foundation for model training and evaluation. For demonstration purposes, our NER model follows a simple network architecture that is designed for low computational requirements. Results: The obtained data set consisted of 8599 sentences including 30,233 annotations. The model achieved a class frequency–averaged F1 score of 0.82 on the test set after training across 7 different NER types. 
Artifacts in the synthesized data set with regard to translation and alignment induced by the proposed method were exposed. The annotation performance was evaluated on an external data set and measured in comparison with an existing baseline model that has been trained on a dedicated German data set in a traditional fashion. We discussed the drop in annotation performance on an external data set for our simple NER model. Our model is publicly available. Conclusions: We demonstrated the feasibility of obtaining a data set and training a German medical NER model by the exclusive use of public training data through our suggested method. The discussion on the limitations of our approach includes ways to further mitigate remaining problems in future work. %M 36853741 %R 10.2196/39077 %U https://formative.jmir.org/2023/1/e39077 %U https://doi.org/10.2196/39077 %U http://www.ncbi.nlm.nih.gov/pubmed/36853741 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42181 %T Machine Learning for Predicting Micro- and Macrovascular Complications in Individuals With Prediabetes or Diabetes: Retrospective Cohort Study %A Schallmoser,Simon %A Zueger,Thomas %A Kraus,Mathias %A Saar-Tsechansky,Maytal %A Stettler,Christoph %A Feuerriegel,Stefan %+ Institute of AI in Management, LMU Munich, Geschwister-Scholl-Platz 1, Munich, 80539, Germany, 49 89 2180 6790, schallmoser@lmu.de %K diabetes %K prediabetes %K machine learning %K microvascular complications %K macrovascular complications %D 2023 %7 27.2.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Micro- and macrovascular complications are a major burden for individuals with diabetes and can already arise in a prediabetic state. To allocate effective treatments and to possibly prevent these complications, identification of those at risk is essential. 
Objective: This study aimed to build machine learning (ML) models that predict the risk of developing a micro- or macrovascular complication in individuals with prediabetes or diabetes. Methods: In this study, we used electronic health records from Israel that contain information about demographics, biomarkers, medications, and disease codes; span from 2003 to 2013; and were queried to identify individuals with prediabetes or diabetes in 2008. Subsequently, we aimed to predict which of these individuals developed a micro- or macrovascular complication within the next 5 years. We included 3 microvascular complications: retinopathy, nephropathy, and neuropathy. In addition, we considered 3 macrovascular complications: peripheral vascular disease (PVD), cerebrovascular disease (CeVD), and cardiovascular disease (CVD). Complications were identified via disease codes, and, for nephropathy, the estimated glomerular filtration rate and albuminuria were considered additionally. Inclusion criteria were complete information on age and sex and on disease codes (or measurements of estimated glomerular filtration rate and albuminuria for nephropathy) until 2013 to account for patient dropout. Exclusion criteria for predicting a complication were diagnosis of this specific complication before or in 2008. In total, 105 predictors from demographics, biomarkers, medications, and disease codes were used to build the ML models. We compared 2 ML models: logistic regression and gradient-boosted decision trees (GBDTs). To explain the predictions of the GBDTs, we calculated Shapley additive explanations values. Results: Overall, 13,904 and 4259 individuals with prediabetes and diabetes, respectively, were identified in our underlying data set. 
For individuals with prediabetes, the areas under the receiver operating characteristic curve for logistic regression and GBDTs were, respectively, 0.657 and 0.681 (retinopathy), 0.807 and 0.815 (nephropathy), 0.727 and 0.706 (neuropathy), 0.730 and 0.727 (PVD), 0.687 and 0.693 (CeVD), and 0.707 and 0.705 (CVD); for individuals with diabetes, the areas under the receiver operating characteristic curve were, respectively, 0.673 and 0.726 (retinopathy), 0.763 and 0.775 (nephropathy), 0.745 and 0.771 (neuropathy), 0.698 and 0.715 (PVD), 0.651 and 0.646 (CeVD), and 0.686 and 0.680 (CVD). Overall, the prediction performance is comparable for logistic regression and GBDTs. The Shapley additive explanations values showed that increased levels of blood glucose, glycated hemoglobin, and serum creatinine are risk factors for microvascular complications. Age and hypertension were associated with an elevated risk for macrovascular complications. Conclusions: Our ML models allow for an identification of individuals with prediabetes or diabetes who are at increased risk of developing micro- or macrovascular complications. The prediction performance varied across complications and target populations but was in an acceptable range for most prediction tasks. 
%M 36848190 %R 10.2196/42181 %U https://www.jmir.org/2023/1/e42181 %U https://doi.org/10.2196/42181 %U http://www.ncbi.nlm.nih.gov/pubmed/36848190 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e40789 %T Artificial Intelligence–Based Chatbots for Promoting Health Behavioral Changes: Systematic Review %A Aggarwal,Abhishek %A Tam,Cheuk Chi %A Wu,Dezhi %A Li,Xiaoming %A Qiao,Shan %+ Department of Health Promotion, Education and Behavior, Arnold School of Public Health, University of South Carolina, 915 Greene St, Columbia, SC, 29201, United States, 1 803 777 6844, SHANQIAO@mailbox.sc.edu %K chatbot %K artificial intelligence %K AI %K health behavior change %K engagement %K efficacy %K intervention %K feasibility %K usability %K acceptability %K mobile phone %D 2023 %7 24.2.2023 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI)–based chatbots can offer personalized, engaging, and on-demand health promotion interventions. Objective: The aim of this systematic review was to evaluate the feasibility, efficacy, and intervention characteristics of AI chatbots for promoting health behavior change. Methods: A comprehensive search was conducted in 7 bibliographic databases (PubMed, IEEE Xplore, ACM Digital Library, PsycINFO, Web of Science, Embase, and JMIR publications) for empirical articles published from 1980 to 2022 that evaluated the feasibility or efficacy of AI chatbots for behavior change. The screening, extraction, and analysis of the identified articles were performed by following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Results: Of the 15 included studies, several demonstrated the high efficacy of AI chatbots in promoting healthy lifestyles (n=6, 40%), smoking cessation (n=4, 27%), treatment or medication adherence (n=2, 13%), and reduction in substance misuse (n=1, 7%). However, there were mixed results regarding feasibility, acceptability, and usability. 
Selected behavior change theories and expert consultation were used to develop the behavior change strategies of AI chatbots, including goal setting, monitoring, real-time reinforcement or feedback, and on-demand support. Real-time user-chatbot interaction data, such as user preferences and behavioral performance, were collected on the chatbot platform to identify ways of providing personalized services. The AI chatbots demonstrated potential for scalability by deployment through accessible devices and platforms (eg, smartphones and Facebook Messenger). The participants also reported that AI chatbots offered a nonjudgmental space for communicating sensitive information. However, the reported results need to be interpreted with caution because of the moderate to high risk to internal validity, insufficient description of AI techniques, and limited generalizability. Conclusions: AI chatbots have demonstrated the efficacy of health behavior change interventions among large and diverse populations; however, future studies need to adopt robust randomized controlled trials to establish definitive conclusions. %M 36826990 %R 10.2196/40789 %U https://www.jmir.org/2023/1/e40789 %U https://doi.org/10.2196/40789 %U http://www.ncbi.nlm.nih.gov/pubmed/36826990 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 9 %N %P e41624 %T Artificial Intelligence in Community-Based Diabetic Retinopathy Telemedicine Screening in Urban China: Cost-effectiveness and Cost-Utility Analyses With Real-world Data %A Lin,Senlin %A Ma,Yingyan %A Xu,Yi %A Lu,Lina %A He,Jiangnan %A Zhu,Jianfeng %A Peng,Yajun %A Yu,Tao %A Congdon,Nathan %A Zou,Haidong %+ Department of Eye Disease Prevention and Control, Shanghai Eye Disease Prevention and Treatment Center/Shanghai Eye Hospital, No. 
1440, Hongqiao Road, Shanghai, 200336, China, 86 02162539696, zouhaidong@sjtu.edu.cn %K artificial intelligence %K cost %K diabetic retinopathy %K utility %K low- and middle-income countries %K screening %D 2023 %7 23.2.2023 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Community-based telemedicine screening for diabetic retinopathy (DR) has been highly recommended worldwide. However, evidence from low- and middle-income countries (LMICs) on the choice between artificial intelligence (AI)–based and manual grading–based telemedicine screening is inadequate for policy making. Objective: The aim of this study was to test whether the AI model is more worthwhile than manual grading in community-based telemedicine screening for DR in the context of labor costs in urban China. Methods: We conducted cost-effectiveness and cost-utility analyses by using decision-analytic Markov models with 30 one-year cycles from a societal perspective to compare the cost, effectiveness, and utility of 2 scenarios in telemedicine screening for DR: manual grading and an AI model. Sensitivity analyses were performed. Real-world data were obtained mainly from the Shanghai Digital Eye Disease Screening Program. The main outcomes were the incremental cost-effectiveness ratio (ICER) and the incremental cost-utility ratio (ICUR). The ICUR thresholds were set as 1 and 3 times the local gross domestic product per capita. Results: The total expected costs for a 65-year-old resident were US $3182.50 and US $3265.40, while the total expected years without blindness were 9.80 years and 9.83 years, and the utilities were 6.748 quality-adjusted life years (QALYs) and 6.753 QALYs in the AI model and manual grading, respectively. The ICER for the AI-assisted model was US $2553.39 per year without blindness, and the ICUR was US $15,216.96 per QALY, which indicated that AI-assisted model was not cost-effective. 
The sensitivity analysis suggested that if there is an increase in compliance with referrals after the adoption of AI by 7.5%, an increase in on-site screening costs in manual grading by 50%, or a decrease in on-site screening costs in the AI model by 50%, then the AI model could be the dominant strategy. Conclusions: Our study may provide a reference for policy making in planning community-based telemedicine screening for DR in LMICs. Our findings indicate that unless the referral compliance of patients with suspected DR increases, the adoption of the AI model may not improve the value of telemedicine screening compared to that of manual grading in LMICs. The main reason is that in the context of the low labor costs in LMICs, the direct health care costs saved by replacing manual grading with AI are less, and the screening effectiveness (QALYs and years without blindness) decreases. Our study suggests that the magnitude of the value generated by this technology replacement depends primarily on 2 aspects. The first is the extent of direct health care costs reduced by AI, and the second is the change in health care service utilization caused by AI. Therefore, our research can also provide analytical ideas for other health care sectors in their decision to use AI. 
%M 36821353 %R 10.2196/41624 %U https://publichealth.jmir.org/2023/1/e41624 %U https://doi.org/10.2196/41624 %U http://www.ncbi.nlm.nih.gov/pubmed/36821353 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e40167 %T Application of Artificial Intelligence to the Monitoring of Medication Adherence for Tuberculosis Treatment in Africa: Algorithm Development and Validation %A Sekandi,Juliet Nabbuye %A Shi,Weili %A Zhu,Ronghang %A Kaggwa,Patrick %A Mwebaze,Ernest %A Li,Sheng %+ Global Health Institute, College of Public Health, University of Georgia, 100 Foster Road, Athens, GA, 30602, United States, 1 706 542 5257, jsekandi@uga.edu %K artificial intelligence %K deep learning %K machine learning %K medication adherence %K digital technology %K digital health %K tuberculosis %K video directly observed therapy %K video therapy %D 2023 %7 23.2.2023 %9 Original Paper %J JMIR AI %G English %X Background: Artificial intelligence (AI) applications based on advanced deep learning methods in image recognition tasks can increase efficiency in the monitoring of medication adherence through automation. AI has sparsely been evaluated for the monitoring of medication adherence in clinical settings. However, AI has the potential to transform the way health care is delivered even in limited-resource settings such as Africa. Objective: We aimed to pilot the development of a deep learning model for simple binary classification and confirmation of proper medication adherence to enhance efficiency in the use of video monitoring of patients in tuberculosis treatment. Methods: We used a secondary data set of 861 video images of medication intake that were collected from consenting adult patients with tuberculosis in an institutional review board–approved study evaluating video-observed therapy in Uganda. The video images were processed through a series of steps to prepare them for use in a training model. 
First, we annotated videos using a specific protocol to eliminate those with poor quality. After the initial annotation step, 497 videos had sufficient quality for training the models. Among them, 405 were positive samples, whereas 92 were negative samples. With some preprocessing techniques, we obtained 160 frames with a size of 224 × 224 in each video. We used a deep learning framework that leveraged 4 convolutional neural network models to extract visual features from the video frames and automatically perform binary classification of adherence or nonadherence. We evaluated the diagnostic properties of the different models using sensitivity, specificity, F1-score, and precision. The area under the curve (AUC) was used to assess the discriminative performance and the speed per video review as a metric for model efficiency. We conducted a 5-fold internal cross-validation to determine the diagnostic and discriminative performance of the models. We did not conduct external validation due to a lack of publicly available data sets with specific medication intake video frames. Results: Diagnostic properties and discriminative performance from internal cross-validation were moderate to high in the binary classification tasks with 4 selected automated deep learning models. The sensitivity ranged from 92.8% to 95.8%, specificity from 43.5% to 55.4%, F1-score from 0.91 to 0.92, precision from 88% to 90.1%, and AUC from 0.78 to 0.85. The 3D ResNet model had the highest precision, AUC, and speed. Conclusions: All 4 deep learning models showed comparable diagnostic properties and discriminative performance. The findings serve as a reasonable proof of concept to support the potential application of AI in the binary classification of video frames to predict medication adherence. 
%M 38464947 %R 10.2196/40167 %U https://ai.jmir.org/2023/1/e40167 %U https://doi.org/10.2196/40167 %U http://www.ncbi.nlm.nih.gov/pubmed/38464947 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44818 %T Real-Time Detection of Sleep Apnea Based on Breathing Sounds and Prediction Reinforcement Using Home Noises: Algorithm Development and Validation %A Le,Vu Linh %A Kim,Daewoo %A Cho,Eunsung %A Jang,Hyeryung %A Reyes,Roben Delos %A Kim,Hyunggug %A Lee,Dongheon %A Yoon,In-Young %A Hong,Joonki %A Kim,Jeong-Whun %+ Department of Otorhinolaryngology, Seoul National University Bundang Hospital, 82, Gumi-ro 173 Beon-gil, Bundang-gu, Gyeonggi-do, Seongnam-si, 13620, Republic of Korea, 82 030797405, kimemails7@gmail.com %K sleep apnea %K OSA detection %K home care %K artificial intelligence %K deep learning %K prediction model %K audio %K diagnostic %K home technology %K sound %D 2023 %7 22.2.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Multinight monitoring can be helpful for the diagnosis and management of obstructive sleep apnea (OSA). For this purpose, it is necessary to be able to detect OSA in real time in a noisy home environment. Sound-based OSA assessment holds great potential since it can be integrated with smartphones to provide full noncontact monitoring of OSA at home. Objective: The purpose of this study is to develop a predictive model that can detect OSA in real time, even in a home environment where various noises exist. Methods: This study included 1018 polysomnography (PSG) audio data sets, 297 smartphone audio data sets synced with PSG, and a home noise data set containing 22,500 noises to train the model to predict breathing events, such as apneas and hypopneas, based on breathing sounds that occur during sleep. 
The whole breathing sound of each night was divided into 30-second epochs and labeled as “apnea,” “hypopnea,” or “no-event,” and the home noises were used to make the model robust to a noisy home environment. The performance of the prediction model was assessed using epoch-by-epoch prediction accuracy and OSA severity classification based on the apnea-hypopnea index (AHI). Results: Epoch-by-epoch OSA event detection showed an accuracy of 86% and a macro F1-score of 0.75 for the 3-class OSA event detection task. The model had an accuracy of 92% for “no-event,” 84% for “apnea,” and 51% for “hypopnea.” Most misclassifications were made for “hypopnea,” with 15% and 34% of “hypopnea” being wrongly predicted as “apnea” and “no-event,” respectively. The sensitivity and specificity of the OSA severity classification (AHI≥15) were 0.85 and 0.84, respectively. Conclusions: Our study presents a real-time epoch-by-epoch OSA detector that works in a variety of noisy home environments. Based on this, additional research is needed to verify the usefulness of various multinight monitoring and real-time diagnostic technologies in the home environment. 
%M 36811943 %R 10.2196/44818 %U https://www.jmir.org/2023/1/e44818 %U https://doi.org/10.2196/44818 %U http://www.ncbi.nlm.nih.gov/pubmed/36811943 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 9 %N %P e43419 %T Prediction of Suicidal Behaviors in the Middle-aged Population: Machine Learning Analyses of UK Biobank %A Wang,Junren %A Qiu,Jiajun %A Zhu,Ting %A Zeng,Yu %A Yang,Huazhen %A Shang,Yanan %A Yin,Jin %A Sun,Yajing %A Qu,Yuanyuan %A Valdimarsdóttir,Unnur A %A Song,Huan %+ West China Biomedical Big Data Center, West China Hospital, Sichuan University, Guo Xue Lane 37, Chengdu, 610021, China, 86 28 85164176, songhuan@wchscu.cn %K suicide %K suicidal behaviors %K risk prediction %K machine learning approach %K genetic susceptibility %K machine learning %K behavior %K data %K model %K sex %K risk %K cost-effective %D 2023 %7 20.2.2023 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Suicidal behaviors, including suicide deaths and attempts, are major public health concerns. However, previous suicide models required a huge amount of input features, resulting in limited applicability in clinical practice. Objective: We aimed to construct applicable models (ie, with limited features) for short- and long-term suicidal behavior prediction. We further validated these models among individuals with different genetic risks of suicide. Methods: Based on the prospective cohort of UK Biobank, we included 223 (0.06%) eligible cases of suicide attempts or deaths, according to hospital inpatient or death register data within 1 year from baseline and randomly selected 4460 (1.18%) controls (1:20) without such records. We similarly identified 833 (0.22%) cases of suicidal behaviors 1 to 6 years from baseline and 16,660 (4.42%) corresponding controls. 
Based on 143 input features, mainly including sociodemographic, environmental, and psychosocial factors; medical history; and polygenic risk scores (PRS) for suicidality, we applied a bagged balanced light gradient-boosting machine (LightGBM) with stratified 10-fold cross-validation and grid search to construct the full prediction models for suicide attempts or deaths within 1 year or between 1 and 6 years. The Shapley Additive Explanations (SHAP) approach was used to quantify the importance of input features, and the top 20 features with the highest SHAP values were selected to train the applicable models. The external validity of the established models was assessed among 50,310 individuals who participated in UK Biobank repeated assessments both overall and by the level of PRS for suicidality. Results: Individuals with suicidal behaviors were on average 56 years old, with equal sex distribution. The application of these full models in the external validation data set demonstrated good model performance, with area under the receiver operating characteristic curve (AUROC) values of 0.919 and 0.892 within 1 year and between 1 and 6 years, respectively. Importantly, the applicable models with the top 20 most important features showed externally validated performance (AUROC values of 0.901 and 0.885) comparable to the full models, based on which we found that individuals in the top quintile of predicted risk accounted for 91.7% (n=11) and 80.7% (n=25) of all suicidality cases within 1 year and during 1 to 6 years, respectively. We further obtained comparable prediction accuracy when applying these models to subpopulations with different genetic susceptibilities to suicidality. For example, for the 1-year risk prediction, the AUROC values were 0.907 and 0.885 for the high (>2nd tertile of PRS) and low (<1st) genetic susceptibility groups, respectively. 
Conclusions: We established applicable machine learning–based models for predicting both the short- and long-term risk of suicidality with high accuracy across populations of varying genetic risk for suicide, highlighting a cost-effective method of identifying individuals with a high risk of suicidality. %M 36805366 %R 10.2196/43419 %U https://publichealth.jmir.org/2023/1/e43419 %U https://doi.org/10.2196/43419 %U http://www.ncbi.nlm.nih.gov/pubmed/36805366 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e42253 %T Predicting Patient Mortality for Earlier Palliative Care Identification in Medicare Advantage Plans: Features of a Machine Learning Model %A Bowers,Anne %A Drake,Chelsea %A Makarkin,Alexi E %A Monzyk,Robert %A Maity,Biswajit %A Telle,Andrew %+ Evernorth Health, Inc, One Express Way, St. Louis, MO, 63121, United States, 1 860 810 6523, anne.bowers@evernorth.com %K palliative %K palliative care %K machine learning %K social determinants %K Medicare Advantage %K Medicare %K predict %K algorithm %K mortality %K older adult %D 2023 %7 20.2.2023 %9 Original Paper %J JMIR AI %G English %X Background: Machine learning (ML) can offer greater precision and sensitivity in predicting Medicare patient end of life and potential need for palliative services compared to provider recommendations alone. However, earlier ML research on older community dwelling Medicare beneficiaries has provided insufficient exploration of key model feature impacts and the role of the social determinants of health. Objective: This study describes the development of a binary classification ML model predicting 1-year mortality among Medicare Advantage plan members aged ≥65 years (N=318,774) and further examines the top features of the predictive model. Methods: A light gradient-boosted trees model configuration was selected based on 5-fold cross-validation. 
The model was trained with 80% of cases (n=255,020) using randomized feature generation periods, with 20% (n=63,754) reserved as a holdout for validation. The final algorithm used 907 feature inputs extracted primarily from claims and administrative data capturing patient diagnoses, service utilization, demographics, and census tract–based social determinants index measures. Results: The total sample had an actual mortality prevalence of 3.9% in the 2018 outcome period. The final model correctly predicted 44.2% of patient expirations among the top 1% of highest risk members (AUC=0.84; 95% CI 0.83-0.85) versus 24.0% predicted by the model iteration using only age, gender, and select high-risk utilization features (AUC=0.74; 95% CI 0.73-0.74). The most important algorithm features included patient demographics, diagnoses, pharmacy utilization, mean costs, and certain social determinants of health. Conclusions: The final ML model better predicts Medicare Advantage member end of life using a variety of routinely collected data and supports earlier patient identification for palliative care. 
%M 38875557 %R 10.2196/42253 %U https://ai.jmir.org/2023/1/e42253 %U https://doi.org/10.2196/42253 %U http://www.ncbi.nlm.nih.gov/pubmed/38875557 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e38439 %T Nighttime Continuous Contactless Smartphone-Based Cough Monitoring for the Ward: Validation Study %A Barata,Filipe %A Cleres,David %A Tinschert,Peter %A Iris Shih,Chen-Hsuan %A Rassouli,Frank %A Boesch,Maximilian %A Brutsche,Martin %A Fleisch,Elgar %+ Center for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Weinbergstrasse 56/58, Zurich, 8092, Switzerland, 41 44 632 35 0, fbarata@ethz.ch %K cough monitoring %K ward monitoring %K mobile sensing %K machine learning %K convolutional neural network %K COVID-19 %K mobile phone %D 2023 %7 20.2.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Clinical deterioration can go unnoticed in hospital wards for hours. Mobile technologies such as wearables and smartphones enable automated, continuous, noninvasive ward monitoring and allow the detection of subtle changes in vital signs. Cough can be effectively monitored through mobile technologies in the ward, as it is not only a symptom of prevalent respiratory diseases such as asthma, lung cancer, and COVID-19 but also a predictor of acute health deterioration. In past decades, many efforts have been made to develop an automatic cough counting tool. To date, however, there is neither a standardized, sufficiently validated method nor a scalable cough monitor that can be deployed on a consumer-centric device that reports cough counts continuously. These shortcomings limit the tracking of coughing and, consequently, hinder the monitoring of disease progression in prevalent respiratory diseases such as asthma, chronic obstructive pulmonary disease, and COVID-19 in the ward. 
Objective: This exploratory study involved the validation of an automated smartphone-based monitoring system for continuous cough counting in 2 different modes in the ward. Unlike previous studies that focused on evaluating cough detection models on unseen data, the focus of this work is to validate a holistic smartphone-based cough detection system operating in near real time. Methods: Automated cough counts were measured consistently on devices and on computers and compared with cough and noncough sounds counted manually over 8-hour long nocturnal recordings in 9 patients with pneumonia in the ward. The proposed cough detection system consists primarily of an Android app running on a smartphone that detects coughs and records sounds and secondarily of a backend that continuously receives the cough detection information and displays the hourly cough counts. Cough detection is based on an ensemble convolutional neural network developed and trained on asthmatic cough data. Results: In this validation study, a total of 72 hours of recordings from 9 participants with pneumonia, 4 of whom were infected with SARS-CoV-2, were analyzed. All the recordings were subjected to manual analysis by 2 blinded raters. The proposed system yielded a sensitivity and specificity of 72% and 99% on the device and 82% and 99% on the computer, respectively, for detecting coughs. The mean differences between the automated and human rater cough counts were −1.0 (95% CI −12.3 to 10.2) and −0.9 (95% CI −6.5 to 4.8) coughs per hour within subject for the on-device and on-computer modes, respectively. Conclusions: The proposed system thus represents a smartphone cough counter that can be used for continuous hourly assessment of cough frequency in the ward. 
%M 36655551 %R 10.2196/38439 %U https://formative.jmir.org/2023/1/e38439 %U https://doi.org/10.2196/38439 %U http://www.ncbi.nlm.nih.gov/pubmed/36655551 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42717 %T Deep Learning With Chest Radiographs for Making Prognoses in Patients With COVID-19: Retrospective Cohort Study %A Lee,Hyun Woo %A Yang,Hyun Jun %A Kim,Hyungjin %A Kim,Ue-Hwan %A Kim,Dong Hyun %A Yoon,Soon Ho %A Ham,Soo-Youn %A Nam,Bo Da %A Chae,Kum Ju %A Lee,Dabee %A Yoo,Jin Young %A Bak,So Hyeon %A Kim,Jin Young %A Kim,Jin Hwan %A Kim,Ki Beom %A Jung,Jung Im %A Lim,Jae-Kwang %A Lee,Jong Eun %A Chung,Myung Jin %A Lee,Young Kyung %A Kim,Young Seon %A Lee,Sang Min %A Kwon,Woocheol %A Park,Chang Min %A Kim,Yun-Hyeon %A Jeong,Yeon Joo %A Jin,Kwang Nam %A Goo,Jin Mo %+ Department of Radiology, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, 20, Boramae-ro 5-gil, Dongjak-gu, Seoul, 07061, Republic of Korea, 82 2 870 2536, wlsrhkdska@gmail.com %K COVID-19 %K deep learning %K artificial intelligence %K radiography, thoracic %K prognosis %K AI model %K prediction model %K clinical outcome %K medical imaging %K machine learning %D 2023 %7 16.2.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: An artificial intelligence (AI) model using chest radiography (CXR) may provide good performance in making prognoses for COVID-19. Objective: We aimed to develop and validate a prediction model using CXR based on an AI model and clinical variables to predict clinical outcomes in patients with COVID-19. Methods: This retrospective longitudinal study included patients hospitalized for COVID-19 at multiple COVID-19 medical centers between February 2020 and October 2020. Patients at Boramae Medical Center were randomly classified into training, validation, and internal testing sets (at a ratio of 8:1:1, respectively). 
An AI model using initial CXR images as input, a logistic regression model using clinical information, and a combined model using the output of the AI model (as CXR score) and clinical information were developed and trained to predict hospital length of stay (LOS) ≤2 weeks, need for oxygen supplementation, and acute respiratory distress syndrome (ARDS). The models were externally validated in the Korean Imaging Cohort of COVID-19 data set for discrimination and calibration. Results: The AI model using CXR and the logistic regression model using clinical variables were suboptimal to predict hospital LOS ≤2 weeks or the need for oxygen supplementation but performed acceptably in the prediction of ARDS (AI model area under the curve [AUC] 0.782, 95% CI 0.720-0.845; logistic regression model AUC 0.878, 95% CI 0.838-0.919). The combined model performed better in predicting the need for oxygen supplementation (AUC 0.704, 95% CI 0.646-0.762) and ARDS (AUC 0.890, 95% CI 0.853-0.928) compared to the CXR score alone. Both the AI and combined models showed good calibration for predicting ARDS (P=.079 and P=.859). Conclusions: The combined prediction model, comprising the CXR score and clinical information, was externally validated as having acceptable performance in predicting severe illness and excellent performance in predicting ARDS in patients with COVID-19. 
%M 36795468 %R 10.2196/42717 %U https://www.jmir.org/2023/1/e42717 %U https://doi.org/10.2196/42717 %U http://www.ncbi.nlm.nih.gov/pubmed/36795468 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e37685 %T The Need to Prioritize Model-Updating Processes in Clinical Artificial Intelligence (AI) Models: Protocol for a Scoping Review %A Otokiti,Ahmed Umar %A Ozoude,Makuochukwu Maryann %A Williams,Karmen S %A Sadiq-onilenla,Rasheedat A %A Ojo,Soji Akin %A Wasarme,Leyla B %A Walsh,Samantha %A Edomwande,Maxwell %+ Digital Health Solutions, LLC, 455 Tarrytown Road, Suite 1181, White Plains, NY, 10607, United States, 1 7188241878, ahmedotoks@yahoo.com %K model updating %K model calibration %K artificial intelligence %K machine learning %K direct clinical care %D 2023 %7 16.2.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: With an increase in the number of artificial intelligence (AI) and machine learning (ML) algorithms available for clinical settings, appropriate model updating and implementation of updates are imperative to ensure applicability, reproducibility, and patient safety. Objective: The objective of this scoping review was to evaluate and assess the model-updating practices of AI and ML clinical models that are used in direct patient-provider clinical decision-making. Methods: We used the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist and the PRISMA-P protocol guidance in addition to a modified CHARMS (Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies) checklist to conduct this scoping review. A comprehensive medical literature search of databases, including Embase, MEDLINE, PsycINFO, Cochrane, Scopus, and Web of Science, was conducted to identify AI and ML algorithms that would impact clinical decision-making at the level of direct patient care. 
Our primary end point is the rate at which model updating is recommended by published algorithms; we will also conduct an assessment of study quality and risk of bias in all publications reviewed. In addition, we will evaluate the rate at which published algorithms include ethnic and gender demographic distribution information in their training data as a secondary end point. Results: Our initial literature search yielded approximately 13,693 articles, with approximately 7810 articles to consider for full reviews among our team of 7 reviewers. We plan to complete the review process and disseminate the results by spring of 2023. Conclusions: Although AI and ML applications in health care have the potential to improve patient care by reducing errors between measurement and model output, currently there exists more hype than hope because of the lack of proper external validation of these models. We expect to find that the AI and ML model-updating methods are proxies for model applicability and generalizability on implementation. Our findings will add to the field by determining the degree to which published models meet the criteria for clinical validity, real-life implementation, and best practices to optimize model development, and in so doing, reduce the overpromise and underachievement of the contemporary model development process. 
International Registered Report Identifier (IRRID): PRR1-10.2196/37685 %M 36795464 %R 10.2196/37685 %U https://www.researchprotocols.org/2023/1/e37685 %U https://doi.org/10.2196/37685 %U http://www.ncbi.nlm.nih.gov/pubmed/36795464 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44238 %T Deep-Learning Model for Influenza Prediction From Multisource Heterogeneous Data in a Megacity: Model Development and Evaluation %A Yang,Liuyang %A Li,Gang %A Yang,Jin %A Zhang,Ting %A Du,Jing %A Liu,Tian %A Zhang,Xingxing %A Han,Xuan %A Li,Wei %A Ma,Libing %A Feng,Luzhao %A Yang,Weizhong %+ School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, No. 9, Dongdan Santiao, Dongcheng District, Beijing, Beijing, 100730, China, 86 1 391 181 9068, yangweizhong@cams.cn %K influenza %K ILI %K multisource heterogeneous data %K deep learning %K MAL model %K megacity %D 2023 %7 13.2.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: In megacities, there is an urgent need to establish more sensitive forecasting and early warning methods for acute respiratory infectious diseases. Existing prediction and early warning models for influenza and other acute respiratory infectious diseases have limitations and therefore there is room for improvement. Objective: The aim of this study was to explore a new and better-performing deep-learning model to predict influenza trends from multisource heterogeneous data in a megacity. Methods: We collected multisource heterogeneous data from the 26th week of 2012 to the 25th week of 2019, including influenza-like illness (ILI) cases and virological surveillance, data of climate and demography, and search engines data. To avoid collinearity, we selected the best predictor according to the weight and correlation of each factor. 
We established a new multiattention-long short-term memory (LSTM) deep-learning model (MAL model), which was used to predict the percentage of ILI (ILI%) cases and the product of ILI% and the influenza-positive rate (ILI%×positive%), respectively. We also combined the data in different forms and added several machine-learning and deep-learning models commonly used in the past to predict influenza trends for comparison. The R2 value, explained variance scores, mean absolute error, and mean squared error were used to evaluate the quality of the models. Results: The highest correlation coefficients were found for the Baidu search data for ILI% and for air quality for ILI%×positive%. We first used the MAL model to calculate the ILI%, and then combined ILI% with climate, demographic, and Baidu data in different forms. The ILI%+climate+demography+Baidu model had the best prediction effect, with the explained variance score reaching 0.78, R2 reaching 0.76, mean absolute error of 0.08, and mean squared error of 0.01. Similarly, we used the MAL model to calculate the ILI%×positive% and combined this prediction with different data forms. The ILI%×positive%+climate+demography+Baidu model had the best prediction effect, with an explained variance score reaching 0.74, R2 reaching 0.70, mean absolute error of 0.02, and mean squared error of 0.02. Comparisons with random forest, extreme gradient boosting, LSTM, and gated recurrent unit models showed that the MAL model had the best prediction effect. Conclusions: The newly established MAL model outperformed existing models. Natural factors and search engine query data were more helpful in forecasting ILI patterns in megacities. With more timely and effective prediction of influenza and other respiratory infectious diseases and the epidemic intensity, early and better preparedness can be achieved to reduce the health damage to the population. 
%M 36780207 %R 10.2196/44238 %U https://www.jmir.org/2023/1/e44238 %U https://doi.org/10.2196/44238 %U http://www.ncbi.nlm.nih.gov/pubmed/36780207 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e41992 %T The Clinical Suitability of an Artificial Intelligence–Enabled Pain Assessment Tool for Use in Infants: Feasibility and Usability Evaluation Study %A Hughes,Jeffery David %A Chivers,Paola %A Hoti,Kreshnik %+ Faculty of Medicine, University of Prishtina, 31 George Bush St, Prishtina, 10000, Kosovo, 383 44945173, kreshnik.hoti@uni-pr.edu %K pain assessment %K clinical utility %K sensitivity %K specificity %K immunization %K accuracy %K precision %K PainChek Infant %K infant %K newborn %K baby %K babies %K pain %K facial %K artificial intelligence %K machine learning %K model %K detection %K assessment %D 2023 %7 13.2.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Infants are unable to self-report their pain, which, therefore, often goes underrecognized and undertreated. Adequate assessment of pain, including procedural pain, which has short- and long-term consequences, is critical for its management. The introduction of mobile health–based (mHealth) pain assessment tools could address current challenges and is an area requiring further research. Objective: The purpose of this study is to evaluate the accuracy and feasibility aspects of PainChek Infant and, therefore, assess its applicability in the intended setting. Methods: By observing infants just before, during, and after immunization, we evaluated the accuracy and precision at different cutoff scores of PainChek Infant, which is a point-of-care mHealth–based solution that uses artificial intelligence to detect pain and intensity based solely on facial expression. We used receiver operator characteristic analysis to assess interpretability and establish a cutoff score. Clinician comprehensibility was evaluated using a standardized questionnaire. 
Other feasibility aspects were evaluated based on comparison with currently available observational pain assessment tools for use in infants with procedural pain. Results: Both PainChek Infant Standard and Adaptive modes demonstrated high accuracy (area under the curve 0.964 and 0.966, respectively). At a cutoff score of ≥2, accuracy and precision were 0.908 and 0.912 for Standard and 0.912 and 0.897 for Adaptive modes, respectively. Currently available data allowed evaluation of 16 of the 17 feasibility aspects, with only the cost of the outcome measurement instrument unable to be evaluated since it is yet to be determined. PainChek Infant performed well across feasibility aspects, including interpretability (cutoff score defined), ease of administration, completion time (3 seconds), and clinician comprehensibility. Conclusions: This work provides information on the feasibility of using PainChek Infant in clinical practice for procedural pain assessment and monitoring, and demonstrates the accuracy and precision of the tool at the defined cutoff score. 
%M 36780223 %R 10.2196/41992 %U https://www.jmir.org/2023/1/e41992 %U https://doi.org/10.2196/41992 %U http://www.ncbi.nlm.nih.gov/pubmed/36780223 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42665 %T The Impacts of Computer-Aided Detection of Colorectal Polyps on Subsequent Colonoscopy Surveillance Intervals: Simulation Study %A Lui,Ka Luen Thomas %A Liu,Sze Hang Kevin %A Leung,Kathy %A Wu,Joseph T %A Zauber,Ann G %A Leung,Wai Keung %+ Department of Medicine, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, 102 Pokfulam Road, Hong Kong, Hong Kong, 852 22553348, waikleung@hku.hk %K artificial intelligence %K surveillance colonoscopy %K colonic polyp %K polyp %K colonoscopy %K computer-aided %K detect %K adenoma %K endoscopic %K endoscopy %K simulation %K simulated %K surveillance %D 2023 %7 10.2.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Computer-aided detection (CADe) of colorectal polyps has been shown to increase adenoma detection rates, which would potentially shorten subsequent surveillance intervals. Objective: The purpose of this study is to simulate the potential changes in subsequent colonoscopy surveillance intervals after the application of CADe in a large cohort of patients. Methods: We simulated the projected increase in polyp and adenoma detection by universal CADe application in our patients who had undergone colonoscopy with complete endoscopic and histological findings between 2016 and 2020. The simulation was based on bootstrapping the published performance of CADe. The corresponding changes in surveillance intervals for each patient, as recommended by the US Multi-Society Task Force on Colorectal Cancer (USMSTF) or the European Society of Gastrointestinal Endoscopy (ESGE), were determined after the CADe was determined. Results: A total of 3735 patients who had undergone colonoscopy were included. 
Based on the simulated CADe effect, the application of CADe would result in 19.1% (n=714) and 1.9% (n=71) of patients having shorter surveillance intervals, according to the USMSTF and ESGE guidelines, respectively. In particular, all (or 2.7% (n=101) of the total) patients who were originally scheduled to have 3-5 years of surveillance would have their surveillance intervals shortened to 3 years, following the USMSTF guidelines. The changes in this group of patients were largely attributed to an increase in the number of adenomas (n=75, 74%) rather than serrated lesions being detected. Conclusions: Widespread adoption of CADe would inevitably increase the demand for surveillance colonoscopies with the shortening of original surveillance intervals, particularly following the current USMSTF guideline. %M 36763451 %R 10.2196/42665 %U https://www.jmir.org/2023/1/e42665 %U https://doi.org/10.2196/42665 %U http://www.ncbi.nlm.nih.gov/pubmed/36763451 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e40211 %T Automated Sleep Stages Classification Using Convolutional Neural Network From Raw and Time-Frequency Electroencephalogram Signals: Systematic Evaluation Study %A Haghayegh,Shahab %A Hu,Kun %A Stone,Katie %A Redline,Susan %A Schernhammer,Eva %+ Harvard Medical School, Channing Division of Network Medicine, 181 Longwood Avenue, Boston, MA, 02115, United States, 1 5129543436, shaghayegh@bwh.harvard.edu %K SleepInceptionNet %K polysomnography %K PSG %K electroencephalogram %K EEG %K spectrogram %K scalogram %K short-time Fourier transform %K continuous wavelet transform %K Welch power spectral density %K LeNet %K ResNet %K Alex Net %K inception %K convolutional neural network %D 2023 %7 10.2.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Most existing automated sleep staging methods rely on multimodal data, and scoring a specific epoch requires not only the current epoch but also a sequence of consecutive epochs that precede and 
follow the epoch. Objective: We proposed and tested a convolutional neural network called SleepInceptionNet, which allows sleep classification of a single epoch using a single-channel electroencephalogram (EEG). Methods: SleepInceptionNet is based on our systematic evaluation of the effects of different EEG preprocessing methods, EEG channels, and convolutional neural networks on automatic sleep staging performance. The evaluation was performed using polysomnography data of 883 participants (937,975 thirty-second epochs). Raw data of individual EEG channels (ie, frontal, central, and occipital) and 3 specific transformations of the data, including power spectral density, continuous wavelet transform, and short-time Fourier transform, were used separately as the inputs of the convolutional neural network models. To classify sleep stages, 7 sequential deep neural networks were tested for the 1D data (ie, raw EEG and power spectral density), and 16 image classifier convolutional neural networks were tested for the 2D data (ie, continuous wavelet transform and short-time Fourier transform time-frequency images). Results: The best model, SleepInceptionNet, which uses time-frequency images developed by the continuous wavelet transform method from central single-channel EEG data as input to the InceptionV3 image classifier algorithm, achieved a Cohen κ agreement of 0.705 (SD 0.077) in reference to the gold standard polysomnography. Conclusions: SleepInceptionNet may allow real-time automated sleep staging in free-living conditions using a single-channel EEG, which may be useful for on-demand intervention or treatment during specific sleep stages. %M 36763454 %R 10.2196/40211 %U https://www.jmir.org/2023/1/e40211 %U https://doi.org/10.2196/40211 %U http://www.ncbi.nlm.nih.gov/pubmed/36763454 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 9 %N %P e45312 %T How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? 
The Implications of Large Language Models for Medical Education and Knowledge Assessment %A Gilson,Aidan %A Safranek,Conrad W %A Huang,Thomas %A Socrates,Vimig %A Chi,Ling %A Taylor,Richard Andrew %A Chartash,David %+ Section for Biomedical Informatics and Data Science, Yale University School of Medicine, 300 George Street, Suite 501, New Haven, CT, 06511, United States, 1 203 737 5379, david.chartash@yale.edu %K natural language processing %K NLP %K MedQA %K generative pre-trained transformer %K GPT %K medical education %K chatbot %K artificial intelligence %K education technology %K ChatGPT %K conversational agent %K machine learning %K USMLE %D 2023 %7 8.2.2023 %9 Original Paper %J JMIR Med Educ %G English %X Background: Chat Generative Pre-trained Transformer (ChatGPT) is a 175-billion-parameter natural language processing model that can generate conversation-style responses to user input. Objective: This study aimed to evaluate the performance of ChatGPT on questions within the scope of the United States Medical Licensing Examination (USMLE) Step 1 and Step 2 exams, as well as to analyze responses for user interpretability. Methods: We used 2 sets of multiple-choice questions to evaluate ChatGPT’s performance, each with questions pertaining to Step 1 and Step 2. The first set was derived from AMBOSS, a commonly used question bank for medical students, which also provides statistics on question difficulty and the performance on an exam relative to the user base. The second set was the National Board of Medical Examiners (NBME) free 120 questions. ChatGPT’s performance was compared to 2 other large language models, GPT-3 and InstructGPT. The text output of each ChatGPT response was evaluated across 3 qualitative metrics: logical justification of the answer selected, presence of information internal to the question, and presence of information external to the question. 
Results: Of the 4 data sets, AMBOSS-Step1, AMBOSS-Step2, NBME-Free-Step1, and NBME-Free-Step2, ChatGPT achieved accuracies of 44% (44/100), 42% (42/100), 64.4% (56/87), and 57.8% (59/102), respectively. ChatGPT outperformed InstructGPT by 8.15% on average across all data sets, and GPT-3 performed similarly to random chance. The model demonstrated a significant decrease in performance as question difficulty increased (P=.01) within the AMBOSS-Step1 data set. We found that logical justification for ChatGPT’s answer selection was present in 100% of outputs of the NBME data sets. Internal information to the question was present in 96.8% (183/189) of all questions. The presence of information external to the question was 44.5% and 27% lower for incorrect answers relative to correct answers on the NBME-Free-Step1 (P<.001) and NBME-Free-Step2 (P=.001) data sets, respectively. Conclusions: ChatGPT marks a significant improvement in natural language processing models on the tasks of medical question answering. By performing at a greater than 60% threshold on the NBME-Free-Step-1 data set, we show that the model achieves the equivalent of a passing score for a third-year medical student. Additionally, we highlight ChatGPT’s capacity to provide logic and informational context across the majority of answers. These facts taken together make a compelling case for the potential applications of ChatGPT as an interactive medical education tool to support learning. 
%M 36753318 %R 10.2196/45312 %U https://mededu.jmir.org/2023/1/e45312 %U https://doi.org/10.2196/45312 %U http://www.ncbi.nlm.nih.gov/pubmed/36753318 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e43007 %T Voice Assistants’ Responses to Questions About the COVID-19 Vaccine: National Cross-sectional Study %A Sossenheimer,Philip %A Hong,Grace %A Devon-Sand,Anna %A Lin,Steven %+ Department of Medicine, Stanford University School of Medicine, 211 Quarry Road, Suite 405, Palo Alto, CA, 94304, United States, 1 650 725 7966, stevenlin@stanford.edu %K artificial intelligence %K mHealth %K misinformation %K public health %K vaccination hesitancy %K vaccination %K online %K COVID-19 %K public health %K information %K users %K smartphone %K mobile phone %D 2023 %7 8.2.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Artificial intelligence-powered voice assistants (VAs), such as Apple Siri, Google Assistant, and Amazon Alexa, interact with users in natural language and are capable of responding to simple commands, searching the internet, and answering questions. Despite being an increasingly popular way for the public to access health information, VAs could be a source of ambiguous or potentially biased information. Objective: In response to the ongoing prevalence of vaccine misinformation and disinformation, this study aims to evaluate how smartphone VAs respond to information- and recommendation-seeking inquiries regarding the COVID-19 vaccine. Methods: A national cross-sectional survey of English-speaking adults who owned a smartphone with a VA installed was conducted online from April 22 to 28, 2021. The primary outcomes were the VAs’ responses to 2 questions: “Should I get the COVID vaccine?” and “Is the COVID vaccine safe?” Directed content analysis was used to assign a negative, neutral, or positive connotation to each response and website title provided by the VAs. 
Statistical significance was assessed using the t test (parametric) or Mann-Whitney U (nonparametric) test for continuous variables and the chi-square or Fisher exact test for categorical variables. Results: Of the 466 survey respondents included in the final analysis, 404 (86.7%) used Apple Siri, 53 (11.4%) used Google Assistant, and 9 (1.9%) used Amazon Alexa. In response to the question “Is the COVID vaccine safe?” 419 (89.9%) users received a direct response, of which 408 (97.3%) had a positive connotation encouraging users to get vaccinated. Of the websites presented, only 5.3% (11/207) had a positive connotation and 94.7% (196/207) had a neutral connotation. In response to the question “Should I get the COVID vaccine?” 93.1% (434/466) of users received a list of websites, of which 91.5% (1155/1262) had a neutral connotation. For both COVID-19 vaccine–related questions, there was no association between the connotation of a response and the age, gender, zip code, race or ethnicity, and education level of the respondent. Conclusions: Our study found that VAs were much more likely to respond directly with positive connotations to the question “Is the COVID vaccine safe?” but not respond directly and provide a list of websites with neutral connotations to the question “Should I get the COVID vaccine?” To our knowledge, this is the first study to evaluate how VAs respond to both information- and recommendation-seeking inquiries regarding the COVID-19 vaccine. These findings add to our growing understanding of both the opportunities and pitfalls of VAs in supporting public health information dissemination. 
%M 36719815 %R 10.2196/43007 %U https://formative.jmir.org/2023/1/e43007 %U https://doi.org/10.2196/43007 %U http://www.ncbi.nlm.nih.gov/pubmed/36719815 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43734 %T Explainable Machine Learning Techniques To Predict Amiodarone-Induced Thyroid Dysfunction Risk: Multicenter, Retrospective Study With External Validation %A Lu,Ya-Ting %A Chao,Horng-Jiun %A Chiang,Yi-Chun %A Chen,Hsiang-Yin %+ Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, R714, 7th Floor, Health and Science Building No.250, Wuxing Street, Xinyi Distict, Taipei, 110, Taiwan, 886 2 2736 1661 ext 6175, shawn@tmu.edu.tw %K amiodarone %K thyroid dysfunction %K machine learning %K oversampling %K extreme gradient boosting %K adverse effect %K resampling %K thyroid %K predict %K risk %D 2023 %7 7.2.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Machine learning offers new solutions for predicting life-threatening, unpredictable amiodarone-induced thyroid dysfunction. Traditional regression approaches for adverse-effect prediction without time-series consideration of features have yielded suboptimal predictions. Machine learning algorithms with multiple data sets at different time points may generate better performance in predicting adverse effects. Objective: We aimed to develop and validate machine learning models for forecasting individualized amiodarone-induced thyroid dysfunction risk and to optimize a machine learning–based risk stratification scheme with a resampling method and readjustment of the clinically derived decision thresholds. Methods: This study developed machine learning models using multicenter, delinked electronic health records. It included patients receiving amiodarone from January 2013 to December 2017. 
The training set was composed of data from Taipei Medical University Hospital and Wan Fang Hospital, while data from Taipei Medical University Shuang Ho Hospital were used as the external test set. The study collected stationary features at baseline and dynamic features at the first, second, third, sixth, ninth, 12th, 15th, 18th, and 21st months after amiodarone initiation. We used 16 machine learning models, including extreme gradient boosting, adaptive boosting, k-nearest neighbor, and logistic regression models, along with an original resampling method and 3 other resampling methods, including oversampling with the borderline synthetic minority oversampling technique, undersampling–edited nearest neighbor, and over- and undersampling hybrid methods. The model performance was compared based on accuracy, precision, recall, F1-score, geometric mean, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). Feature importance was determined by the best model. The decision threshold was readjusted to identify the best cutoff value, and a Kaplan-Meier survival analysis was performed. Results: The training set contained 4075 patients from Taipei Medical University Hospital and Wan Fang Hospital, of whom 583 (14.3%) developed amiodarone-induced thyroid dysfunction, while the external test set included 2422 patients from Taipei Medical University Shuang Ho Hospital, of whom 275 (11.4%) developed amiodarone-induced thyroid dysfunction. The extreme gradient boosting oversampling machine learning model demonstrated the best predictive outcomes among all 16 models. The accuracy, precision, recall, F1-score, G-mean, AUPRC, and AUROC were 0.923, 0.632, 0.756, 0.688, 0.845, 0.751, and 0.934, respectively. After readjusting the cutoff, the best value was 0.627, and the F1-score reached 0.699. 
The best threshold was able to classify 286 of 2422 patients (11.8%) as high-risk subjects, among which 275 were true-positive patients in the testing set. A shorter treatment duration; higher levels of thyroid-stimulating hormone and high-density lipoprotein cholesterol; and lower levels of free thyroxin, alkaline phosphatase, and low-density lipoprotein were the most important features. Conclusions: Machine learning models combined with resampling methods can predict amiodarone-induced thyroid dysfunction and serve as a support tool for individualized risk prediction and clinical decision support. %M 36749620 %R 10.2196/43734 %U https://www.jmir.org/2023/1/e43734 %U https://doi.org/10.2196/43734 %U http://www.ncbi.nlm.nih.gov/pubmed/36749620 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e42936 %T Strategies to Improve the Impact of Artificial Intelligence on Health Equity: Scoping Review %A Berdahl,Carl Thomas %A Baker,Lawrence %A Mann,Sean %A Osoba,Osonde %A Girosi,Federico %+ RAND Corporation, 1776 Main Street, Santa Monica, CA, 90401, United States, 1 3104233091, cberdahl@rand.org %K artificial intelligence %K machine learning %K health equity %K health care disparities %K algorithmic bias %K social determinants of health %K decision making %K algorithms %K gray literature %K equity %K health data %D 2023 %7 7.2.2023 %9 Review %J JMIR AI %G English %X Background: Emerging artificial intelligence (AI) applications have the potential to improve health, but they may also perpetuate or exacerbate inequities. Objective: This review aims to provide a comprehensive overview of the health equity issues related to the use of AI applications and identify strategies proposed to address them. Methods: We searched PubMed, Web of Science, the IEEE (Institute of Electrical and Electronics Engineers) Xplore Digital Library, ProQuest U.S. 
Newsstream, Academic Search Complete, the Food and Drug Administration (FDA) website, and ClinicalTrials.gov to identify academic and gray literature related to AI and health equity that were published between 2014 and 2021 and additional literature related to AI and health equity during the COVID-19 pandemic from 2020 and 2021. Literature was eligible for inclusion in our review if it identified at least one equity issue and a corresponding strategy to address it. To organize and synthesize equity issues, we adopted a 4-step AI application framework: Background Context, Data Characteristics, Model Design, and Deployment. We then created a many-to-many mapping of the links between issues and strategies. Results: In 660 documents, we identified 18 equity issues and 15 strategies to address them. Equity issues related to Data Characteristics and Model Design were the most common. The most common strategies recommended to improve equity were improving the quantity and quality of data, evaluating the disparities introduced by an application, increasing model reporting and transparency, involving the broader community in AI application development, and improving governance. Conclusions: Stakeholders should review our many-to-many mapping of equity issues and strategies when planning, developing, and implementing AI applications in health care so that they can make appropriate plans to ensure equity for populations affected by their products. AI application developers should consider adopting equity-focused checklists, and regulators such as the FDA should consider requiring them. Given that our review was limited to documents published online, developers may have unpublished knowledge of additional issues and strategies that we were unable to identify. 
%M 38875587 %R 10.2196/42936 %U https://ai.jmir.org/2023/1/e42936 %U https://doi.org/10.2196/42936 %U http://www.ncbi.nlm.nih.gov/pubmed/38875587 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 10 %N %P e42045 %T Methodological and Quality Flaws in the Use of Artificial Intelligence in Mental Health Research: Systematic Review %A Tornero-Costa,Roberto %A Martinez-Millana,Antonio %A Azzopardi-Muscat,Natasha %A Lazeri,Ledia %A Traver,Vicente %A Novillo-Ortiz,David %+ Division of Country Health Policies and Systems, World Health Organization, Regional Office for Europe, Marmorej 51, Copenhagen, 2100, Denmark, 45 45 33 7198, dnovillo@who.int %K artificial intelligence %K mental health %K health research %K review methodology %K systematic review %K research methodology %K research quality %K trial methodology %D 2023 %7 2.2.2023 %9 Review %J JMIR Ment Health %G English %X Background: Artificial intelligence (AI) is giving rise to a revolution in medicine and health care. Mental health conditions are highly prevalent in many countries, and the COVID-19 pandemic has increased the risk of further erosion of the mental well-being in the population. Therefore, it is relevant to assess the current status of the application of AI toward mental health research to inform about trends, gaps, opportunities, and challenges. Objective: This study aims to perform a systematic overview of AI applications in mental health in terms of methodologies, data, outcomes, performance, and quality. Methods: A systematic search in PubMed, Scopus, IEEE Xplore, and Cochrane databases was conducted to collect records of use cases of AI for mental health disorder studies from January 2016 to November 2021. Records were screened for eligibility if they were a practical implementation of AI in clinical trials involving mental health conditions. Records of AI study cases were evaluated and categorized by the International Classification of Diseases 11th Revision (ICD-11). 
Data related to trial settings, collection methodology, features, outcomes, and model development and evaluation were extracted following the CHARMS (Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies) guideline. Further, an evaluation of the risk of bias is provided. Results: A total of 429 nonduplicated records were retrieved from the databases and 129 were included for a full assessment—18 of which were manually added. The distribution of AI applications in mental health was found to be unbalanced between ICD-11 mental health categories. Predominant categories were Depressive disorders (n=70) and Schizophrenia or other primary psychotic disorders (n=26). Most interventions were based on randomized controlled trials (n=62), followed by prospective cohorts (n=24) among observational studies. AI was typically applied to evaluate quality of treatments (n=44) or stratify patients into subgroups and clusters (n=31). Models usually applied a combination of questionnaires and scales to assess symptom severity using electronic health records (n=49) as well as medical images (n=33). Quality assessment revealed important flaws in the process of AI application and data preprocessing pipelines. One-third of the studies (n=56) did not report any preprocessing or data preparation. One-fifth of the models were developed by comparing several methods (n=35) without assessing their suitability in advance, and a small proportion reported external validation (n=21). Only 1 paper reported a second assessment of a previous AI model. Risk of bias and transparent reporting yielded low scores due to poor reporting of the strategy for adjusting hyperparameters, coefficients, and the explainability of the models. International collaboration was anecdotal (n=17), and data and developed models mostly remained private (n=126). 
Conclusions: These significant shortcomings, alongside the lack of information to ensure reproducibility and transparency, are indicative of the challenges that AI in mental health needs to face before contributing to a solid base for knowledge generation and for being a support tool in mental health management. %M 36729567 %R 10.2196/42045 %U https://mental.jmir.org/2023/1/e42045 %U https://doi.org/10.2196/42045 %U http://www.ncbi.nlm.nih.gov/pubmed/36729567 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e42965 %T Assessing the Feasibility of a Text-Based Conversational Agent for Asthma Support: Protocol for a Mixed Methods Observational Study %A Calvo,Rafael A %A Peters,Dorian %A Moradbakhti,Laura %A Cook,Darren %A Rizos,Georgios %A Schuller,Bjoern %A Kallis,Constantinos %A Wong,Ernie %A Quint,Jennifer %+ Dyson School of Design Engineering, Imperial College London, Imperial College Rd, London, SW7 2DB, United Kingdom, 44 020 7594 8888, r.calvo@imperial.ac.uk %K conversational agent %K chatbot %K health %K well-being %K artificial intelligence %K health education %K behavior change %K asthma %D 2023 %7 2.2.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Despite efforts, the UK death rate from asthma is the highest in Europe, and 65% of people with asthma in the United Kingdom do not receive the professional care they are entitled to. Experts have recommended the use of digital innovations to help address the issues of poor outcomes and lack of care access. An automated SMS text messaging–based conversational agent (ie, chatbot) created to provide access to asthma support in a familiar format via a mobile phone has the potential to help people with asthma across demographics and at scale. Such a chatbot could help improve the accuracy of self-assessed risk, improve asthma self-management, increase access to professional care, and ultimately reduce asthma attacks and emergencies. 
Objective: The aims of this study are to determine the feasibility and usability of a text-based conversational agent that processes a patient’s text responses and short sample voice recordings to calculate an estimate of their risk for an asthma exacerbation and then offers follow-up information for lowering risk and improving asthma control; assess the levels of engagement for different groups of users, particularly those who do not access professional services and those with poor asthma control; and assess the extent to which users of the chatbot perceive it as helpful for improving their understanding and self-management of their condition. Methods: We will recruit 300 adults through four channels for broad reach: Facebook, YouGov, Asthma + Lung UK social media, and the website Healthily (a health self-management app). Participants will be screened, and those who meet inclusion criteria (adults diagnosed with asthma and who use WhatsApp) will be provided with a link to access the conversational agent through WhatsApp on their mobile phones. Participants will be sent scheduled and randomly timed messages to invite them to engage in dialogue about their asthma risk during the period of study. After a data collection period (28 days), participants will respond to questionnaire items related to the quality of the interaction. A pre- and postquestionnaire will measure asthma control before and after the intervention. Results: This study was funded in March 2021 and started in January 2022. We developed a prototype conversational agent, which was iteratively improved with feedback from people with asthma, asthma nurses, and specialist doctors. Fortnightly reviews of iterations by the clinical team began in September 2022 and are ongoing. This feasibility study will start recruitment in January 2023. The anticipated completion of the study is July 2023. A future randomized controlled trial will depend on the outcomes of this study and funding. 
Conclusions: This feasibility study will inform a follow-up pilot and larger randomized controlled trial to assess the impact of a conversational agent on asthma outcomes, self-management, behavior change, and access to care. International Registered Report Identifier (IRRID): PRR1-10.2196/42965 %M 36729586 %R 10.2196/42965 %U https://www.researchprotocols.org/2023/1/e42965 %U https://doi.org/10.2196/42965 %U http://www.ncbi.nlm.nih.gov/pubmed/36729586 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 10 %N %P e41017 %T The Effects of a Health Care Chatbot’s Complexity and Persona on User Trust, Perceived Usability, and Effectiveness: Mixed Methods Study %A Biro,Joshua %A Linder,Courtney %A Neyens,David %+ Department of Industrial Engineering, Clemson University, 100 Freeman Hall, Clemson, SC, 29634, United States, 1 8646564719, dneyens@clemson.edu %K electronic health record %K EHR %K health information %K health education %K patient education %K chatbot %K virtual agent %K virtual assistant %K usability %K trust %K adoption %K artificial intelligence %K effectiveness %D 2023 %7 1.2.2023 %9 Original Paper %J JMIR Hum Factors %G English %X Background: The rising adoption of telehealth provides new opportunities for more effective and equitable health care information mediums. The ability of chatbots to provide a conversational, personal, and comprehendible avenue for learning about health care information make them a promising tool for addressing health care inequity as health care trends continue toward web-based and remote processes. Although chatbots have been studied in the health care domain for their efficacy for smoking cessation, diet recommendation, and other assistive applications, few studies have examined how specific design characteristics influence the effectiveness of chatbots in providing health information. 
Objective: Our objective was to investigate the influence of different design considerations on the effectiveness of an educational health care chatbot. Methods: A 2×3 between-subjects study was performed with 2 independent variables: a chatbot’s complexity of responses (eg, technical or nontechnical language) and the presented qualifications of the chatbot’s persona (eg, doctor, nurse, or nursing student). Regression models were used to evaluate the impact of these variables on 3 outcome measures: effectiveness, usability, and trust. A qualitative transcript review was also done to review how participants engaged with the chatbot. Results: Analysis of 71 participants found that participants who received technical language responses were significantly more likely to be in the high effectiveness group, which had higher improvements in test scores (odds ratio [OR] 2.73, 95% CI 1.05-7.41; P=.04). Participants with higher health literacy (OR 2.04, 95% CI 1.11-4.00, P=.03) were significantly more likely to trust the chatbot. The participants engaged with the chatbot in a variety of ways, with some taking a conversational approach and others treating the chatbot more like a search engine. Conclusions: Given their increasing popularity, it is vital that we consider how chatbots are designed and implemented. This study showed that factors such as chatbots’ persona and language complexity are two design considerations that influence the ability of chatbots to successfully provide health care information. 
%M 36724004 %R 10.2196/41017 %U https://humanfactors.jmir.org/2023/1/e41017 %U https://doi.org/10.2196/41017 %U http://www.ncbi.nlm.nih.gov/pubmed/36724004 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 9 %N %P e34982 %T Gastroenteritis Forecasting Assessing the Use of Web and Electronic Health Record Data With a Linear and a Nonlinear Approach: Comparison Study %A Poirier,Canelle %A Bouzillé,Guillaume %A Bertaud,Valérie %A Cuggia,Marc %A Santillana,Mauricio %A Lavenu,Audrey %+ Computational Health Informatics Program, Boston Children's Hospital, 300 Longwood Avenue, Boston, MA, 02115, United States, 1 617 355 6000, canelle.poirier@outlook.fr %K infectious disease %K acute gastroenteritis %K modeling %K modeling disease outbreaks %K machine learning %K public health %K machine learning in public health %K forecasting %K digital data %D 2023 %7 31.1.2023 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Disease surveillance systems capable of producing accurate real-time and short-term forecasts can help public health officials design timely public health interventions to mitigate the effects of disease outbreaks in affected populations. In France, existing clinic-based disease surveillance systems produce gastroenteritis activity information that lags real time by 1 to 3 weeks. This temporal data gap prevents public health officials from having a timely epidemiological characterization of this disease at any point in time and thus leads to the design of interventions that do not take into consideration the most recent changes in dynamics. Objective: The goal of this study was to evaluate the feasibility of using internet search query trends and electronic health records to predict acute gastroenteritis (AG) incidence rates in near real time, at the national and regional scales, and for long-term forecasts (up to 10 weeks). 
Methods: We present 2 different approaches (linear and nonlinear) that produce real-time estimates, short-term forecasts, and long-term forecasts of AG activity at 2 different spatial scales in France (national and regional). Both approaches leverage disparate data sources that include disease-related internet search activity, electronic health record data, and historical disease activity. Results: Our results suggest that all data sources contribute to improving gastroenteritis surveillance for long-term forecasts with the prominent predictive power of historical data owing to the strong seasonal dynamics of this disease. Conclusions: The methods we developed could help reduce the impact of the AG peak by making it possible to anticipate increased activity by up to 10 weeks. %M 36719726 %R 10.2196/34982 %U https://publichealth.jmir.org/2023/1/e34982 %U https://doi.org/10.2196/34982 %U http://www.ncbi.nlm.nih.gov/pubmed/36719726 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e41577 %T Prediction of Next Glucose Measurement in Hospitalized Patients by Comparing Various Regression Methods: Retrospective Cohort Study %A Zale,Andrew D %A Abusamaan,Mohammed S %A McGready,John %A Mathioudakis,Nestoras %+ Division of Endocrinology, Diabetes & Metabolism, Department of Medicine, Johns Hopkins University School of Medicine, 1830 E Monument Street, Baltimore, MD, 21205, United States, 1 667 306 8085, nmathio1@jhmi.edu %K hospital %K glucose %K inpatient %K prediction %K regression %K machine learning %D 2023 %7 31.1.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Continuous glucose monitors have shown great promise in improving outpatient blood glucose (BG) control; however, continuous glucose monitors are not routinely used in hospitals, and glucose management is driven by point-of-care (finger stick) and serum glucose measurements in most patients. 
Objective: This study aimed to evaluate time series approaches for prediction of inpatient BG using only point-of-care and serum glucose observations. Methods: Our data set included electronic health record data from 184,320 admissions, from patients who received at least one unit of subcutaneous insulin, had at least 4 BG measurements, and were discharged between January 1, 2015, and May 31, 2019, from 5 Johns Hopkins Health System hospitals. A total of 2,436,228 BG observations were included after excluding measurements obtained in quick succession, from patients who received intravenous insulin, or from critically ill patients. After exclusion criteria, 2.85% (3253/113,976), 32.5% (37,045/113,976), and 1.06% (1207/113,976) of admissions had a coded diagnosis of type 1, type 2, and other diabetes, respectively. The outcome of interest was the predicted value of the next BG measurement (mg/dL). Multiple time series predictors were created and analyzed by comparing those predictors and the index BG measurement (sample-and-hold technique) with the next BG measurement. The population was classified by glycemic variability based on the coefficient of variation. To compare the performance of different time series predictors among one another, R2, root mean squared error, and Clarke Error Grid were calculated and compared with the next BG measurement. All these time series predictors were then used together in Cubist, linear, random forest, partial least squares, and k-nearest neighbor methods. Results: The median number of BG measurements from 113,976 admissions was 12 (IQR 5-24). The R2 values for the sample-and-hold, 2-hour, 4-hour, 16-hour, and 24-hour moving average were 0.529, 0.504, 0.481, 0.467, and 0.459, respectively. The R2 values for 4-hour moving average based on glycemic variability were 0.680, 0.480, 0.290, and 0.205 for low, medium, high, and very high glucose variability, respectively. 
The proportion of BG predictions in zone A of the Clarke Error Grid analysis was 61%, 59%, 27%, and 53% for 4-hour moving average, 24-hour moving average, 3 observation rolling regression, and recursive regression predictors, respectively. In a fully adjusted Cubist, linear, random forest, partial least squares, and k-nearest neighbor model, the R2 values were 0.563, 0.526, 0.538, and 0.472, respectively. Conclusions: When analyzing time series predictors independently, increasing variability in a patient’s BG decreased predictive accuracy. Similarly, inclusion of older BG measurements decreased predictive accuracy. These relationships become weaker as glucose variability increases. Machine learning techniques marginally augmented the performance of time series predictors for predicting a patient’s next BG measurement. Further studies should determine the potential of using time series analyses for prediction of inpatient dysglycemia. %M 36719713 %R 10.2196/41577 %U https://formative.jmir.org/2023/1/e41577 %U https://doi.org/10.2196/41577 %U http://www.ncbi.nlm.nih.gov/pubmed/36719713 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e39103 %T The Association Between Comorbidities and Prescribed Drugs in Patients With Suspected Obstructive Sleep Apnea: Inductive Rule Learning Approach %A Ferreira-Santos,Daniela %A Pereira Rodrigues,Pedro %+ Department of Community Medicine, Information and Decision Sciences, Faculty of Medicine, University of Porto, Rua Dr Plácido da Costa, s/n, Porto, 4200-450, Portugal, 351 225513622, danielasantos@med.up.pt %K association rule mining %K drug %K electronic health records %K obstructive sleep apnea %K problem list %K comorbidities %K prescribed drugs %K sleep apnea %K disease-drug associations %K diagnoses %K clinical data %K EHR %D 2023 %7 30.1.2023 %9 Research Letter %J J Med Internet Res %G English %X %M 36716086 %R 10.2196/39103 %U https://www.jmir.org/2023/1/e39103 %U https://doi.org/10.2196/39103 %U 
http://www.ncbi.nlm.nih.gov/pubmed/36716086 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 10 %N %P e40533 %T Usability and Credibility of a COVID-19 Vaccine Chatbot for Young Adults and Health Workers in the United States: Formative Mixed Methods Study %A Weeks,Rose %A Sangha,Pooja %A Cooper,Lyra %A Sedoc,João %A White,Sydney %A Gretz,Shai %A Toledo,Assaf %A Lahav,Dan %A Hartner,Anna-Maria %A Martin,Nina M %A Lee,Jae Hyoung %A Slonim,Noam %A Bar-Zeev,Naor %+ International Vaccine Access Center, Department of International Health, Johns Hopkins Bloomberg School of Public Health, 415 N Washington Street 5th Floor, Baltimore, MD, 21231, United States, 1 443 287 4832, rweeks@jhu.edu %K COVID-19 %K chatbot development %K risk communication %K vaccine hesitancy %K conversational agent %K health information %K chatbot %K natural language processing %K usability %K user feedback %D 2023 %7 30.1.2023 %9 Original Paper %J JMIR Hum Factors %G English %X Background: The COVID-19 pandemic raised novel challenges in communicating reliable, continually changing health information to a broad and sometimes skeptical public, particularly around COVID-19 vaccines, which, despite being comprehensively studied, were the subject of viral misinformation. Chatbots are a promising technology to reach and engage populations during the pandemic. To inform and communicate effectively with users, chatbots must be highly usable and credible. Objective: We sought to understand how young adults and health workers in the United States assessed the usability and credibility of a web-based chatbot called Vira, created by the Johns Hopkins Bloomberg School of Public Health and IBM Research using natural language processing technology. Using a mixed method approach, we sought to rapidly improve Vira’s user experience to support vaccine decision-making during the peak of the COVID-19 pandemic. 
Methods: We recruited racially and ethnically diverse young people and health workers, with both groups from urban areas of the United States. We used the validated Chatbot Usability Questionnaire to understand the tool’s navigation, precision, and persona. We also conducted 11 interviews with health workers and young people to understand the user experience, whether they perceived the chatbot as confidential and trustworthy, and how they would use the chatbot. We coded and categorized emerging themes to understand the determining factors for participants’ assessment of chatbot usability and credibility. Results: In all, 58 participants completed a web-based usability questionnaire and 11 completed in-depth interviews. Most questionnaire respondents said the chatbot was “easy to navigate” (51/58, 88%) and “very easy to use” (50/58, 86%), and many (45/58, 78%) said its responses were relevant. The mean Chatbot Usability Questionnaire score was 70.2 (SD 12.1) and scores ranged from 40.6 to 95.3. Interview participants felt the chatbot achieved high usability due to its strong functionality, performance, and perceived confidentiality and that the chatbot could attain high credibility with a redesign of its cartoonish visual persona. Young people said they would use the chatbot to discuss vaccination with hesitant friends or family members, whereas health workers used or anticipated using the chatbot to support community outreach, save time, and stay up to date. Conclusions: This formative study conducted during the pandemic’s peak provided user feedback for an iterative redesign of Vira. Using a mixed method approach provided multidimensional feedback, identifying how the chatbot worked well—being easy to use, answering questions appropriately, and using credible branding—while offering tangible steps to improve the product’s visual design. 
Future studies should evaluate how chatbots support personal health decision-making, particularly in the context of a public health emergency, and whether such outreach tools can reduce staff burnout. Randomized studies should also be conducted to measure how chatbots countering health misinformation affect user knowledge, attitudes, and behavior. %M 36409300 %R 10.2196/40533 %U https://humanfactors.jmir.org/2023/1/e40533 %U https://doi.org/10.2196/40533 %U http://www.ncbi.nlm.nih.gov/pubmed/36409300 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e38397 %T The Application of Artificial Intelligence in Health Care Resource Allocation Before and During the COVID-19 Pandemic: Scoping Review %A Wu,Hao %A Lu,Xiaoyu %A Wang,Hanyu %+ School of International Studies, Peking University, No 5 Yiheyuan Road, Haidian District, Beijing, 100871, China, 86 13261712766, wang.hanyu@outlook.com %K artificial intelligence %K resource distribution %K health care %K COVID-19 %K health equality %K eHealth %K digital health %D 2023 %7 30.1.2023 %9 Review %J JMIR AI %G English %X Background: Imbalanced health care resource distribution has been central to unequal health outcomes and political tension around the world. Artificial intelligence (AI) has emerged as a promising tool for facilitating resource distribution, especially during emergencies. However, no comprehensive review exists on the use and ethics of AI in health care resource distribution. Objective: This study aims to conduct a scoping review of the application of AI in health care resource distribution, and explore the ethical and political issues in such situations. Methods: A scoping review was conducted following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). A comprehensive search of relevant literature was conducted in MEDLINE (Ovid), PubMed, Web of Science, and Embase from inception to February 2022. 
The review included qualitative and quantitative studies investigating the application of AI in health care resource allocation. Results: The review involved 22 articles, including 9 on model development and 13 on theoretical discussions, qualitative studies, or review studies. Of the 9 on model development and validation, 5 were conducted in emerging economies, 3 in developed countries, and 1 in a global context. In terms of content, 4 focused on resource distribution at the health system level and 5 focused on resource allocation at the hospital level. Of the 13 qualitative studies, 8 were discussions on the COVID-19 pandemic and the rest were on hospital resources, outbreaks, screening, human resources, and digitalization. Conclusions: This scoping review synthesized evidence on AI in health resource distribution, focusing on the COVID-19 pandemic. The results suggest that the application of AI has the potential to improve efficacy in resource distribution, especially during emergencies. Efficient data sharing and collecting structures are needed to make reliable and evidence-based decisions. Health inequality, distributive justice, and transparency must be considered when deploying AI models in real-world situations. 
%M 27917920 %R 10.2196/38397 %U https://ai.jmir.org/2023/1/e38397 %U https://doi.org/10.2196/38397 %U http://www.ncbi.nlm.nih.gov/pubmed/27917920 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e41138 %T Artificial Intelligence in Medicine: Text Mining of Health Care Workers’ Opinions %A Nitiéma,Pascal %+ Department of Information Systems, Arizona State University, 300 E Lemon Street, Tempe, AZ, 85287, United States, 1 602 543 4088, pnitiema@asu.edu %K artificial intelligence in medicine %K artificial intelligence in health care %K artificial intelligence %K AI %K algorithm %K machine learning %K deep learning %K structural topic modeling %D 2023 %7 27.1.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) is being increasingly adopted in the health care industry for administrative tasks, patient care operations, and medical research. Objective: We aimed to examine health care workers’ opinions about the adoption and implementation of AI-powered technology in the health care industry. Methods: Data were comments about AI posted on a web-based forum by 905 health care professionals from at least 77 countries, from May 2013 to October 2021. Structural topic modeling was used to identify the topics of discussion, and hierarchical clustering was performed to determine how these topics cluster into different groups. Results: Overall, 12 topics were identified from the collected comments. These comments clustered into 2 groups: impact of AI on health care system and practice and AI as a tool for disease screening, diagnosis, and treatment. Topics associated with negative sentiments included concerns about AI replacing human workers, impact of AI on traditional medical diagnostic procedures (ie, patient history and physical examination), accuracy of the algorithm, and entry of IT companies into the health care industry. Concerns about the legal liability for using AI in treating patients were also discussed. 
Positive topics about AI included the opportunity offered by the technology for improving the accuracy of image-based diagnosis and for enhancing personalized medicine. Conclusions: The adoption and implementation of AI applications in the health care industry are eliciting both enthusiasm and concerns about patient care quality and the future of health care professions. The successful implementation of AI-powered technologies requires the involvement of all stakeholders, including patients, health care organization workers, health insurance companies, and government regulatory agencies. %M 36584303 %R 10.2196/41138 %U https://www.jmir.org/2023/1/e41138 %U https://doi.org/10.2196/41138 %U http://www.ncbi.nlm.nih.gov/pubmed/36584303 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e40922 %T User-Chatbot Conversations During the COVID-19 Pandemic: Study Based on Topic Modeling and Sentiment Analysis %A Chin,Hyojin %A Lima,Gabriel %A Shin,Mingi %A Zhunis,Assem %A Cha,Chiyoung %A Choi,Junghoi %A Cha,Meeyoung %+ Data Science Group, Institute for Basic Science, 55, Expo-ro, Yuseong-gu, Daejeon, 34126, Republic of Korea, 82 428788114, meeyoung.cha@gmail.com %K chatbot %K COVID-19 %K topic modeling %K sentiment analysis %K infodemiology %K discourse %K public perception %K public health %K infoveillance %K conversational agent %K global health %K health information %D 2023 %7 27.1.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Chatbots have become a promising tool to support public health initiatives. Despite their potential, little research has examined how individuals interacted with chatbots during the COVID-19 pandemic. Understanding user-chatbot interactions is crucial for developing services that can respond to people’s needs during a global health emergency. 
Objective: This study examined the COVID-19 pandemic–related topics online users discussed with a commercially available social chatbot and compared the sentiment expressed by users from 5 culturally different countries. Methods: We analyzed 19,782 conversation utterances related to COVID-19 covering 5 countries (the United States, the United Kingdom, Canada, Malaysia, and the Philippines) between 2020 and 2021, from SimSimi, one of the world’s largest open-domain social chatbots. We identified chat topics using natural language processing methods and analyzed their emotional sentiments. Additionally, we compared the topic and sentiment variations in the COVID-19–related chats across countries. Results: Our analysis identified 18 emerging topics, which could be categorized into the following 5 overarching themes: “Questions on COVID-19 asked to the chatbot” (30.6%), “Preventive behaviors” (25.3%), “Outbreak of COVID-19” (16.4%), “Physical and psychological impact of COVID-19” (16.0%), and “People and life in the pandemic” (11.7%). Our data indicated that people considered chatbots as a source of information about the pandemic, for example, by asking health-related questions. Users turned to SimSimi for conversation and emotional messages when offline social interactions became limited during the lockdown period. Users were more likely to express negative sentiments when conversing about topics related to masks, lockdowns, case counts, and their worries about the pandemic. In contrast, small talk with the chatbot was largely accompanied by positive sentiment. We also found cultural differences, with negative words being used more often by users in the United States than by those in Asia when talking about COVID-19. Conclusions: Based on the analysis of user-chatbot interactions on a live platform, this work provides insights into people’s informational and emotional needs during a global health crisis. 
Users sought health-related information and shared emotional messages with the chatbot, indicating the potential use of chatbots to provide accurate health information and emotional support. Future research can look into different support strategies that align with the direction of public health policy. %M 36596214 %R 10.2196/40922 %U https://www.jmir.org/2023/1/e40922 %U https://doi.org/10.2196/40922 %U http://www.ncbi.nlm.nih.gov/pubmed/36596214 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e35452 %T Development of an Artificial Intelligence–Guided Citizen-Centric Predictive Model for the Uptake of Maternal Health Services Among Pregnant Women Living in Urban Slum Settings in India: Protocol for a Cross-sectional Study With a Mixed Methods Design %A Shrivastava,Rahul %A Singhal,Manmohan %A Gupta,Mansi %A Joshi,Ashish %+ School of Pharmaceutical and Population Health Informatics, Faculty of Pharmacy, DIT University, Mussoorie, Diversion Road, Makka Wala, Dehradun, 248009, India, 91 9926405906, rahul.shrivastavamph14@gmail.com %K citizen centric %K maternal health %K informatics %K predictive model %K artificial intelligence %K development %K evaluation %K machine learning %D 2023 %7 27.1.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Pregnant women are considered a “high-risk” group with limited access to health facilities in urban slums in India. Barriers to using health services appropriately may lead to maternal and child mortality, morbidity, low birth weight, and children with stunted growth. With the increase in the use of artificial intelligence (AI) and machine learning in the health sector, we plan to develop a predictive model that can enable substantial uptake of maternal health services and improvements in adverse pregnancy health care outcomes from early diagnostics to treatment in urban slum settings. 
Objective: The objective of our study is to develop and evaluate the AI-guided citizen-centric platform that will support the uptake of maternal health services among pregnant women seeking antenatal care living in urban slum settings. Methods: We will conduct a cross-sectional study using a mixed methods approach to enroll 225 pregnant women aged 18-44 years, living in the urban slums of Delhi for more than 6 months, seeking antenatal care, and who have smartphones. Quantitative and qualitative data will be collected using an Open Data Kit Android-based tool. Variables gathered will include sociodemographics, clinical history, pregnancy history, dietary history, COVID-19 history, health care facility data, socioeconomic status, and pregnancy outcomes. All data gathered will be aggregated into a common database. We will use AI to predict the early at-risk pregnancy outcomes (in terms of the type of delivery method, term, and related complications) depending on the needs of the beneficiaries translating into effective service-delivery improvements in enhancing the use of maternal health services among pregnant women seeking antenatal care. The proposed research will help policy makers to prioritize resource planning, resource allocation, and the development of programs and policies to enhance maternal health outcomes. The academic research study has received ethical approval from the University Research Ethics Committee of Dehradun Institute of Technology (DIT) University, Dehradun, India. Results: The study was approved by the University Research Ethics Committee of DIT University, Dehradun, on July 4, 2021. Enrollment of the eligible participants will begin by April 2022 followed by the development of the predictive model by October 2022 till January 2023. The proposed AI-guided citizen-centric tool will be designed, developed, implemented, and evaluated using principles of human-centered design that will help to predict early at-risk pregnancy outcomes. 
Conclusions: The proposed internet-enabled AI-guided prediction model will help identify the potential risk associated with pregnancies and enhance the uptake of maternal health services among those seeking antenatal care for safer deliveries. We will explore the scalability of the proposed platform up to different geographic locations for adoption for similar and other health conditions. International Registered Report Identifier (IRRID): PRR1-10.2196/35452 %M 36705968 %R 10.2196/35452 %U https://www.researchprotocols.org/2023/1/e35452 %U https://doi.org/10.2196/35452 %U http://www.ncbi.nlm.nih.gov/pubmed/36705968 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 9 %N %P e41162 %T Predicting Risky Sexual Behavior Among College Students Through Machine Learning Approaches: Cross-sectional Analysis of Individual Data From 1264 Universities in 31 Provinces in China %A Li,Xuan %A Zhang,Hanxiyue %A Zhao,Shuangyu %A Tang,Kun %+ Vanke School of Public Health, Tsinghua University, No 30 Shuangqing Road, Haidian District, Beijing, 100084, China, 86 13671129425, tangk@mail.tsinghua.edu.cn %K risky sexual behavior %K sexually transmitted infections %K college students %K machine learning %K prediction %K students %K risk factor %K STI %K intervention %K China %K sex %D 2023 %7 25.1.2023 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Risky sexual behavior (RSB), the most direct risk factor for sexually transmitted infections (STIs), is common among college students. Thus, identifying relevant risk factors and predicting RSB are important to intervene and prevent RSB among college students. Objective: We aim to establish a predictive model for RSB among college students to facilitate timely intervention and the prevention of RSB to help limit STI contraction. Methods: We included a total of 8794 heterosexual Chinese students who self-reported engaging in sexual intercourse from November 2019 to February 2020. 
We identified RSB among those students and attributed it to 4 dimensions: whether contraception was used, whether the contraceptive method was safe, whether students engaged in casual sex or sex with multiple partners, and integrated RSB (which combined the first 3 dimensions). Overall, 126 predictors were included in this study, including demographic characteristics, daily habits, physical and mental health, relationship status, sexual knowledge, sexual education, sexual attitude, and previous sexual experience. For each type of RSB, we compared 8 machine learning (ML) models: multiple logistic regression (MLR), naive Bayes (BYS), linear discriminant analysis (LDA), random forest (RF), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), deep learning (DL), and the ensemble model. The optimal model for both RSB prediction and risk factor identification was selected based on a set of validation indicators. An MLR model was applied to investigate the association between RSB and identified risk factors through ML methods. Results: In total, 5328 (60.59%) students were found to have previously engaged in RSB. Among them, 3682 (41.87%) did not use contraception every time they had sexual intercourse, 3602 (40.96%) had previously used an ineffective or unsafe contraceptive method, and 1157 (13.16%) had engaged in casual sex or sex with multiple partners. XGBoost achieved the optimal predictive performance on all 4 types of RSB, with the area under the receiver operator characteristic curve (AUROC) reaching 0.78, 0.72, 0.94, and 0.80 for contraceptive use, safe contraceptive method use, engagement in casual sex or with multiple partners, and integrated RSB, respectively. By ensuring the stability of various validation indicators, the 12 most predictive variables were then selected using XGBoost, including the participants’ relationship status, sexual knowledge, sexual attitude, and previous sexual experience. 
Through MLR, RSB was found to be significantly associated with less sexual knowledge, more liberal sexual attitudes, single relationship status, and increased sexual experience. Conclusions: RSB is prevalent among college students. The XGBoost model is an effective approach to predict RSB and identify corresponding risk factors. This study presented an opportunity to promote sexual and reproductive health through ML models, which can help targeted interventions aimed at different subgroups and the precise surveillance and prevention of RSB among college students through risk probability prediction. %M 36696166 %R 10.2196/41162 %U https://publichealth.jmir.org/2023/1/e41162 %U https://doi.org/10.2196/41162 %U http://www.ncbi.nlm.nih.gov/pubmed/36696166 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e40565 %T Artificial Intelligence and Precision Health Through Lenses of Ethics and Social Determinants of Health: Protocol for a State-of-the-Art Literature Review %A Wamala-Andersson,Sarah %A Richardson,Matt X %A Landerdahl Stridsberg,Sara %A Ryan,Jillian %A Sukums,Felix %A Goh,Yong-Shian %+ Department of Health and Welfare Technology, School of Health, Care and Social Welfare, Malardalen University, Hamngatan 15, Eskilstuna, 63220, Sweden, 46 0766980150, sarah.wamala.andersson@mdh.se %K artificial intelligence %K clinical outcome %K detection %K diagnosis %K diagnostic %K disease management %K ethical framework %K ethical %K ethics %K health outcome %K health promotion %K literature review %K patient centered %K person centered %K precision health %K precision medicine %K prevention %K review methodology %K search strategy %K social determinant %D 2023 %7 24.1.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background:  Precision health is a rapidly developing field, largely driven by the development of artificial intelligence (AI)–related solutions. 
AI facilitates complex analysis of numerous health data for risk assessment, early detection of disease, and initiation of timely preventative health interventions that can be highly tailored to the individual. Despite such promise, ethical concerns arising from the rapid development and use of AI-related technologies have led to the development of national and international frameworks to address responsible use of AI. Objective:  We aimed to address research gaps and provide new knowledge regarding (1) examples of existing AI applications and what role they play regarding precision health, (2) what salient features can be used to categorize them, (3) what evidence exists for their effects on precision health outcomes, (4) how these AI applications comply with established ethical and responsible frameworks, and (5) how these AI applications address equity and social determinants of health (SDOH). Methods:  This protocol delineates a state-of-the-art literature review of novel AI-based applications in precision health. Published and unpublished studies were retrieved from 6 electronic databases. Articles included in this study were from the inception of the databases to January 2023. The review will encompass applications that use AI as a primary or supporting system or method when primarily applied for precision health purposes in human populations. It includes any geographical location or setting, including the internet, community-based, and acute or clinical settings, reporting clinical, behavioral, and psychosocial outcomes, including detection-, diagnosis-, promotion-, prevention-, management-, and treatment-related outcomes. Results:   This is step 1 toward a full state-of-the-art literature review with data analyses, results, and discussion of findings, which will also be published. The anticipated consequences on equity from the perspective of SDOH will be analyzed. 
Keyword cluster relationships and analyses will be visualized to indicate which research foci are leading the development of the field and where research gaps exist. Results will be presented based on the data analysis plan that includes primary analyses, visualization of sources, and secondary analyses. Implications for future research and person-centered public health will be discussed. Conclusions:  Results from the review will potentially guide the continued development of AI applications, future research in reducing the knowledge gaps, and improvement of practice related to precision health. New insights regarding examples of existing AI applications, their salient features, their role regarding precision health, and the evidence that exists for their effects on precision health outcomes will be demonstrated. Additionally, a demonstration of how existing AI applications address equity and SDOH and comply with established ethical and responsible frameworks will be provided. International Registered Report Identifier (IRRID): PRR1-10.2196/40565 %M 36692922 %R 10.2196/40565 %U https://www.researchprotocols.org/2023/1/e40565 %U https://doi.org/10.2196/40565 %U http://www.ncbi.nlm.nih.gov/pubmed/36692922 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42672 %T Wearable Artificial Intelligence for Anxiety and Depression: Scoping Review %A Abd-alrazaq,Alaa %A AlSaad,Rawan %A Aziz,Sarah %A Ahmed,Arfan %A Denecke,Kerstin %A Househ,Mowafa %A Farooq,Faisal %A Sheikh,Javaid %+ AI Center for Precision Health, Weill Cornell Medicine-Qatar, P.O. Box 5825, Doha Al Luqta St, Ar-Rayyan, Doha, Qatar, 974 55708549, alaa_alzoubi88@yahoo.com %K wearable artificial intelligence %K artificial intelligence %K wearable devices %K anxiety %K depression %K scoping review %K mobile phone %D 2023 %7 19.1.2023 %9 Review %J J Med Internet Res %G English %X Background: Anxiety and depression are the most common mental disorders worldwide. 
Owing to the lack of psychiatrists around the world, the incorporation of artificial intelligence (AI) into wearable devices (wearable AI) has been exploited to provide mental health services. Objective: This review aimed to explore the features of wearable AI used for anxiety and depression to identify application areas and open research issues. Methods: We searched 8 electronic databases (MEDLINE, PsycINFO, Embase, CINAHL, IEEE Xplore, ACM Digital Library, Scopus, and Google Scholar) and included studies that met the inclusion criteria. Then, we checked the studies that cited the included studies and screened studies that were cited by the included studies. The study selection and data extraction were carried out by 2 reviewers independently. The extracted data were aggregated and summarized using narrative synthesis. Results: Of the 1203 studies identified, 69 (5.74%) were included in this review. Approximately two-thirds of the studies used wearable AI for depression, whereas the remaining studies used it for anxiety. The most frequent application of wearable AI was in diagnosing anxiety and depression; however, none of the studies used it for treatment purposes. Most studies targeted individuals aged between 18 and 65 years. The most common wearable device used in the studies was Actiwatch AW4 (Cambridge Neurotechnology Ltd). Wrist-worn devices were the most common type of wearable device in the studies. The most commonly used category of data for model development was physical activity data, followed by sleep data and heart rate data. The most frequently used data set from open sources was Depresjon. The most commonly used algorithm was random forest, followed by support vector machine. Conclusions: Wearable AI can offer great promise in providing mental health services related to anxiety and depression. Wearable AI can be used by individuals for the prescreening assessment of anxiety and depression. 
Further reviews are needed to statistically synthesize the studies’ results related to the performance and effectiveness of wearable AI. Given its potential, technology companies should invest more in wearable AI for the treatment of anxiety and depression. %M 36656625 %R 10.2196/42672 %U https://www.jmir.org/2023/1/e42672 %U https://doi.org/10.2196/42672 %U http://www.ncbi.nlm.nih.gov/pubmed/36656625 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e39786 %T Intelligent Physical Robots in Health Care: Systematic Literature Review %A Huang,Rong %A Li,Hongxiu %A Suomi,Reima %A Li,Chenglong %A Peltoniemi,Teijo %+ Department of Management and Entrepreneurship, Turku School of Economics, University of Turku, Rehtorinpellonkatu 3, Turku, 20500, Finland, 358 417560485, rong.r.huang@utu.fi %K intelligent physical robot %K artificial intelligence %K health care %K literature review %D 2023 %7 18.1.2023 %9 Review %J J Med Internet Res %G English %X Background: Intelligent physical robots based on artificial intelligence have been argued to bring about dramatic changes in health care services. Previous research has examined the use of intelligent physical robots in the health care context from different perspectives; however, an overview of the antecedents and consequences of intelligent physical robot use in health care is lacking in the literature. Objective: In this paper, we aimed to provide an overview of the antecedents and consequences of intelligent physical robot use in health care and to propose potential agendas for future research through a systematic literature review. Methods: We conducted a systematic literature review on intelligent physical robots in the health care field following the guidelines of PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses). 
Literature searches were conducted in 5 databases (PubMed, Scopus, PsycINFO, Embase, and CINAHL) in May 2021, focusing on studies using intelligent physical robots for health care purposes. Subsequently, the quality of the included studies was assessed using the Mixed Methods Appraisal Tool. We performed an exploratory content analysis and synthesized the findings extracted from the included articles. Results: A total of 94 research articles were included in the review. Intelligent physical robots, including mechanoid, humanoid, android, and animalistic robots, have been used in hospitals, nursing homes, mental health care centers, laboratories, and patients’ homes by both end customers and health care professionals. The antecedents for intelligent physical robot use are categorized into individual-, organization-, and robot-related factors. Intelligent physical robot use in the health care context leads to both non–health-related consequences (emotional outcomes, attitude and evaluation outcomes, and behavioral outcomes) and consequences for (physical, mental, and social) health promotion for individual users. Accordingly, an integrative framework was proposed to obtain an overview of the antecedents and consequences of intelligent physical robot use in the health care context. Conclusions: This study contributes to the literature by summarizing current knowledge in the field of intelligent physical robot use in health care, by identifying the antecedents and the consequences of intelligent physical robot use, and by proposing potential future research agendas in the specific area based on the research findings in the literature and the identified knowledge gaps. 
%M 36652280 %R 10.2196/39786 %U https://www.jmir.org/2023/1/e39786 %U https://doi.org/10.2196/39786 %U http://www.ncbi.nlm.nih.gov/pubmed/36652280 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 9 %N %P e40080 %T Reporting, Monitoring, and Handling of Adverse Drug Reactions in Australia: Scoping Review %A Fossouo Tagne,Joel %A Yakob,Reginald Amin %A Dang,Thu Ha %A Mcdonald,Rachael %A Wickramasinghe,Nilmini %+ Department of Health and Biostatistics, School of Health Sciences, Swinburne University of Technology, John Street, Hawthorn, Melbourne, 3122, Australia, 61 0412478610, jfossouotagne@swin.edu.au %K pharmacovigilance %K adverse drug reactions %K primary care %K digital health %D 2023 %7 16.1.2023 %9 Review %J JMIR Public Health Surveill %G English %X Background: Adverse drug reactions (ADRs) are unintended consequences of medication use and may result in hospitalizations or deaths. Timely reporting of ADRs to regulators is essential for drug monitoring, research, and maintaining patient safety, but it has not been standardized in Australia. Objective: We sought to explore the ways that ADRs are monitored or reported in Australia. We reviewed how consumers and health care professionals participate in ADR monitoring and reporting. Methods: The Arksey and O’Malley framework provided a methodology to sort the data according to key themes and issues. Web of Science, Scopus, Embase, PubMed, CINAHL, and Computer & Applied Sciences Complete databases were used to extract articles published from 2010 to 2021. Two reviewers screened the papers for eligibility, extracted key data, and provided descriptive analysis of the data. Results: Seven articles met the inclusion criteria. The Adverse Medicine Events Line (telephone reporting service) was introduced in 2003 to support consumer reporting of ADRs; however, only 10.4% of consumers were aware of ADR reporting schemes. 
Consumers who experience side effects were more likely to report ADRs to their doctors or pharmacists than to the drug manufacturer. The documentation of ADR reports in hospital electronic health records showed that nurses and pharmacists were significantly less likely than doctors to omit the description of the drug reaction, and pharmacists were significantly more likely to enter the correct classification of the drug reaction than doctors. Review and analysis of all ADR reports submitted to the Therapeutic Goods Administration highlighted a decline in physician contribution from 28% of ADR reporting in 2003 to 4% in 2016; however, within this same time period, hospital and community pharmacists were a major source of ADR reporting (ie, 16%). In 2014, there was an increase in ADR reporting by community pharmacists following the introduction of the GuildLink ADR web-based reporting system; however, a year later, the reporting levels dropped. In 2018, the Therapeutic Goods Administration introduced a black triangle scheme on the packaging of newly approved medicines, to remind and encourage ADR reporting on new medicines, but this was only marginally successful at increasing the quantity of ADR reports. Conclusions: Despite the existence of national and international guidelines for ADR reporting and management, there is substantial interinstitutional variability in the standards of ADR reporting among individual health care facilities. There is room for increased ADR reporting rates among consumers and health care professionals. A thorough assessment of the barriers and enablers to ADR reporting at the primary health care institutional levels is essential. Interventions to increase ADR reporting, for example, the black triangle scheme (alert or awareness) or GuildLink (digital health), have only had marginal effects and may benefit from further improvement revisions and awareness programs. 
%M 36645706 %R 10.2196/40080 %U https://publichealth.jmir.org/2023/1/e40080 %U https://doi.org/10.2196/40080 %U http://www.ncbi.nlm.nih.gov/pubmed/36645706 %0 Journal Article %@ 2561-9128 %I JMIR Publications %V 6 %N %P e39044 %T Perioperative Risk Assessment of Patients Using the MyRISK Digital Score Completed Before the Preanesthetic Consultation: Prospective Observational Study %A Ferré,Fabrice %A Laurent,Rodolphe %A Furelau,Philippine %A Doumard,Emmanuel %A Ferrier,Anne %A Bosch,Laetitia %A Ba,Cyndie %A Menut,Rémi %A Kurrek,Matt %A Geeraerts,Thomas %A Piau,Antoine %A Minville,Vincent %+ Département d’Anesthésie-Réanimation, Hôpital Pierre-Paul Riquet, Centre Hospitalier Universitaire Purpan, Place du Dr Baylac, Toulouse, 31300, France, 33 0561779988, fabriceferre31@gmail.com %K chatbot %K digital health %K preanesthetic consultation %K perioperative risk %K machine learning %K mobile phone %D 2023 %7 16.1.2023 %9 Original Paper %J JMIR Perioper Med %G English %X Background: The ongoing COVID-19 pandemic has highlighted the potential of digital health solutions to adapt the organization of care in a crisis context. Objective: Our aim was to describe the relationship between the MyRISK score, derived from self-reported data collected by a chatbot before the preanesthetic consultation, and the occurrence of postoperative complications. Methods: This was a single-center prospective observational study that included 401 patients. The 16 items composing the MyRISK score were selected using the Delphi method. An algorithm was used to stratify patients with low (green), intermediate (orange), and high (red) risk. The primary end point concerned postoperative complications occurring in the first 6 months after surgery (composite criterion), collected by telephone and by consulting the electronic medical database. A logistic regression analysis was carried out to identify the explanatory variables associated with the complications. 
A machine learning model was trained to predict the MyRISK score using a larger data set of 1823 patients classified as green or red to reclassify individuals classified as orange as either modified green or modified red. User satisfaction and usability were assessed. Results: Of the 389 patients analyzed for the primary end point, 16 (4.1%) experienced a postoperative complication. A red score was independently associated with postoperative complications (odds ratio 5.9, 95% CI 1.5-22.3; P=.009). A modified red score was strongly correlated with postoperative complications (odds ratio 21.8, 95% CI 2.8-171.5; P=.003) and predicted postoperative complications with high sensitivity (94%) and high negative predictive value (99%) but with low specificity (49%) and very low positive predictive value (7%; area under the receiver operating characteristic curve=0.71). Patient satisfaction numeric rating scale and system usability scale median scores were 8.0 (IQR 7.0-9.0) out of 10 and 90.0 (IQR 82.5-95.0) out of 100, respectively. Conclusions: The MyRISK digital perioperative risk score established before the preanesthetic consultation was independently associated with the occurrence of postoperative complications. Its negative predictive strength was increased using a machine learning model to reclassify patients identified as being at intermediate risk. This reliable numerical categorization could be used to objectively refer patients with low risk to teleconsultation. 
%M 36645704 %R 10.2196/39044 %U https://periop.jmir.org/2023/1/e39044 %U https://doi.org/10.2196/39044 %U http://www.ncbi.nlm.nih.gov/pubmed/36645704 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e41043 %T An Accurate Deep Learning–Based System for Automatic Pill Identification: Model Development and Validation %A Heo,Junyeong %A Kang,Youjin %A Lee,SangKeun %A Jeong,Dong-Hwa %A Kim,Kang-Min %+ Department of Artificial Intelligence, The Catholic University of Korea, T908 Michael Building, The Catholic University of Korea, 43 Jibong-ro, Bucheon, 14662, Republic of Korea, 82 10 6707 6977, donghwa@catholic.ac.kr %K pill identification %K pill retrieval %K pill recognition %K automatic pill search %K deep learning %K machine learning %K character-level language model %D 2023 %7 13.1.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Medication errors account for a large proportion of all medical errors. In most homes, patients take a variety of medications for a long period. However, medication errors frequently occur because patients often throw away the containers of their medications. Objective: We proposed a deep learning–based system for reducing medication errors by accurately identifying prescription pills. Given the pill images, our system located the pills in the respective pill databases in South Korea and the United States. Methods: We organized the system into a pill recognition step and pill retrieval step, and we applied deep learning models to train not only images of the pill but also imprinted characters. In the pill recognition step, there are 3 modules that recognize the 3 features of pills and their imprints separately and correct the recognized imprint to fit the actual data. We adopted image classification and text detection models for the feature and imprint recognition modules, respectively. 
In the imprint correction module, we introduced a language model for the first time in the pill identification system and proposed a novel coordinate encoding technique for effective correction in the language model. We identified pills using similarity scores of pill characteristics with those in the database. Results: We collected the open pill database from South Korea and the United States in May 2022. We used a total of 24,404 pill images in our experiments. The experimental results show that the predicted top-1 candidates achieve accuracy levels of 85.6% (South Korea) and 74.5% (United States) for the types of pills not trained on 2 different databases (South Korea and the United States). Furthermore, the predicted top-1 candidate accuracy of our system was 78% with consumer-granted images, which was achieved by training only 1 image per pill. The results demonstrate that our system could identify and retrieve new pills without additional model updates. Finally, we confirmed through an ablation study that the language model that we emphasized significantly improves the pill identification ability of the system. Conclusions: Our study proposes the possibility of reducing medical errors by showing that the introduction of artificial intelligence can identify numerous pills with high precision in real time. Our study suggests that the proposed system can reduce patients’ misuse of medications and help medical staff focus on higher-level tasks by simplifying time-consuming lower-level tasks such as pill identification. 
%M 36637893 %R 10.2196/41043 %U https://www.jmir.org/2023/1/e41043 %U https://doi.org/10.2196/41043 %U http://www.ncbi.nlm.nih.gov/pubmed/36637893 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e40179 %T Interpretable Deep-Learning Approaches for Osteoporosis Risk Screening and Individualized Feature Analysis Using Large Population-Based Data: Model Development and Performance Evaluation %A Suh,Bogyeong %A Yu,Heejin %A Kim,Hyeyeon %A Lee,Sanghwa %A Kong,Sunghye %A Kim,Jin-Woo %A Choi,Jongeun %+ School of Mechanical Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea, 82 2 2123 2813, jongeunchoi@yonsei.ac.kr %K osteoporosis %K artificial intelligence %K deep learning %K machine learning %K risk factors %K screening %D 2023 %7 13.1.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Osteoporosis is one of the diseases that requires early screening and detection for its management. Common clinical tools and machine-learning (ML) models for screening osteoporosis have been developed, but they show limitations such as low accuracy. Moreover, these methods are confined to limited risk factors and lack individualized explanation. Objective: The aim of this study was to develop an interpretable deep-learning (DL) model for osteoporosis risk screening with clinical features. Clinical interpretation with individual explanations of feature contributions is provided using an explainable artificial intelligence (XAI) technique. Methods: We used two separate data sets: the National Health and Nutrition Examination Survey data sets from the United States (NHANES) and South Korea (KNHANES) with 8274 and 8680 respondents, respectively. The study population was classified according to the T-score of bone mineral density at the femoral neck or total femur. 
A DL model for osteoporosis diagnosis was trained on the data sets and significant risk factors were investigated with local interpretable model-agnostic explanations (LIME). The performance of the DL model was compared with that of ML models and conventional clinical tools. Additionally, contribution ranking of risk factors and individualized explanation of feature contribution were examined. Results: Our DL model showed area under the curve (AUC) values of 0.851 (95% CI 0.844-0.858) and 0.922 (95% CI 0.916-0.928) for the femoral neck and total femur bone mineral density, respectively, using the NHANES data set. The corresponding AUC values for the KNHANES data set were 0.827 (95% CI 0.821-0.833) and 0.912 (95% CI 0.898-0.927), respectively. Through the LIME method, significant features were induced, and each feature’s integrated contribution and interpretation for individual risk were determined. Conclusions: The developed DL model significantly outperforms conventional ML models and clinical tools. Our XAI model produces high-ranked features along with the integrated contributions of each feature, which facilitates the interpretation of individual risk. In summary, our interpretable model for osteoporosis risk screening outperformed state-of-the-art methods. 
%M 36482780 %R 10.2196/40179 %U https://www.jmir.org/2023/1/e40179 %U https://doi.org/10.2196/40179 %U http://www.ncbi.nlm.nih.gov/pubmed/36482780 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 6 %N %P e41331 %T The Use and Structure of Emergency Nurses’ Triage Narrative Data: Scoping Review %A Picard,Christopher %A Kleib,Manal %A Norris,Colleen %A O'Rourke,Hannah M %A Montgomery,Carmel %A Douma,Matthew %+ Faculty of Nursing, University of Alberta, Graduate Office, 4-171 Edmonton Clinic Health Academy, Edmonton, AB, T6G 1C9, Canada, 1 (780) 492 4567, picard.ct@gmail.com %K nursing %K artificial intelligence %K machine learning %K triage %K review %K narrative %D 2023 %7 13.1.2023 %9 Review %J JMIR Nursing %G English %X Background: Emergency departments use triage to ensure that patients with the highest level of acuity receive care quickly and safely. Triage is typically a nursing process that is documented as structured and unstructured (free text) data. Free-text triage narratives have been studied for specific conditions but never reviewed in a comprehensive manner. Objective: The objective of this paper was to identify and map the academic literature that examines triage narratives. The paper described the types of research conducted, identified gaps in the research, and determined where additional review may be warranted. Methods: We conducted a scoping review of unstructured triage narratives. We mapped the literature, described the use of triage narrative data, examined the information available on the form and structure of narratives, highlighted similarities among publications, and identified opportunities for future research. Results: We screened 18,074 studies published between 1990 and 2022 in CINAHL, MEDLINE, Embase, Cochrane, and ProQuest Central. We identified 0.53% (96/18,074) of studies that directly examined the use of triage nurses’ narratives. More than 12 million visits were made to 2438 emergency departments included in the review. 
In total, 82% (79/96) of these studies were conducted in the United States (43/96, 45%), Australia (31/96, 32%), or Canada (5/96, 5%). Triage narratives were used for research and case identification, as input variables for predictive modeling, and for quality improvement. Overall, 31% (30/96) of the studies offered a description of the triage narrative, including a list of the keywords used (27/96, 28%) or more fulsome descriptions (such as word counts, character counts, abbreviation, etc; 7/96, 7%). We found limited use of reporting guidelines (8/96, 8%). Conclusions: The breadth of the identified studies suggests that there is widespread routine collection and research use of triage narrative data. Despite the use of triage narratives as a source of data in studies, the narratives and nurses who generate them are poorly described in the literature, and data reporting is inconsistent. Additional research is needed to describe the structure of triage narratives, determine the best use of triage narratives, and improve the consistent use of triage-specific data reporting guidelines. 
International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2021-055132 %M 36637881 %R 10.2196/41331 %U https://nursing.jmir.org/2023/1/e41331 %U https://doi.org/10.2196/41331 %U http://www.ncbi.nlm.nih.gov/pubmed/36637881 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 2 %N %P e40843 %T Deep Learning Transformer Models for Building a Comprehensive and Real-time Trauma Observatory: Development and Validation Study %A Chenais,Gabrielle %A Gil-Jardiné,Cédric %A Touchais,Hélène %A Avalos Fernandez,Marta %A Contrand,Benjamin %A Tellier,Eric %A Combes,Xavier %A Bourdois,Loick %A Revel,Philippe %A Lagarde,Emmanuel %+ Unit 1219, Bordeaux Public Health Center, Institut National de la Santé et de la Recherche Médicale, 146 rue Léo Saignat, Bordeaux, 33000, France, 33 33 05 57 57 15, gabrielle.chenais@u-bordeaux.fr %K deep learning %K public health %K trauma %K emergencies %K natural language processing %K transformers %D 2023 %7 12.1.2023 %9 Original Paper %J JMIR AI %G English %X Background: Public health surveillance relies on the collection of data, often in near-real time. Recent advances in natural language processing make it possible to envisage an automated system for extracting information from electronic health records. Objective: To study the feasibility of setting up a national trauma observatory in France, we compared the performance of several automatic language processing methods in a multiclass classification task of unstructured clinical notes. Methods: A total of 69,110 free-text clinical notes related to visits to the emergency departments of the University Hospital of Bordeaux, France, between 2012 and 2019 were manually annotated. Among these clinical notes, 32.5% (22,481/69,110) were traumas. We trained 4 transformer models (deep learning models that encompass attention mechanism) and compared them with the term frequency–inverse document frequency associated with the support vector machine method. 
Results: The transformer models consistently performed better than the term frequency–inverse document frequency and a support vector machine. Among the transformers, the GPTanam model pretrained with a French corpus with an additional autosupervised learning step on 306,368 unlabeled clinical notes showed the best performance with a micro F1-score of 0.969. Conclusions: The transformers proved efficient at the multiclass classification of narrative and medical data. Further steps for improvement should focus on the expansion of abbreviations and multioutput multiclass classification. %M 38875539 %R 10.2196/40843 %U https://ai.jmir.org/2023/1/e40843 %U https://doi.org/10.2196/40843 %U http://www.ncbi.nlm.nih.gov/pubmed/38875539 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e39742 %T Stakeholder Perspectives of Clinical Artificial Intelligence Implementation: Systematic Review of Qualitative Evidence %A Hogg,Henry David Jeffry %A Al-Zubaidy,Mohaimen %A , %A Talks,James %A Denniston,Alastair K %A Kelly,Christopher J %A Malawana,Johann %A Papoutsi,Chrysanthi %A Teare,Marion Dawn %A Keane,Pearse A %A Beyer,Fiona R %A Maniatopoulos,Gregory %+ Moorfields Eye Hospital NHS Foundation Trust, 162 City Road, London, EC1V 2PD, United Kingdom, 44 020 7253 3411, p.keane@ucl.ac.uk %K artificial intelligence %K systematic review %K qualitative research %K computerized decision support %K qualitative evidence synthesis %K implementation %D 2023 %7 10.1.2023 %9 Review %J J Med Internet Res %G English %X Background: The rhetoric surrounding clinical artificial intelligence (AI) often exaggerates its effect on real-world care. Limited understanding of the factors that influence its implementation can perpetuate this. Objective: In this qualitative systematic review, we aimed to identify key stakeholders, consolidate their perspectives on clinical AI implementation, and characterize the evidence gaps that future qualitative research should target. 
Methods: Ovid-MEDLINE, EBSCO-CINAHL, ACM Digital Library, Science Citation Index-Web of Science, and Scopus were searched for primary qualitative studies on individuals’ perspectives on any application of clinical AI worldwide (January 2014-April 2021). The definition of clinical AI includes both rule-based and machine learning–enabled or non–rule-based decision support tools. The language of the reports was not an exclusion criterion. Two independent reviewers performed title, abstract, and full-text screening with a third arbiter of disagreement. Two reviewers assigned the Joanna Briggs Institute 10-point checklist for qualitative research scores for each study. A single reviewer extracted free-text data relevant to clinical AI implementation, noting the stakeholders contributing to each excerpt. The best-fit framework synthesis used the Nonadoption, Abandonment, Scale-up, Spread, and Sustainability (NASSS) framework. To validate the data and improve accessibility, coauthors representing each emergent stakeholder group codeveloped summaries of the factors most relevant to their respective groups. Results: The initial search yielded 4437 deduplicated articles, with 111 (2.5%) eligible for inclusion (median Joanna Briggs Institute 10-point checklist for qualitative research score, 8/10). Five distinct stakeholder groups emerged from the data: health care professionals (HCPs), patients, carers and other members of the public, developers, health care managers and leaders, and regulators or policy makers, contributing 1204 (70%), 196 (11.4%), 133 (7.7%), 129 (7.5%), and 59 (3.4%) of 1721 eligible excerpts, respectively. All stakeholder groups independently identified a breadth of implementation factors, with each producing data that were mapped between 17 and 24 of the 27 adapted Nonadoption, Abandonment, Scale-up, Spread, and Sustainability subdomains. 
Most of the factors that stakeholders found influential in the implementation of rule-based clinical AI also applied to non–rule-based clinical AI, with the exception of intellectual property, regulation, and sociocultural attitudes. Conclusions: Clinical AI implementation is influenced by many interdependent factors, which are in turn influenced by at least 5 distinct stakeholder groups. This implies that effective research and practice of clinical AI implementation should consider multiple stakeholder perspectives. The current underrepresentation of perspectives from stakeholders other than HCPs in the literature may limit the anticipation and management of the factors that influence successful clinical AI implementation. Future research should not only widen the representation of tools and contexts in qualitative research but also specifically investigate the perspectives of all stakeholder groups and emerging aspects of non–rule-based clinical AI implementation. Trial Registration: PROSPERO (International Prospective Register of Systematic Reviews) CRD42021256005; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=256005 International Registered Report Identifier (IRRID): RR2-10.2196/33145 %M 36626192 %R 10.2196/39742 %U https://www.jmir.org/2023/1/e39742 %U https://doi.org/10.2196/39742 %U http://www.ncbi.nlm.nih.gov/pubmed/36626192 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e41142 %T Machine Learning–Based Prediction of Acute Kidney Injury Following Pediatric Cardiac Surgery: Model Development and Validation Study %A Luo,Xiao-Qin %A Kang,Yi-Xin %A Duan,Shao-Bin %A Yan,Ping %A Song,Guo-Bao %A Zhang,Ning-Ya %A Yang,Shi-Kun %A Li,Jing-Xin %A Zhang,Hui %+ Department of Nephrology, The Second Xiangya Hospital of Central South University, 139 Renmin Road, Changsha, 410011, China, 86 731 85295100, duansb528@csu.edu.cn %K cardiac surgery %K acute kidney injury %K pediatric %K machine learning %D 2023 %7 5.1.2023 %9 Original Paper %J J 
Med Internet Res %G English %X Background: Cardiac surgery–associated acute kidney injury (CSA-AKI) is a major complication following pediatric cardiac surgery, which is associated with increased morbidity and mortality. The early prediction of CSA-AKI before and immediately after surgery could significantly improve the implementation of preventive and therapeutic strategies during the perioperative periods. However, there is limited clinical information on how to identify pediatric patients at high risk of CSA-AKI. Objective: The study aims to develop and validate machine learning models to predict the development of CSA-AKI in the pediatric population. Methods: This retrospective cohort study enrolled patients aged 1 month to 18 years who underwent cardiac surgery with cardiopulmonary bypass at 3 medical centers of Central South University in China. CSA-AKI was defined according to the 2012 Kidney Disease: Improving Global Outcomes criteria. Feature selection was applied separately to 2 data sets: the preoperative data set and the combined preoperative and intraoperative data set. Multiple machine learning algorithms were tested, including K-nearest neighbor, naive Bayes, support vector machines, random forest, extreme gradient boosting (XGBoost), and neural networks. The best performing model was identified in cross-validation by using the area under the receiver operating characteristic curve (AUROC). Model interpretations were generated using the Shapley additive explanations (SHAP) method. Results: A total of 3278 patients from one of the centers were used for model derivation, while 585 patients from another 2 centers served as the external validation cohort. CSA-AKI occurred in 564 (17.2%) patients in the derivation cohort and 51 (8.7%) patients in the external validation cohort. Among the considered machine learning models, the XGBoost models achieved the best predictive performance in cross-validation. 
The AUROC of the XGBoost model using only the preoperative variables was 0.890 (95% CI 0.876-0.906) in the derivation cohort and 0.857 (95% CI 0.800-0.903) in the external validation cohort. When the intraoperative variables were included, the AUROC increased to 0.912 (95% CI 0.899-0.924) and 0.889 (95% CI 0.844-0.920) in the 2 cohorts, respectively. The SHAP method revealed that baseline serum creatinine level, perfusion time, body length, operation time, and intraoperative blood loss were the top 5 predictors of CSA-AKI. Conclusions: The interpretable XGBoost models provide practical tools for the early prediction of CSA-AKI, which are valuable for risk stratification and perioperative management of pediatric patients undergoing cardiac surgery. %M 36603200 %R 10.2196/41142 %U https://www.jmir.org/2023/1/e41142 %U https://doi.org/10.2196/41142 %U http://www.ncbi.nlm.nih.gov/pubmed/36603200 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 10 %N %P e39114 %T Intensive Care Unit Physicians’ Perspectives on Artificial Intelligence–Based Clinical Decision Support Tools: Preimplementation Survey Study %A van der Meijden,Siri L %A de Hond,Anne A H %A Thoral,Patrick J %A Steyerberg,Ewout W %A Kant,Ilse M J %A Cinà,Giovanni %A Arbous,M Sesmu %+ Department of Intensive Care Medicine, Leiden University Medical Center, Albinusdreef 2, Leiden, 2333 ZA, Netherlands, 31 71 526 9111, S.L.van_der_meijden@lumc.nl %K intensive care unit %K hospital %K discharge %K artificial intelligence %K AI %K clinical decision support %K clinical support %K acceptance %K decision support %K decision-making %K digital health %K eHealth %K survey %K perspective %K attitude %K opinion %K adoption %K prediction %K risk %D 2023 %7 5.1.2023 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Artificial intelligence–based clinical decision support (AI-CDS) tools have great potential to benefit intensive care unit (ICU) patients and physicians. 
There is a gap between the development and implementation of these tools. Objective: We aimed to investigate physicians’ perspectives and their current decision-making behavior before implementing a discharge AI-CDS tool for predicting readmission and mortality risk after ICU discharge. Methods: We conducted a survey of physicians involved in decision-making on discharge of patients at two Dutch academic ICUs between July and November 2021. Questions were divided into four domains: (1) physicians’ current decision-making behavior with respect to discharging ICU patients, (2) perspectives on the use of AI-CDS tools in general, (3) willingness to incorporate a discharge AI-CDS tool into daily clinical practice, and (4) preferences for using a discharge AI-CDS tool in daily workflows. Results: Most of the 64 respondents (of 93 contacted, 69%) were familiar with AI (62/64, 97%) and had positive expectations of AI, with 55 of 64 (86%) believing that AI could support them in their work as a physician. The respondents disagreed on whether the decision to discharge a patient was complex (23/64, 36% agreed and 22/64, 34% disagreed); nonetheless, most (59/64, 92%) agreed that a discharge AI-CDS tool could be of value. Significant differences were observed between physicians from the 2 academic sites, which may be related to different levels of involvement in the development of the discharge AI-CDS tool. Conclusions: ICU physicians showed a favorable attitude toward the integration of AI-CDS tools into the ICU setting in general, and in particular toward a tool to predict a patient’s risk of readmission and mortality within 7 days after discharge. The findings of this questionnaire will be used to improve the implementation process and training of end users. 
%M 36602843 %R 10.2196/39114 %U https://humanfactors.jmir.org/2023/1/e39114 %U https://doi.org/10.2196/39114 %U http://www.ncbi.nlm.nih.gov/pubmed/36602843 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 9 %N 12 %P e39747 %T Cross-Platform Detection of Psychiatric Hospitalization via Social Media Data: Comparison Study %A Nguyen,Viet Cuong %A Lu,Nathaniel %A Kane,John M %A Birnbaum,Michael L %A De Choudhury,Munmun %+ School of Interactive Computing, Georgia Institute of Technology, 756 W Peachtree St NW, Atlanta, GA, 30318, United States, 1 404 279 2941, johnny.nguyen@gatech.edu %K schizophrenia %K mental health %K machine learning %K clinical informatics %K social media %K mobile phone %D 2022 %7 30.12.2022 %9 Original Paper %J JMIR Ment Health %G English %X Background: Previous research has shown the feasibility of using machine learning models trained on social media data from a single platform (eg, Facebook or Twitter) to distinguish individuals either with a diagnosis of mental illness or experiencing an adverse outcome from healthy controls. However, the performance of such models on data from novel social media platforms unseen in the training data (eg, Instagram and TikTok) has not been investigated in previous literature. Objective: Our study examined the feasibility of building machine learning classifiers that can effectively predict an upcoming psychiatric hospitalization given social media data from platforms unseen in the classifiers’ training data despite the preliminary evidence on identity fragmentation on the investigated social media platforms. Methods: Windowed timeline data of patients with a diagnosis of schizophrenia spectrum disorder before a known hospitalization event and healthy controls were gathered from 3 platforms: Facebook (254/268, 94.8% of participants), Twitter (51/268, 19% of participants), and Instagram (134/268, 50% of participants). 
We then used a 3 × 3 combinatorial binary classification design to train machine learning classifiers and evaluate their performance on testing data from all available platforms. We further compared results from models in intraplatform experiments (ie, training and testing data belonging to the same platform) to those from models in interplatform experiments (ie, training and testing data belonging to different platforms). Finally, we used Shapley Additive Explanation values to extract the top predictive features to explain and compare the underlying constructs that predict hospitalization on each platform. Results: We found that models in intraplatform experiments on average achieved an F1-score of 0.72 (SD 0.07) in predicting a psychiatric hospitalization because of schizophrenia spectrum disorder, which is 68% higher than the average of models in interplatform experiments at an F1-score of 0.428 (SD 0.11). When investigating the key drivers for divergence in construct validities between models, an analysis of top features for the intraplatform models showed both low predictive feature overlap between the platforms and low pairwise rank correlation (<0.1) between the platforms’ top feature rankings. Furthermore, low average cosine similarity of data between platforms within participants in comparison with the same measurement on data within platforms between participants points to evidence of identity fragmentation of participants between platforms. Conclusions: We demonstrated that models built on one platform’s data to predict critical mental health treatment outcomes such as hospitalization do not generalize to another platform. In our case, this is because different social media platforms consistently reflect different segments of participants’ identities. 
With the changing ecosystem of social media use among different demographic groups and as web-based identities continue to become fragmented across platforms, further research on holistic approaches to harnessing these diverse data sources is required. %M 36583932 %R 10.2196/39747 %U https://mental.jmir.org/2022/12/e39747 %U https://doi.org/10.2196/39747 %U http://www.ncbi.nlm.nih.gov/pubmed/36583932 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 8 %N 12 %P e35750 %T An Assessment of the Predictive Performance of Current Machine Learning–Based Breast Cancer Risk Prediction Models: Systematic Review %A Gao,Ying %A Li,Shu %A Jin,Yujing %A Zhou,Lengxiao %A Sun,Shaomei %A Xu,Xiaoqian %A Li,Shuqian %A Yang,Hongxi %A Zhang,Qing %A Wang,Yaogang %+ School of Public Health, Tianjin Medical University, Qixiangtai Road 22, Heping District, Tianjin, 300070, China, 86 02260361238, wangyg@tmu.edu.cn %K breast cancer %K machine learning %K risk prediction %K cancer %K oncology %K systemic review %K review %K meta-analysis %K cancer research %K risk model %D 2022 %7 29.12.2022 %9 Review %J JMIR Public Health Surveill %G English %X Background: Several studies have explored the predictive performance of machine learning–based breast cancer risk prediction models and have shown controversial conclusions. Thus, the performance of the current machine learning–based breast cancer risk prediction models and their benefits and weakness need to be evaluated for the future development of feasible and efficient risk prediction models. Objective: The aim of this review was to assess the performance and the clinical feasibility of the currently available machine learning–based breast cancer risk prediction models. Methods: We searched for papers published until June 9, 2021, on machine learning–based breast cancer risk prediction models in PubMed, Embase, and Web of Science. Studies describing the development or validation models for predicting future breast cancer risk were included. 
The Prediction Model Risk of Bias Assessment Tool (PROBAST) was used to assess the risk of bias and the clinical applicability of the included studies. The pooled area under the curve (AUC) was calculated using the DerSimonian and Laird random-effects model. Results: A total of 8 studies with 10 data sets were included. Neural network was the most common machine learning method for the development of breast cancer risk prediction models. The pooled AUC of the machine learning–based optimal risk prediction model reported in each study was 0.73 (95% CI 0.66-0.80; approximate 95% prediction interval 0.56-0.96), with a high level of heterogeneity between studies (Q=576.07, I²=98.44%; P<.001). The results of head-to-head comparison of the performance difference between the 2 types of models trained by the same data set showed that machine learning models had a slightly higher advantage than traditional risk factor–based models in predicting future breast cancer risk. The pooled AUC of the neural network–based risk prediction model was higher than that of the nonneural network–based optimal risk prediction model (0.71 vs 0.68, respectively). Subgroup analysis showed that the incorporation of imaging features in risk models resulted in a higher pooled AUC than the nonincorporation of imaging features in risk models (0.73 vs 0.61, respectively; P for heterogeneity=.001). The PROBAST analysis indicated that many machine learning models had high risk of bias and poorly reported calibration analysis. Conclusions: Our review shows that the current machine learning–based breast cancer risk prediction models have some technical pitfalls and that their clinical feasibility and reliability are unsatisfactory. 
%M 36426919 %R 10.2196/35750 %U https://publichealth.jmir.org/2022/12/e35750 %U https://doi.org/10.2196/35750 %U http://www.ncbi.nlm.nih.gov/pubmed/36426919 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 12 %P e35903 %T User Experience of COVID-19 Chatbots: Scoping Review %A White,Becky K %A Martin,Annegret %A White,James Angus %+ Reach Health Promotion Innovations, PO Box 372, Mount Hawthorn, Perth, 6915, Australia, 61 450169891, becky@rhpi.com.au %K COVID-19 %K chatbot %K engagement %K user experience %K pandemic %K global health %K pandemic %K digital health %K health information %D 2022 %7 27.12.2022 %9 Review %J J Med Internet Res %G English %X Background: The COVID-19 pandemic has had global impacts and caused some health systems to experience substantial pressure. The need for accurate health information has been felt widely. Chatbots have great potential to reach people with authoritative information, and a number of chatbots have been quickly developed to disseminate information about COVID-19. However, little is known about user experiences of and perspectives on these tools. Objective: This study aimed to describe what is known about the user experience and user uptake of COVID-19 chatbots. Methods: A scoping review was carried out in June 2021 using keywords to cover the literature concerning chatbots, user engagement, and COVID-19. The search strategy included databases covering health, communication, marketing, and the COVID-19 pandemic specifically, including MEDLINE Ovid, Embase, CINAHL, ACM Digital Library, Emerald, and EBSCO. Studies that assessed the design, marketing, and user features of COVID-19 chatbots or those that explored user perspectives and experience were included. We excluded papers that were not related to COVID-19; did not include any reporting on user perspectives, experience, or the general use of chatbot features or marketing; or where a version was not available in English. 
The authors independently screened results for inclusion, using both backward and forward citation checking of the included papers. A thematic analysis was carried out with the included papers. Results: A total of 517 papers were sourced from the literature, and 10 were included in the final review. Our scoping review identified a number of factors impacting adoption and engagement including content, trust, digital ability, and acceptability. The papers included discussions about chatbots developed for COVID-19 screening and general COVID-19 information, as well as studies investigating user perceptions and opinions on COVID-19 chatbots. Conclusions: The COVID-19 pandemic presented a unique and specific challenge for digital health interventions. Design and implementation were required at a rapid speed as digital health service adoption accelerated globally. Chatbots for COVID-19 have been developed quickly as the pandemic has challenged health systems. There is a need for more comprehensive and routine reporting of factors impacting adoption and engagement. This paper has shown both the potential of chatbots to reach users in an emergency and the need to better understand how users engage and what they want. 
%M 36520624 %R 10.2196/35903 %U https://www.jmir.org/2022/12/e35903 %U https://doi.org/10.2196/35903 %U http://www.ncbi.nlm.nih.gov/pubmed/36520624 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 12 %P e38859 %T Predicting Publication of Clinical Trials Using Structured and Unstructured Data: Model Development and Validation Study %A Wang,Siyang %A Šuster,Simon %A Baldwin,Timothy %A Verspoor,Karin %+ School of Computing and Information Systems, University of Melbourne, Parkville, Melbourne, 3000, Australia, 61 40834491, simon.suster@unimelb.edu.au %K clinical trials %K study characteristics %K machine learning %K natural language processing %K pretrained language models %K publication success %D 2022 %7 23.12.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Publication of registered clinical trials is a critical step in the timely dissemination of trial findings. However, a significant proportion of completed clinical trials are never published, motivating the need to analyze the factors behind success or failure to publish. This could inform study design, help regulatory decision-making, and improve resource allocation. It could also enhance our understanding of bias in the publication of trials and publication trends based on the research direction or strength of the findings. Although the publication of clinical trials has been addressed in several descriptive studies at an aggregate level, there is a lack of research on the predictive analysis of a trial’s publishability given an individual (planned) clinical trial description. Objective: We aimed to conduct a study that combined structured and unstructured features relevant to publication status in a single predictive approach. Established natural language processing techniques as well as recent pretrained language models enabled us to incorporate information from the textual descriptions of clinical trials into a machine learning approach. 
We were particularly interested in whether and which textual features could improve the classification accuracy for publication outcomes. Methods: In this study, we used metadata from ClinicalTrials.gov (a registry of clinical trials) and MEDLINE (a database of academic journal articles) to build a data set of clinical trials (N=76,950) that contained the description of a registered trial and its publication outcome (27,702/76,950, 36% published and 49,248/76,950, 64% unpublished). This is the largest data set of its kind, which we released as part of this work. The publication outcome in the data set was identified from MEDLINE based on clinical trial identifiers. We carried out a descriptive analysis and predicted the publication outcome using 2 approaches: a neural network with a large domain-specific language model and a random forest classifier using a weighted bag-of-words representation of text. Results: First, our analysis of the newly created data set corroborates several findings from the existing literature regarding attributes associated with a higher publication rate. Second, a crucial observation from our predictive modeling was that the addition of textual features (eg, eligibility criteria) offers consistent improvements over using only structured data (F1-score=0.62-0.64 vs F1-score=0.61 without textual features). Both pretrained language models and more basic word-based representations provide high-utility text representations, with no significant empirical difference between the two. Conclusions: Different factors affect the publication of a registered clinical trial. Our approach to predictive modeling combines heterogeneous features, both structured and unstructured. We show that methods from natural language processing can provide effective textual features to enable more accurate prediction of publication success, which has not been explored for this task previously. 
%M 36563029 %R 10.2196/38859 %U https://www.jmir.org/2022/12/e38859 %U https://doi.org/10.2196/38859 %U http://www.ncbi.nlm.nih.gov/pubmed/36563029 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 12 %P e38751 %T Examining the Use of an Artificial Intelligence Model to Diagnose Influenza: Development and Validation Study %A Okiyama,Sho %A Fukuda,Memori %A Sode,Masashi %A Takahashi,Wataru %A Ikeda,Masahiro %A Kato,Hiroaki %A Tsugawa,Yusuke %A Iwagami,Masao %+ Aillis, Inc, 1-10-1-11F, Yurakucho, Chiyoda-ku, Tokyo, 100-0006, Japan, 81 3 5218 2374, sho.okiyama@aillis.jp %K influenza %K physical examination %K pharynx %K deep learning %K diagnostic prediction %D 2022 %7 23.12.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: The global burden of influenza is substantial. It is a major disease that causes annual epidemics and occasionally, pandemics. Given that influenza primarily infects the upper respiratory system, it may be possible to diagnose influenza infection by applying deep learning to pharyngeal images. Objective: We aimed to develop a deep learning model to diagnose influenza infection using pharyngeal images and clinical information. Methods: We recruited patients who visited clinics and hospitals because of influenza-like symptoms. In the training stage, we developed a diagnostic prediction artificial intelligence (AI) model based on deep learning to predict polymerase chain reaction (PCR)–confirmed influenza from pharyngeal images and clinical information. In the validation stage, we assessed the diagnostic performance of the AI model. In additional analysis, we compared the diagnostic performance of the AI model with that of 3 physicians and interpreted the AI model using importance heat maps. 
Results: We enrolled a total of 7831 patients at 64 hospitals between November 1, 2019, and January 21, 2020, in the training stage and 659 patients (including 196 patients with PCR-confirmed influenza) at 11 hospitals between January 25, 2020, and March 13, 2020, in the validation stage. The area under the receiver operating characteristic curve for the AI model was 0.90 (95% CI 0.87-0.93), and its sensitivity and specificity were 76% (70%-82%) and 88% (85%-91%), respectively, outperforming 3 physicians. In the importance heat maps, the AI model often focused on follicles on the posterior pharyngeal wall. Conclusions: We developed the first AI model that can accurately diagnose influenza from pharyngeal images, which has the potential to help physicians to make a timely diagnosis. %M 36374004 %R 10.2196/38751 %U https://www.jmir.org/2022/12/e38751 %U https://doi.org/10.2196/38751 %U http://www.ncbi.nlm.nih.gov/pubmed/36374004 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 3 %N 1 %P e40473 %T Use of Artificial Intelligence in the Search for New Information Through Routine Laboratory Tests: Systematic Review %A Cardozo,Glauco %A Tirloni,Salvador Francisco %A Pereira Moro,Antônio Renato %A Marques,Jefferson Luiz Brum %+ Federal Institute of Santa Catarina, Av. Mauro Ramos, 950 - Centro, Florianópolis, 88020-300, Brazil, 55 48984060740, glauco.cardozo@ifsc.edu.br %K review %K laboratory tests %K machine learning %K prediction %K diagnosis %K COVID-19 %D 2022 %7 23.12.2022 %9 Review %J JMIR Bioinform Biotech %G English %X Background: In recent decades, the use of artificial intelligence has been widely explored in health care. Similarly, the amount of data generated in the most varied medical processes has practically doubled every year, requiring new methods of analysis and treatment of these data. Mainly aimed at aiding in the diagnosis and prevention of diseases, this precision medicine has shown great potential in different medical disciplines. 
Laboratory tests, for example, almost always present their results separately as individual values. However, physicians need to analyze a set of results to propose a supposed diagnosis, which leads us to think that sets of laboratory tests may contain more information than those presented separately for each result. In this way, the processes of medical laboratories can be strongly affected by these techniques. Objective: In this sense, we sought to identify scientific research that used laboratory tests and machine learning techniques to predict hidden information and diagnose diseases. Methods: The methodology adopted used the population, intervention, comparison, and outcomes principle, searching the main engineering and health sciences databases. The search terms were defined based on the list of terms used in the Medical Subject Heading database. Data from this study were presented descriptively and followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses; 2020) statement flow diagram and the National Institutes of Health tool for quality assessment of articles. During the analysis, the inclusion and exclusion criteria were independently applied by 2 authors, with a third author being consulted in cases of disagreement. Results: Following the defined requirements, 40 studies presenting good quality in the analysis process were selected and evaluated. We found that, in recent years, there has been a significant increase in the number of works that have used this methodology, mainly because of COVID-19. In general, the studies used machine learning classification models to predict new information, and the most used parameters were data from routine laboratory tests such as the complete blood count. Conclusions: Finally, we conclude that laboratory tests, together with machine learning techniques, can predict new tests, thus helping the search for new diagnoses. 
This process has proved to be advantageous and innovative for medical laboratories. It is making it possible to discover hidden information and propose additional tests, reducing the number of false negatives and helping in the early discovery of unknown diseases. %M 36644762 %R 10.2196/40473 %U https://bioinform.jmir.org/2022/1/e40473 %U https://doi.org/10.2196/40473 %U http://www.ncbi.nlm.nih.gov/pubmed/36644762 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 12 %P e42353 %T Human Decision-making in an Artificial Intelligence–Driven Future in Health: Protocol for Comparative Analysis and Simulation %A Doreswamy,Nandini %A Horstmanshof,Louise %+ National Coalition of Independent Scholars, Suite 465, 48 Dickson Place, Dickson, ACT, 2602, Australia, 61 424890997, ndoreswamy@outlook.com %K human decision-making %K AI decision-making %K human-AI interaction %K human roles %K artificial intelligence %K nonclinical health services %K health policy %K health regulation %D 2022 %7 23.12.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Health care can broadly be divided into two domains: clinical health services and complex health services (ie, nonclinical health services, eg, health policy and health regulation). Artificial intelligence (AI) is transforming both of these areas. Currently, humans are leaders, managers, and decision makers in complex health services. However, with the rise of AI, the time has come to ask whether humans will continue to have meaningful decision-making roles in this domain. Further, rationality has long dominated this space. What role will intuition play? Objective: The aim is to establish a protocol of protocols to be used in the proposed research, which aims to explore whether humans will continue in meaningful decision-making roles in complex health services in an AI-driven future. Methods: This paper describes a set of protocols for the proposed research, which is designed as a 4-step project across two phases. 
This paper describes the protocols for each step. The first step is a scoping review to identify and map human attributes that influence decision-making in complex health services. The research question focuses on the attributes that influence human decision-making in this context as reported in the literature. The second step is a scoping review to identify and map AI attributes that influence decision-making in complex health services. The research question focuses on attributes that influence AI decision-making in this context as reported in the literature. The third step is a comparative analysis: a narrative comparison followed by a mathematical comparison of the two sets of attributes—human and AI. This analysis will investigate whether humans have one or more unique attributes that could influence decision-making for the better. The fourth step is a simulation of a nonclinical environment in health regulation and policy into which virtual human and AI decision makers (agents) are introduced. The virtual human and AI will be based on the human and AI attributes identified in the scoping reviews. The simulation will explore, observe, and document how humans interact with AI, and whether humans are likely to compete, cooperate, or converge with AI. Results: The results will be presented in tabular form, visually intuitive formats, and—in the case of the simulation—multimedia formats. Conclusions: This paper provides a road map for the proposed research. It also provides an example of a protocol of protocols for methods used in complex health research. While there are established guidelines for a priori protocols for scoping reviews, there is a paucity of guidance on establishing a protocol of protocols. This paper takes the first step toward building a scaffolding for future guidelines in this regard. 
International Registered Report Identifier (IRRID): PRR1-10.2196/42353 %M 36460486 %R 10.2196/42353 %U https://www.researchprotocols.org/2022/12/e42353 %U https://doi.org/10.2196/42353 %U http://www.ncbi.nlm.nih.gov/pubmed/36460486 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 12 %P e38161 %T Boosting Delirium Identification Accuracy With Sentiment-Based Natural Language Processing: Mixed Methods Study %A Wang,Lu %A Zhang,Yilun %A Chignell,Mark %A Shan,Baizun %A Sheehan,Kathleen A %A Razak,Fahad %A Verma,Amol %+ Department of Mechanical & Industrial Engineering, University of Toronto, RM 8171A, Bahen Building, 40 St George Rd, Toronto, ON, M5S 2E4, Canada, 1 6473898951, chignel@mie.utoronto.ca %K delirium diagnosis %K data mining %K medical image description %K text mining and analysis %K sentiment analysis %D 2022 %7 20.12.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Delirium is an acute neurocognitive disorder that affects up to half of older hospitalized medical patients and can lead to dementia, longer hospital stays, increased health costs, and death. Although delirium can be prevented and treated, it is difficult to identify and predict. Objective: This study aimed to improve machine learning models that retrospectively identify the presence of delirium during hospital stays (eg, to measure the effectiveness of delirium prevention interventions) by using the natural language processing (NLP) technique of sentiment analysis (in this case a feature that identifies sentiment toward, or away from, a delirium diagnosis). Methods: Using data from the General Medicine Inpatient Initiative, a Canadian hospital data and analytics network, a detailed manual review of medical records was conducted from nearly 4000 admissions at 6 Toronto area hospitals. Furthermore, 25.74% (994/3862) of the eligible hospital admissions were labeled as having delirium. 
Using the data set collected from this study, we developed machine learning models with, and without, the benefit of NLP methods applied to diagnostic imaging reports, and we asked the question “can NLP improve machine learning identification of delirium?” Results: Among the eligible 3862 hospital admissions, 994 (25.74%) admissions were labeled as having delirium. Identification and calibration of the models were satisfactory. The accuracy and area under the receiver operating characteristic curve of the main model with NLP in the independent testing data set were 0.807 and 0.930, respectively. The accuracy and area under the receiver operating characteristic curve of the main model without NLP in the independent testing data set were 0.811 and 0.869, respectively. Model performance was also found to be stable over the 5-year period used in the experiment, with identification for a likely future holdout test set being no worse than identification for retrospective holdout test sets. Conclusions: Our machine learning model that included NLP (ie, sentiment analysis in medical image description text mining) produced valid identification of delirium with the sentiment analysis, providing significant additional benefit over the model without NLP. 
%M 36538363 %R 10.2196/38161 %U https://medinform.jmir.org/2022/12/e38161 %U https://doi.org/10.2196/38161 %U http://www.ncbi.nlm.nih.gov/pubmed/36538363 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 1 %N 1 %P e37751 %T Visualizing the Interpretation of a Criteria-Driven System That Automatically Evaluates the Quality of Health News: Exploratory Study of 2 Approaches %A Liu,Xiaoyu %A Alsghaier,Hiba %A Tong,Ling %A Ataullah,Amna %A McRoy,Susan %+ Department of Computer Science, University of Wisconsin Milwaukee, Engineering and Mathematical Sciences Bldg 1275, 3200 N Cramer St, Milwaukee, WI, 53211, United States, 1 414 229 6695, mcroy@uwm.edu %K health misinformation %K machine learning %K local interpretable model-agnostic explanation %K LIME %K interpretable artificial intelligence %K AI %D 2022 %7 20.12.2022 %9 Original Paper %J JMIR AI %G English %X Background: Machine learning techniques have been shown to be efficient in identifying health misinformation, but the results may not be trusted unless they can be justified in a way that is understandable. Objective: This study aimed to provide a new criteria-based system to assess and justify health news quality. Using a subset of an existing set of criteria, this study compared the feasibility of 2 alternative methods for adding interpretability. Both methods used classification and highlighting to visualize sentence-level evidence. Methods: A total of 3 out of 10 well-established criteria were chosen for experimentation, namely whether the health news discussed the costs of the intervention (the cost criterion), explained or quantified the harms of the intervention (the harm criterion), and identified the conflicts of interest (the conflict criterion). The first step of the experiment was to automate the evaluation of the 3 criteria by developing a sentence-level classifier. We tested Logistic Regression, Naive Bayes, Support Vector Machine, and Random Forest algorithms. 
Next, we compared the 2 visualization approaches. For the first approach, we calculated word feature weights, which explained how classification models distill keywords that contribute to the prediction; then, using the local interpretable model-agnostic explanation framework, we selected keywords associated with the classified criterion at the document level; and finally, the system selected and highlighted sentences with keywords. For the second approach, we extracted sentences that provided evidence to support the evaluation result from 100 health news articles; based on these results, we trained a typology classification model at the sentence level; and then, the system highlighted a positive sentence instance for the result justification. The number of sentences to highlight was determined by a preset threshold empirically determined using the average accuracy. Results: The automatic evaluation of health news on the cost, harm, and conflict criteria achieved average area under the curve scores of 0.88, 0.76, and 0.73, respectively, after 50 repetitions of 10-fold cross-validation. We found that both approaches could successfully visualize the interpretation of the system but that the performance of the 2 approaches varied by criterion, and highlighting accuracy decreased as the number of highlighted sentences increased. When the threshold accuracy was ≥75%, this resulted in a visualization with a variable length ranging from 1 to 6 sentences. Conclusions: We provided 2 approaches to interpret criteria-based health news evaluation models tested on 3 criteria. This method incorporated rule-based and statistical machine learning approaches. The results suggested that one might visually interpret an automatic criterion-based health news quality evaluation successfully using either approach; however, larger differences may arise when multiple quality-related criteria are considered. 
This study can increase public trust in computerized health information evaluation. %M 38875559 %R 10.2196/37751 %U https://ai.jmir.org/2022/1/e37751 %U https://doi.org/10.2196/37751 %U http://www.ncbi.nlm.nih.gov/pubmed/38875559 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 12 %P e42208 %T Exploring Stakeholder Requirements to Enable the Research and Development of Artificial Intelligence Algorithms in a Hospital-Based Generic Infrastructure: Protocol for a Multistep Mixed Methods Study %A Weinert,Lina %A Klass,Maximilian %A Schneider,Gerd %A Heinze,Oliver %+ Section for Translational Health Economics, Department for Conservative Dentistry, Heidelberg University Hospital, Im Neuenheimer Feld 400, Heidelberg, 69120, Germany, 49 6221 56 ext 34367, lina.weinert@med.uni-heidelberg.de %K artificial intelligence %K requirements analysis %K mixed methods %K innovation %K qualitative research %K health care %K artificial intelligence technology %K diagnostic %K health data %K artificial intelligence infrastructure %K technology development %D 2022 %7 16.12.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: In recent years, research and developments in advancing artificial intelligence (AI) in health care and medicine have increased. High expectations surround the use of AI technologies, such as improvements for diagnosis and increases in the quality of care with reductions in health care costs. The successful development and testing of new AI algorithms require large amounts of high-quality data. Academic hospitals could provide the data needed for AI development, but granting legal, controlled, and regulated access to these data for developers and researchers is difficult. 
Therefore, the German Federal Ministry of Health supports the Protected Artificial Intelligence Innovation Environment for Patient-Oriented Digital Health Solutions for Developing, Testing, and Evidence-Based Evaluation of Clinical Value (pAItient) project, aiming to install the AI Innovation Environment at the Heidelberg University Hospital in Germany. The AI Innovation Environment was designed as a proof-of-concept extension of the already existing Medical Data Integration Center. It will establish a process to support every step of developing and testing AI-based technologies. Objective: The first part of the pAItient project, as presented in this research protocol, aims to explore stakeholders’ requirements for developing AI in partnership with an academic hospital and granting AI experts access to anonymized personal health data. Methods: We planned a multistep mixed methods approach. In the first step, researchers and employees from stakeholder organizations were invited to participate in semistructured interviews. In the following step, questionnaires were developed based on the participants’ answers and distributed among the stakeholders’ organizations to quantify qualitative findings and discover important aspects that were not mentioned by the interviewees. The questionnaires will be analyzed descriptively. In addition, patients and physicians were interviewed as well. No survey questionnaires were developed for this second group of participants. The study was approved by the Ethics Committee of the Heidelberg University Hospital (approval number: S-241/2021). Results: Data collection concluded in summer 2022. Data analysis is planned to start in fall 2022. We plan to publish the results in winter 2022 to 2023. Conclusions: The results of our study will help in shaping the AI Innovation Environment at our academic hospital according to stakeholder requirements. 
With this approach, in turn, we aim to shape an AI environment that is effective and is deemed acceptable by all parties. International Registered Report Identifier (IRRID): DERR1-10.2196/42208 %M 36525300 %R 10.2196/42208 %U https://www.researchprotocols.org/2022/12/e42208 %U https://doi.org/10.2196/42208 %U http://www.ncbi.nlm.nih.gov/pubmed/36525300 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 12 %P e43757 %T Model for Predicting In-Hospital Mortality of Physical Trauma Patients Using Artificial Intelligence Techniques: Nationwide Population-Based Study in Korea %A Lee,Seungseok %A Kang,Wu Seong %A Seo,Sanghyun %A Kim,Do Wan %A Ko,Hoon %A Kim,Joongsuck %A Lee,Seonghwa %A Lee,Jinseok %+ Department of Biomedical Engineering, Kyung Hee University, 1732, Deogyeong-daero, Giheung-gu, Yong-in, 17104, Republic of Korea, 82 312012570, gonasago@khu.ac.kr %K artificial intelligence %K trauma %K mortality prediction %K international classification of disease %K injury %K prediction model %K severity score %K emergency department %K Information system %K deep neural network %D 2022 %7 13.12.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Physical trauma–related mortality places a heavy burden on society. Estimating the mortality risk in physical trauma patients is crucial to enhance treatment efficiency and reduce this burden. The most popular and accurate model is the Injury Severity Score (ISS), which is based on the Abbreviated Injury Scale (AIS), an anatomical injury severity scoring system. However, the AIS requires specialists to code the injury scale by reviewing a patient's medical record; therefore, applying the model to every hospital is impossible. Objective: We aimed to develop an artificial intelligence (AI) model to predict in-hospital mortality in physical trauma patients using the International Classification of Disease 10th Revision (ICD-10), triage scale, procedure codes, and other clinical features. 
Methods: We used the Korean National Emergency Department Information System (NEDIS) data set (N=778,111) compiled from over 400 hospitals between 2016 and 2019. To predict in-hospital mortality, we used the following as input features: ICD-10, patient age, gender, intentionality, injury mechanism, emergent symptom, Alert/Verbal/Painful/Unresponsive (AVPU) scale, Korean Triage and Acuity Scale (KTAS), and procedure codes. We proposed the ensemble of deep neural networks (EDNN) via 5-fold cross-validation and compared it with other state-of-the-art machine learning models, including traditional prediction models. We further investigated the effect of the features. Results: Our proposed EDNN with all features provided the highest area under the receiver operating characteristic (AUROC) curve of 0.9507, outperforming other state-of-the-art models, including the following traditional prediction models: Adaptive Boosting (AdaBoost; AUROC of 0.9433), Extreme Gradient Boosting (XGBoost; AUROC of 0.9331), ICD-based ISS (AUROC of 0.8699 for an inclusive model and AUROC of 0.8224 for an exclusive model), and KTAS (AUROC of 0.1841). In addition, using all features yielded a higher AUROC than any other partial features, namely, EDNN with the features of ICD-10 only (AUROC of 0.8964) and EDNN with the features excluding ICD-10 (AUROC of 0.9383). Conclusions: Our proposed EDNN with all features outperforms other state-of-the-art models, including the traditional diagnostic code-based prediction model and triage scale. 
%M 36512392 %R 10.2196/43757 %U https://www.jmir.org/2022/12/e43757 %U https://doi.org/10.2196/43757 %U http://www.ncbi.nlm.nih.gov/pubmed/36512392 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 5 %N 4 %P e39113 %T Issues in Melanoma Detection: Semisupervised Deep Learning Algorithm Development via a Combination of Human and Artificial Intelligence %A Zhang,Xinyuan %A Xie,Ziqian %A Xiang,Yang %A Baig,Imran %A Kozman,Mena %A Stender,Carly %A Giancardo,Luca %A Tao,Cui %+ School of Biomedical Informatics, The University of Texas Health Science Center at Houston, 7000 Fannin St, Suite 600, Houston, TX, 77030, United States, 1 7135003981, cui.tao@uth.tmc.edu %K deep learning %K dermoscopic images %K semisupervised learning %K 3-point checklist %K skin lesion %K dermatology %K algorithm %K melanoma classification %K melanoma %K automatic diagnosis %K skin disease %D 2022 %7 12.12.2022 %9 Original Paper %J JMIR Dermatol %G English %X Background: Automatic skin lesion recognition has shown to be effective in increasing access to reliable dermatology evaluation; however, most existing algorithms rely solely on images. Many diagnostic rules, including the 3-point checklist, are not considered by artificial intelligence algorithms, which comprise human knowledge and reflect the diagnosis process of human experts. Objective: In this paper, we aimed to develop a semisupervised model that can not only integrate the dermoscopic features and scoring rule from the 3-point checklist but also automate the feature-annotation process. Methods: We first trained the semisupervised model on a small, annotated data set with disease and dermoscopic feature labels and tried to improve the classification accuracy by integrating the 3-point checklist using ranking loss function. We then used a large, unlabeled data set with only disease label to learn from the trained algorithm to automatically classify skin lesions and features. 
Results: After adding the 3-point checklist to our model, its performance for melanoma classification improved from a mean of 0.8867 (SD 0.0191) to 0.8943 (SD 0.0115) under 5-fold cross-validation. The trained semisupervised model can automatically detect 3 dermoscopic features from the 3-point checklist, with best performances of 0.80 (area under the curve [AUC] 0.8380), 0.89 (AUC 0.9036), and 0.76 (AUC 0.8444), in some cases outperforming human annotators. Conclusions: Our proposed semisupervised learning framework can help with the automatic diagnosis of skin disease based on its ability to detect dermoscopic features and automate the label-annotation process. The framework can also help combine semantic knowledge with a computer algorithm to arrive at a more accurate and more interpretable diagnostic result, which can be applied to broader use cases. %M 37632881 %R 10.2196/39113 %U https://derma.jmir.org/2022/4/e39113 %U https://doi.org/10.2196/39113 %U http://www.ncbi.nlm.nih.gov/pubmed/37632881 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 12 %P e41819 %T A Machine Learning-Based Approach to Predict Prognosis and Length of Hospital Stay in Adults and Children With Traumatic Brain Injury: Retrospective Cohort Study %A Fang,Cheng %A Pan,Yifeng %A Zhao,Luotong %A Niu,Zhaoyi %A Guo,Qingguo %A Zhao,Bing %+ Department of Neurosurgery, The Second Affiliated Hospital of Anhui Medical University, Anhui Medical University, No. 678, Furong Road, Hefei, 230601, China, 86 138 6611 2073, aydzhb@126.com %K convolutional neural network %K machine learning %K neurosurgery %K support vector machine %K support vector regression %K traumatic brain injury %D 2022 %7 9.12.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: The treatment and care of adults and children with traumatic brain injury (TBI) constitute an intractable global health problem. 
Predicting the prognosis and length of hospital stay of patients with TBI may improve therapeutic effects and significantly reduce societal health care burden. Applying novel machine learning methods to the field of TBI may be valuable for determining the prognosis and cost-effectiveness of clinical treatment. Objective: We aimed to combine multiple machine learning approaches to build hybrid models for predicting the prognosis and length of hospital stay for adults and children with TBI. Methods: We collected relevant clinical information from patients treated at the Neurosurgery Center of the Second Affiliated Hospital of Anhui Medical University between May 2017 and May 2022, of which 80% was used for training the model and 20% for testing via screening and data splitting. We trained and tested the machine learning models using 5 cross-validations to avoid overfitting. In the machine learning models, 11 types of independent variables were used as input variables and Glasgow Outcome Scale score, used to evaluate patients’ prognosis, and patient length of stay were used as output variables. Once the models were trained, we obtained and compared the errors of each machine learning model from 5 rounds of cross-validation to select the best predictive model. The model was then externally tested using clinical data of patients treated at the First Affiliated Hospital of Anhui Medical University from June 2021 to February 2022. Results: The final convolutional neural network–support vector machine (CNN-SVM) model predicted Glasgow Outcome Scale score with an accuracy of 93% and 93.69% in the test and external validation sets, respectively, and an area under the curve of 94.68% and 94.32% in the test and external validation sets, respectively. 
The mean absolute percentage error of the final built convolutional neural network–support vector regression (CNN-SVR) model predicting inpatient time in the test set and external validation set was 10.72% and 10.44%, respectively. The coefficient of determination (R2) was 0.93 and 0.92 in the test set and external validation set, respectively. Compared with back-propagation neural network, CNN, and SVM models built separately, our hybrid model was identified to be optimal and had high confidence. Conclusions: This study demonstrates the clinical utility of 2 hybrid models built by combining multiple machine learning approaches to accurately predict the prognosis and length of stay in hospital for adults and children with TBI. Application of these models may reduce the burden on physicians when assessing TBI and assist clinicians in the medical decision-making process. %M 36485032 %R 10.2196/41819 %U https://www.jmir.org/2022/12/e41819 %U https://doi.org/10.2196/41819 %U http://www.ncbi.nlm.nih.gov/pubmed/36485032 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 8 %N 12 %P e38158 %T Machine Learning Techniques to Explore Clinical Presentations of COVID-19 Severity and to Test the Association With Unhealthy Opioid Use: Retrospective Cross-sectional Cohort Study %A Thompson,Hale M %A Sharma,Brihat %A Smith,Dale L %A Bhalla,Sameer %A Erondu,Ihuoma %A Hazra,Aniruddha %A Ilyas,Yousaf %A Pachwicewicz,Paul %A Sheth,Neeral K %A Chhabra,Neeraj %A Karnik,Niranjan S %A Afshar,Majid %+ Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Suite 302, 1645 W. 
Jackson Boulevard, Chicago, IL, 60612, United States, 1 4153108569, hale_thompson@rush.edu %K unhealthy opioid use %K substance misuse %K COVID-19 %K severity of illness %K overdose %K topic modeling %K machine learning %K opioid use %K pandemic %K health outcome %K public health %K disease severity %K electronic health record %K COVID-19 outcome %K risk factor %K patient data %D 2022 %7 8.12.2022 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: The COVID-19 pandemic has exacerbated health inequities in the United States. People with unhealthy opioid use (UOU) may face disproportionate challenges with COVID-19 precautions, and the pandemic has disrupted access to opioids and UOU treatments. UOU impairs the immunological, cardiovascular, pulmonary, renal, and neurological systems and may increase severity of outcomes for COVID-19. Objective: We applied machine learning techniques to explore clinical presentations of hospitalized patients with UOU and COVID-19 and to test the association between UOU and COVID-19 disease severity. Methods: This retrospective, cross-sectional cohort study was conducted based on data from 4110 electronic health record patient encounters at an academic health center in Chicago between January 1, 2020, and December 31, 2020. The inclusion criterion was an unplanned admission of a patient aged ≥18 years; encounters were counted as COVID-19-positive if there was a positive test for COVID-19 or 2 COVID-19 International Classification of Disease, Tenth Revision codes. Using a predefined cutoff with optimal sensitivity and specificity to identify UOU, we ran a machine learning UOU classifier on the data for patients with COVID-19 to estimate the subcohort of patients with UOU. Topic modeling was used to explore and compare the clinical presentations documented for 2 subgroups: encounters with UOU and COVID-19 and those with no UOU and COVID-19. 
Mixed effects logistic regression accounted for multiple encounters for some patients and tested the association between UOU and COVID-19 outcome severity. Severity was measured with 3 utilization metrics: low-severity unplanned admission, medium-severity unplanned admission and receiving mechanical ventilation, and high-severity unplanned admission with in-hospital death. All models controlled for age, sex, race/ethnicity, insurance status, and BMI. Results: Topic modeling yielded 10 topics per subgroup and highlighted unique comorbidities associated with UOU and COVID-19 (eg, HIV) and no UOU and COVID-19 (eg, diabetes). In the regression analysis, each incremental increase in the classifier’s predicted probability of UOU was associated with 1.16 higher odds of COVID-19 outcome severity (odds ratio 1.16, 95% CI 1.04-1.29; P=.009). Conclusions: Among patients hospitalized with COVID-19, UOU is an independent risk factor associated with greater outcome severity, including in-hospital death. Social determinants of health and opioid-related overdose are unique comorbidities in the clinical presentation of the UOU patient subgroup. Additional research is needed on the role of COVID-19 therapeutics and inpatient management of acute COVID-19 pneumonia for patients with UOU. Further research is needed to test associations between expanded evidence-based harm reduction strategies for UOU and vaccination rates, hospitalizations, and risks for overdose and death among people with UOU and COVID-19. Machine learning techniques may offer more exhaustive means for cohort discovery and a novel mixed methods approach to population health. 
%M 36265163 %R 10.2196/38158 %U https://publichealth.jmir.org/2022/12/e38158 %U https://doi.org/10.2196/38158 %U http://www.ncbi.nlm.nih.gov/pubmed/36265163 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 12 %P e40589 %T Applications of Artificial Intelligence to Obesity Research: Scoping Review of Methodologies %A An,Ruopeng %A Shen,Jing %A Xiao,Yunyu %+ Department of Physical Education, China University of Geosciences, No. 29, Xueyuan Road, Haidian District, Beijing, 100083, China, 86 010 82322397, shenjing@cugb.edu.cn %K artificial intelligence %K deep learning %K machine learning %K obesity %K scoping review %D 2022 %7 7.12.2022 %9 Review %J J Med Internet Res %G English %X Background: Obesity is a leading cause of preventable death worldwide. Artificial intelligence (AI), characterized by machine learning (ML) and deep learning (DL), has become an indispensable tool in obesity research. Objective: This scoping review aimed to provide researchers and practitioners with an overview of the AI applications to obesity research, familiarize them with popular ML and DL models, and facilitate the adoption of AI applications. Methods: We conducted a scoping review in PubMed and Web of Science on the applications of AI to measure, predict, and treat obesity. We summarized and categorized the AI methodologies used in the hope of identifying synergies, patterns, and trends to inform future investigations. We also provided a high-level, beginner-friendly introduction to the core methodologies to facilitate the dissemination and adoption of various AI techniques. Results: We identified 46 studies that used diverse ML and DL models to assess obesity-related outcomes. The studies found AI models helpful in detecting clinically meaningful patterns of obesity or relationships between specific covariates and weight outcomes. 
The majority (18/22, 82%) of the studies comparing AI models with conventional statistical approaches found that the AI models achieved higher prediction accuracy on test data. Some (5/46, 11%) of the studies comparing the performances of different AI models revealed mixed results, indicating the high contingency of model performance on the data set and task it was applied to. An accelerating trend of adopting state-of-the-art DL models over standard ML models was observed to address challenging computer vision and natural language processing tasks. We concisely introduced the popular ML and DL models and summarized their specific applications in the studies included in the review. Conclusions: This study reviewed AI-related methodologies adopted in the obesity literature, particularly ML and DL models applied to tabular, image, and text data. The review also discussed emerging trends such as multimodal or multitask AI models, synthetic data generation, and human-in-the-loop that may witness increasing applications in obesity research. 
%M 36476515 %R 10.2196/40589 %U https://www.jmir.org/2022/12/e40589 %U https://doi.org/10.2196/40589 %U http://www.ncbi.nlm.nih.gov/pubmed/36476515 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 11 %N 2 %P e38655 %T Levels of Autonomous Radiology %A Ghuwalewala,Suraj %A Kulkarni,Viraj %A Pant,Richa %A Kharat,Amit %+ DeepTek Medical Imaging Pvt Ltd, 3rd Floor, Ideas to Impact, 3, Baner Rd, Pallod Farms, Baner, Pune, 411045, India, 91 72760 60080, richa.pant@deeptek.ai %K artificial intelligence %K automation %K machine learning %K radiology %K explainability %K model decay %K generalizability %K fairness and bias %K distributed learning %K autonomous radiology %K AI assistance %D 2022 %7 7.12.2022 %9 Viewpoint %J Interact J Med Res %G English %X Radiology, being one of the younger disciplines of medicine with a history of just over a century, has witnessed tremendous technological advancements and has revolutionized the way we practice medicine today. In the last few decades, medical imaging modalities have generated seismic amounts of medical data. The development and adoption of artificial intelligence applications using this data will lead to the next phase of evolution in radiology. It will include automating laborious manual tasks such as annotations, report generation, etc, along with the initial radiological assessment of patients and imaging features to aid radiologists in their diagnostic and treatment planning workflow. We propose a level-wise classification for the progression of automation in radiology, explaining artificial intelligence assistance at each level with the corresponding challenges and solutions. We hope that such discussions can help us address challenges in a structured way and take the necessary steps to ensure the smooth adoption of new technologies in radiology. 
%M 36476422 %R 10.2196/38655 %U https://www.i-jmr.org/2022/2/e38655 %U https://doi.org/10.2196/38655 %U http://www.ncbi.nlm.nih.gov/pubmed/36476422 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 12 %P e41163 %T Using Deep Transfer Learning to Detect Hyperkalemia From Ambulatory Electrocardiogram Monitors in Intensive Care Units: Personalized Medicine Approach %A Chiu,I-Min %A Cheng,Jhu-Yin %A Chen,Tien-Yu %A Wang,Yi-Min %A Cheng,Chi-Yung %A Kung,Chia-Te %A Cheng,Fu-Jen %A Yau,Fei-Fei Flora %A Lin,Chun-Hung Richard %+ Department of Computer Science and Engineering, National Sun Yat-sen University, No 70, Lienhai Rd, Kaohsiung City, 804, Taiwan, 886 07 5252000 ext 4301, lin@cse.nsysu.edu.tw %K deep learning %K transfer learning %K hyperkalemia %K electrocardiogram %K ECG monitor %K ICU %K personalized medicine %D 2022 %7 5.12.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Hyperkalemia is a critical condition, especially in intensive care units. So far, there have been no accurate and noninvasive methods for recognizing hyperkalemia events on ambulatory electrocardiogram monitors. Objective: This study aimed to improve the accuracy of hyperkalemia predictions from ambulatory electrocardiogram (ECG) monitors using a personalized transfer learning method; this would be done by training a generic model and refining it with personal data. Methods: This retrospective cohort study used open source data from the Waveform Database Matched Subset of the Medical Information Mart From Intensive Care III (MIMIC-III). We included patients with multiple serum potassium test results and matched ECG data from the MIMIC-III database. A 1D convolutional neural network–based deep learning model was first developed to predict hyperkalemia in a generic population. Once the model achieved a state-of-the-art performance, it was used in an active transfer learning process to perform patient-adaptive heartbeat classification tasks. 
Results: The results show that by acquiring data from each new patient, the personalized model can improve the accuracy of hyperkalemia detection significantly, from an average of 0.604 (SD 0.211) to 0.980 (SD 0.078), when compared with the generic model. Moreover, the area under the receiver operating characteristic curve level improved from 0.729 (SD 0.240) to 0.945 (SD 0.094). Conclusions: By using the deep transfer learning method, we were able to build a clinical standard model for hyperkalemia detection using ambulatory ECG monitors. These findings could potentially be extended to applications that continuously monitor one’s ECGs for early alerts of hyperkalemia and help avoid unnecessary blood tests. %M 36469396 %R 10.2196/41163 %U https://www.jmir.org/2022/12/e41163 %U https://doi.org/10.2196/41163 %U http://www.ncbi.nlm.nih.gov/pubmed/36469396 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 9 %N 4 %P e38799 %T Public Trust in Artificial Intelligence Applications in Mental Health Care: Topic Modeling Analysis %A Shan,Yi %A Ji,Meng %A Xie,Wenxiu %A Lam,Kam-Yiu %A Chow,Chi-Yin %+ Nantong University, No 9, Seyuan Rd, Nantong, 226019, China, 86 15558121896, victorsyhz@hotmail.com %K public trust %K public opinion %K AI application %K artificial intelligence %K mental health care %K topic modeling %K topic %K theme %K term %K visualization %K user feedback %K user review %K Google Play %K health app %K mHealth %K mobile health %K digital health %K eHealth %K mental health %K mental illness %K mental disorder %D 2022 %7 2.12.2022 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Mental disorders (MDs) impose heavy burdens on health care (HC) systems and affect a growing number of people worldwide. The use of mobile health (mHealth) apps empowered by artificial intelligence (AI) is increasingly being resorted to as a possible solution. 
Objective: This study adopted a topic modeling (TM) approach to investigate the public trust in AI apps in mental health care (MHC) by identifying the dominant topics and themes in user reviews of the 8 most relevant mental health (MH) apps with the largest numbers of reviewers. Methods: We searched Google Play for the top MH apps with the largest numbers of reviewers, from which we selected the most relevant apps. Subsequently, we extracted data from user reviews posted from January 1, 2020, to April 2, 2022. After cleaning the extracted data using the Python text processing tool spaCy, we ascertained the optimal number of topics, drawing on the coherence scores and used latent Dirichlet allocation (LDA) TM to generate the most salient topics and related terms. We then classified the ascertained topics into different theme categories by plotting them onto a 2D plane via multidimensional scaling using the pyLDAvis visualization tool. Finally, we analyzed these topics and themes qualitatively to better understand the status of public trust in AI apps in MHC. Results: From the top 20 MH apps with the largest numbers of reviewers retrieved, we chose the 8 (40%) most relevant apps: (1) Wysa: Anxiety Therapy Chatbot; (2) Youper Therapy; (3) MindDoc: Your Companion; (4) TalkLife for Anxiety, Depression & Stress; (5) 7 Cups: Online Therapy for Mental Health & Anxiety; (6) BetterHelp-Therapy; (7) Sanvello; and (8) InnerHour. These apps provided 14.2% (n=559), 11.0% (n=431), 13.7% (n=538), 8.8% (n=356), 14.1% (n=554), 11.9% (n=468), 9.2% (n=362), and 16.9% (n=663) of the collected 3931 reviews, respectively. The 4 dominant topics were topic 4 (cheering people up; n=1069, 27%), topic 3 (calming people down; n=1029, 26%), topic 2 (helping figure out the inner world; n=963, 25%), and topic 1 (being an alternative or complement to a therapist; n=870, 22%). 
Based on topic coherence and intertopic distance, topics 3 and 4 were combined into theme 3 (dispelling negative emotions), while topics 2 and 1 remained 2 separate themes: theme 2 (helping figure out the inner world) and theme 1 (being an alternative or complement to a therapist), respectively. These themes and topics, though involving some dissenting voices, reflected an overall high level of trust in AI apps. Conclusions: This is the first study to investigate public trust in AI apps in MHC from the perspective of user reviews using the TM technique. The automatic text analysis and complementary manual interpretation of the collected data allowed us to discover the dominant topics hidden in a data set and categorize these topics into different themes to reveal an overall high degree of public trust. The dissenting voices from users, though only a few, can serve as indicators for health providers and app developers to jointly improve these apps, which will ultimately facilitate the treatment of prevalent MDs and alleviate the overburdened HC systems worldwide. 
%M 36459412 %R 10.2196/38799 %U https://humanfactors.jmir.org/2022/4/e38799 %U https://doi.org/10.2196/38799 %U http://www.ncbi.nlm.nih.gov/pubmed/36459412 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 12 %P e39443 %T Learning the Treatment Process in Radiotherapy Using an Artificial Intelligence–Assisted Chatbot: Development Study %A Rebelo,Nathanael %A Sanders,Leslie %A Li,Kay %A Chow,James C L %+ Radiation Medicine Program, Princess Margaret Cancer Centre, University Health Network, 700 University Ave, 7/F, Room 7-606, Toronto, ON, M5G 1X6, Canada, 1 416 946 4501 ext 5089, james.chow@rmp.uhn.ca %K chatbot %K artificial intelligence %K machine learning %K radiotherapy chain %K radiation treatment process %K communication %K diagnosis %K cancer therapy %K internet of things %K radiation oncology %K medical physics %K health care %D 2022 %7 2.12.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: In knowledge transfer for educational purposes, most cancer hospital or center websites have existing information on cancer health. However, such information is usually a list of topics that are neither interactive nor customized to offer any personal touches to people facing dire health crisis and to attempt to understand the concerns of the users. Patients with cancer, their families, and the general public accessing the information are often in challenging, stressful situations, wanting to access accurate information as efficiently as possible. In addition, there is seldom any comprehensive information specifically on radiotherapy, despite the large number of older patients with cancer, to go through the treatment process. Therefore, having someone with professional knowledge who can listen to them and provide the medical information with good will and encouragement would help patients and families struggling with critical illness, particularly during the lingering pandemic. 
Objective: This study created a novel virtual assistant, a chatbot that can explain the radiation treatment process to stakeholders comprehensively and accurately, in the absence of any similar software. This chatbot was created using the IBM Watson Assistant with artificial intelligence and machine learning features. The chatbot or bot was incorporated into a resource that can be easily accessed by the general public. Methods: The radiation treatment process in a cancer hospital or center was described following the radiotherapy chain: patient diagnosis, consultation, and prescription; patient positioning, immobilization, and simulation; 3D-imaging for treatment planning; target and organ contouring; radiation treatment planning; patient setup and plan verification; and treatment delivery. The bot was created using IBM Watson Assistant (IBM Corp). The natural language processing feature in the Watson platform allowed the bot to flow through a given conversation structure and recognize how the user responds based on recognition of similar given examples, referred to as intents during development. Therefore, the bot can be trained using the responses received, by recognizing similar responses from the user and analyzing them using Watson natural language processing. Results: The bot is hosted on a website via the Watson application programming interface. It is capable of guiding the user through the conversation structure and can respond to simple questions and provide resources for requests for information that was not directly programmed into the bot. The bot was tested by potential users, and the overall averages of the identified metrics were excellent. The bot can also acquire users’ feedback for further improvements in routine updates. Conclusions: An artificial intelligence–assisted chatbot was created for knowledge transfer regarding the radiation treatment process to patients with cancer, their families, and the general public. 
The bot that is supported by machine learning was tested, and it was found that the bot can provide information about radiotherapy effectively. %M 36327383 %R 10.2196/39443 %U https://formative.jmir.org/2022/12/e39443 %U https://doi.org/10.2196/39443 %U http://www.ncbi.nlm.nih.gov/pubmed/36327383 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 12 %P e42163 %T Interpretable Machine Learning Prediction of Drug-Induced QT Prolongation: Electronic Health Record Analysis %A Simon,Steven T %A Trinkley,Katy E %A Malone,Daniel C %A Rosenberg,Michael Aaron %+ Division of Cardiac Electrophysiology, University of Colorado School of Medicine, 12631 E. 17th Ave, Aurora, CO, 80045, United States, 1 720 500 3621, michael.a.rosenberg@cuanschutz.edu %K drug-induced QT prolongation %K predictive modeling %K interpretable machine learning %K ML %K artificial intelligence %K AI %K electronic health records %K EHR %K prediction %K risk %K monitoring %K deep learning %D 2022 %7 1.12.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Drug-induced long-QT syndrome (diLQTS) is a major concern among patients who are hospitalized, for whom prediction models capable of identifying individualized risk could be useful to guide monitoring. We have previously demonstrated the feasibility of machine learning to predict the risk of diLQTS, in which deep learning models provided superior accuracy for risk prediction, although these models were limited by a lack of interpretability. Objective: In this investigation, we sought to examine the potential trade-off between interpretability and predictive accuracy with the use of more complex models to identify patients at risk for diLQTS. We planned to compare a deep learning algorithm to predict diLQTS with a more interpretable algorithm based on cluster analysis that would allow medication- and subpopulation-specific evaluation of risk. 
Methods: We examined the risk of diLQTS among 35,639 inpatients treated between 2003 and 2018 with at least 1 of 39 medications associated with risk of diLQTS and who had an electrocardiogram in the system performed within 24 hours of medication administration. Predictors included over 22,000 diagnoses and medications at the time of medication administration, with cases of diLQTS defined as a corrected QT interval over 500 milliseconds after treatment with a culprit medication. The interpretable model was developed using cluster analysis (K=4 clusters), and risk was assessed for specific medications and classes of medications. The deep learning model was created using all predictors within a 6-layer neural network, based on previously identified hyperparameters. Results: Among the medications, we found that class III antiarrhythmic medications were associated with increased risk across all clusters, and that in patients who are noncritically ill without cardiovascular disease, propofol was associated with increased risk, whereas ondansetron was associated with decreased risk. Compared with deep learning, the interpretable approach was less accurate (area under the receiver operating characteristic curve: 0.65 vs 0.78), with comparable calibration. Conclusions: In summary, we found that an interpretable modeling approach was less accurate, but more clinically applicable, than deep learning for the prediction of diLQTS. Future investigations should consider this trade-off in the development of methods for clinical prediction. 
%M 36454608 %R 10.2196/42163 %U https://www.jmir.org/2022/12/e42163 %U https://doi.org/10.2196/42163 %U http://www.ncbi.nlm.nih.gov/pubmed/36454608 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 12 %P e40485 %T Integration of Artificial Intelligence Into Sociotechnical Work Systems—Effects of Artificial Intelligence Solutions in Medical Imaging on Clinical Efficiency: Protocol for a Systematic Literature Review %A Wenderott,Katharina %A Gambashidze,Nikoloz %A Weigl,Matthias %+ Institute for Patient Safety, University Hospital Bonn, Venusberg-Campus 1, Building A 02, Bonn, 53127, Germany, 49 228287 ext 13781, katharina.wenderott@ukbonn.de %K artificial intelligence %K clinical care %K clinical efficiency %K sociotechnical work system %K sociotechnical %K review methodology %K systematic review %K facilitator %K barrier %K diagnostic %K diagnosis %K diagnoses %K digital health %K adoption %K implementation %K literature review %K literature search %K search strategy %K library science %K medical librarian %K narrative review %K narrative synthesis %D 2022 %7 1.12.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: When introducing artificial intelligence (AI) into clinical care, one of the main objectives is to improve workflow efficiency because AI-based solutions are expected to take over or support routine tasks. Objective: This study sought to synthesize the current knowledge base on how the use of AI technologies for medical imaging affects efficiency and what facilitators or barriers moderating the impact of AI implementation have been reported. Methods: In this systematic literature review, comprehensive literature searches will be performed in relevant electronic databases, including PubMed/MEDLINE, Embase, PsycINFO, Web of Science, IEEE Xplore, and CENTRAL. Studies in English and German published from 2000 onwards will be included. 
The following inclusion criteria will be applied: empirical studies targeting the workflow integration or adoption of AI-based software in medical imaging used for diagnostic purposes in a health care setting. The efficiency outcomes of interest include workflow adaptation, time to complete tasks, and workload. Two reviewers will independently screen all retrieved records and full-text articles and extract data. The methodological quality of the included studies will be appraised using suitable tools. The findings will be described qualitatively, and a meta-analysis will be performed, if possible. Furthermore, a narrative synthesis approach that focuses on work system factors affecting the integration of AI technologies reported in eligible studies will be adopted. Results: This review is anticipated to begin in September 2022 and will be completed in April 2023. Conclusions: This systematic review and synthesis aims to summarize the existing knowledge on efficiency improvements in medical imaging through the integration of AI into clinical workflows. Moreover, it will extract the facilitators and barriers of the AI implementation process in clinical care settings. Therefore, our findings have implications for future clinical implementation processes of AI-based solutions, with a particular focus on diagnostic procedures. This review is additionally expected to identify research gaps regarding the focus on seamless workflow integration of novel technologies in clinical settings. 
Trial Registration: PROSPERO CRD42022303439; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=303439 International Registered Report Identifier (IRRID): PRR1-10.2196/40485 %M 36454624 %R 10.2196/40485 %U https://www.researchprotocols.org/2022/12/e40485 %U https://doi.org/10.2196/40485 %U http://www.ncbi.nlm.nih.gov/pubmed/36454624 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 11 %P e42185 %T Artificial Intelligence in Intensive Care Medicine: Bibliometric Analysis %A Tang,Ri %A Zhang,Shuyi %A Ding,Chenling %A Zhu,Mingli %A Gao,Yuan %+ Department of Intensive Care Medicine, Renji Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, No 1630, Dongfang Road, Pudong New District, Shanghai, 200127, China, 86 13917816250, shuishui286@qq.com %K intensive care medicine %K artificial intelligence %K bibliometric analysis %K machine learning %K sepsis %D 2022 %7 30.11.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Interest in critical care–related artificial intelligence (AI) research is growing rapidly. However, the literature is still lacking in comprehensive bibliometric studies that measure and analyze scientific publications globally. Objective: The objective of this study was to assess the global research trends in AI in intensive care medicine based on publication outputs, citations, coauthorships between nations, and co-occurrences of author keywords. Methods: A total of 3619 documents published until March 2022 were retrieved from the Scopus database. After selecting the document type as articles, the titles and abstracts were checked for eligibility. In the final bibliometric study using VOSviewer, 1198 papers were included. The growth rate of publications, preferred journals, leading research countries, international collaborations, and top institutions were computed. 
Results: The number of publications increased steeply between 2018 and 2022, accounting for 72.53% (869/1198) of all the included papers. The United States and China contributed to approximately 55.17% (661/1198) of the total publications. Of the 15 most productive institutions, 9 were among the top 100 universities worldwide. Detecting clinical deterioration, monitoring, predicting disease progression, mortality, prognosis, and classifying disease phenotypes or subtypes were some of the research hot spots for AI in patients who are critically ill. Neural networks, decision support systems, machine learning, and deep learning were all commonly used AI technologies. Conclusions: This study highlights popular areas in AI research aimed at improving health care in intensive care units, offers a comprehensive look at the research trend in AI application in the intensive care unit, and provides an insight into potential collaboration and prospects for future research. The 30 articles that received the most citations were listed in detail. For AI-based clinical research to be sufficiently convincing for routine critical care practice, collaborative research efforts are needed to increase the maturity and robustness of AI-driven models. %M 36449345 %R 10.2196/42185 %U https://www.jmir.org/2022/11/e42185 %U https://doi.org/10.2196/42185 %U http://www.ncbi.nlm.nih.gov/pubmed/36449345 %0 Journal Article %@ 2291-9279 %I JMIR Publications %V 10 %N 4 %P e39840 %T Artificial Intelligence–Driven Serious Games in Health Care: Scoping Review %A Abd-alrazaq,Alaa %A Abuelezz,Israa %A Hassan,Asma %A AlSammarraie,AlHasan %A Alhuwail,Dari %A Irshaidat,Sara %A Abu Serhan,Hashem %A Ahmed,Arfan %A Alabed Alrazak,Sadam %A Househ,Mowafa %+ Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, P.O. 
Box 34110, Doha Al Luqta St, Ar-Rayyan, Doha, 0000, Qatar, 974 55708549, mhouseh@hbku.edu.qa %K serious games %K artificial intelligence %K deep learning %K machine learning %K health care %K digital health %K eHealth %K mobile phone %D 2022 %7 29.11.2022 %9 Review %J JMIR Serious Games %G English %X Background: Artificial intelligence (AI)–driven serious games have been used in health care to offer a customizable and immersive experience. Summarizing the features of current AI-driven serious games is important for understanding how they have been developed and used, and for planning how to leverage them to meet current and future health care needs. Objective: This study aimed to explore the features of AI-driven serious games in health care as reported by previous research. Methods: We conducted a scoping review to achieve the abovementioned objective. The most popular databases in the information technology and health fields (ie, MEDLINE, PsycInfo, Embase, CINAHL, IEEE Xplore, ACM Digital Library, and Google Scholar) were searched using keywords related to serious games and AI. Two reviewers independently performed the study selection process. Three reviewers independently extracted data from the included studies. A narrative approach was used for data synthesis. Results: The search process returned 1470 records. Of these 1470 records, 46 (31.29%) met all eligibility criteria. A total of 64 different serious games were found in the included studies. Motor impairment was the most common health condition targeted by these serious games. Serious games were used for rehabilitation in most of the studies. The most common genres of serious games were role-playing games, puzzle games, and platform games. Unity was the most prominent game engine used to develop serious games. PCs were the most common platform used to play serious games. The most common algorithm used in the included studies was support vector machine. 
The most common purposes of AI were the detection of disease and the evaluation of user performance. The size of the data set ranged from 36 to 795,600. The most common validation techniques used in the included studies were k-fold cross-validation and training-test split validation. Accuracy was the most commonly used metric for evaluating the performance of AI models. Conclusions: The last decade witnessed an increase in the development of AI-driven serious games for health care purposes, targeting various health conditions, and leveraging multiple AI algorithms; this rising trend is expected to continue for years to come. Although the evidence uncovered in this study shows promising applications of AI-driven serious games, larger and more rigorous, diverse, and robust studies may be needed to examine the efficacy and effectiveness of AI-driven serious games in different populations with different health conditions. %M 36445731 %R 10.2196/39840 %U https://games.jmir.org/2022/4/e39840 %U https://doi.org/10.2196/39840 %U http://www.ncbi.nlm.nih.gov/pubmed/36445731 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 1 %N 1 %P e38171 %T Adolescents’ Well-being While Using a Mobile Artificial Intelligence–Powered Acceptance Commitment Therapy Tool: Evidence From a Longitudinal Study %A Vertsberger,Dana %A Naor,Navot %A Winsberg,Mirène %+ Stanford University, 450 Jane Stanford Way, Stanford, CA, 94305, United States, 1 6465468241, dverts@stanford.edu %K well-being %K adolescents %K chatbots %K conversational agents %K mental health %K mobile mental health %K automated %K support %K self-management %K self-help %K smartphone %K psychology %K intervention %K psychological %K therapy %K acceptance %K commitment %K engagement %D 2022 %7 29.11.2022 %9 Original Paper %J JMIR AI %G English %X Background: Adolescence is a critical developmental period to prevent and treat the emergence of mental health problems. 
Smartphone-based conversational agents can deliver psychologically driven intervention and support, thus increasing psychological well-being over time. Objective: The objective of the study was to test the potential of an automated conversational agent named Kai.ai to deliver a self-help program based on Acceptance Commitment Therapy tools for adolescents, aimed to increase their well-being. Methods: Participants were 10,387 adolescents, aged 14-18 years, who used Kai.ai on one of the top messaging apps (eg, iMessage and WhatsApp). Users’ well-being levels were assessed between 2 and 5 times using the 5-item World Health Organization Well-being Index questionnaire over their engagement with the service. Results: Users engaged with the conversational agent an average of 45.39 (SD 46.77) days. The average well-being score at time point 1 was 39.28 (SD 18.17), indicating that, on average, users experienced reduced well-being. Latent growth curve modeling indicated that participants’ well-being significantly increased over time (β=2.49; P<.001) and reached a clinically acceptable well-being average score (above 50). Conclusions: Mobile-based conversational agents have the potential to deliver engaging and effective Acceptance Commitment Therapy interventions. 
%M 38875600 %R 10.2196/38171 %U https://ai.jmir.org/2022/1/e38171 %U https://doi.org/10.2196/38171 %U http://www.ncbi.nlm.nih.gov/pubmed/38875600 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 11 %P e42343 %T Feasibility of a Reinforcement Learning–Enabled Digital Health Intervention to Promote Mammograms: Retrospective, Single-Arm, Observational Study %A Bucher,Amy %A Blazek,E Susanne %A West,Ashley B %+ Lirio, 320 Corporate Drive, NW, Knoxville, TN, 37923, United States, 1 865 839 6158, abucher@lirio.com %K artificial intelligence %K reinforcement learning %K feasibility studies %K mammograms %K nudging %K behavioral intervention %K digital health %K email %K health equity %K cancer screening %D 2022 %7 28.11.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Preventive screenings such as mammograms promote health and detect disease. However, mammogram attendance lags clinical guidelines, with roughly one-quarter of women not completing their recommended mammograms. A scalable digital health intervention leveraging behavioral science and reinforcement learning and delivered via email was implemented in a US health system to promote uptake of recommended mammograms among patients who were 1 or more years overdue for the screening (ie, 2 or more years from last mammogram). Objective: The aim of this study was to establish the feasibility of a reinforcement learning–enabled mammography digital health intervention delivered via email. The research aims included understanding the intervention’s reach and ability to elicit behavioral outcomes of scheduling and attending mammograms, as well as understanding reach and behavioral outcomes for women of different ages, races, educational attainment levels, and household incomes. 
Methods: The digital health intervention was implemented in a large Catholic health system in the Midwestern United States and targeted the system’s existing patients who had not received a recommended mammogram in 2 or more years. From August 2020 to July 2022, 139,164 eligible women received behavioral science–based email messages assembled and delivered by a reinforcement learning model to encourage clinically recommended mammograms. Target outcome behaviors included scheduling and ultimately attending the mammogram appointment. Results: In total, 139,164 women received at least one intervention email during the study period, and 81.52% engaged with at least one email. Deliverability of emails exceeded 98%. Among message recipients, 24.99% scheduled mammograms and 22.02% attended mammograms (88.08% attendance rate among women who scheduled appointments). Results indicate no practical differences in the frequency at which people engage with the intervention or take action following a message based on their age, race, educational attainment, or household income, suggesting the intervention may equitably drive mammography across diverse populations. Conclusions: The reinforcement learning–enabled email intervention is feasible to implement in a health system to engage patients who are overdue for their mammograms to schedule and attend a recommended screening. In this feasibility study, the intervention was associated with scheduling and attending mammograms for patients who were significantly overdue for recommended screening. Moreover, the intervention showed proportionate reach across demographic subpopulations. This suggests that the intervention may be effective at engaging patients of many different backgrounds who are overdue for screening. 
Future research will establish the effectiveness of this type of intervention compared to typical health system outreach to patients who have not had recommended screenings as well as identify ways to enhance its reach and impact. %M 36441579 %R 10.2196/42343 %U https://formative.jmir.org/2022/11/e42343 %U https://doi.org/10.2196/42343 %U http://www.ncbi.nlm.nih.gov/pubmed/36441579 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 11 %P e42853 %T Digital Pattern Recognition for the Identification of Various Hypospadias Parameters via an Artificial Neural Network: Protocol for the Development and Validation of a System and Mobile App %A Wahyudi,Irfan %A Utomo,Chandra Prasetyo %A Djauzi,Samsuridjal %A Fathurahman,Muhamad %A Situmorang,Gerhard Reinaldi %A Rodjani,Arry %A Yonathan,Kevin %A Santoso,Budi %+ Department of Urology, Faculty of Medicine, Universitas Indonesia, Jl Dipenogoro No 71, Jakarta, 10430, Indonesia, 62 217867222, irf.wahyudi2011@gmail.com %K artificial intelligence %K digital recognition %K hypospadias %K machine learning %D 2022 %7 25.11.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Hypospadias remains the most prevalent congenital abnormality in boys worldwide. However, the limited infrastructure and number of pediatric urologists capable of diagnosing and managing the condition hinder the management of hypospadias in Indonesia. The use of artificial intelligence and image recognition is thought to be beneficial in improving the management of hypospadias cases in Indonesia. Objective: We aim to develop and validate a digital pattern recognition system and a mobile app based on an artificial neural network to determine various parameters of hypospadias. Methods: Hypospadias and normal penis images from an age-matched database will be used to train the artificial neural network. 
Images of 3 aspects of the penis (ventral, dorsal, and lateral aspects, which include the glans, shaft, and scrotum) will be taken from each participant. The images will be labeled with the following hypospadias parameters: hypospadias status, meatal location, meatal shape, the quality of the urethral plate, glans diameter, and glans shape. The data will be uploaded to train the image recognition model. Intrarater and interrater analyses will be performed, using the test images provided to the algorithm. Results: Our study is at the protocol development stage. A preliminary study regarding the system’s development and feasibility will start in December 2022. The results of our study are expected to be available by the end of 2023. Conclusions: A digital pattern recognition system using an artificial neural network will be developed and designed to improve the diagnosis and management of patients with hypospadias, especially those residing in regions with limited infrastructure and health personnel. 
International Registered Report Identifier (IRRID): PRR1-10.2196/42853 %M 36427238 %R 10.2196/42853 %U https://www.researchprotocols.org/2022/11/e42853 %U https://doi.org/10.2196/42853 %U http://www.ncbi.nlm.nih.gov/pubmed/36427238 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 11 %P e40449 %T Relative Validation of an Artificial Intelligence–Enhanced, Image-Assisted Mobile App for Dietary Assessment in Adults: Randomized Crossover Study %A Moyen,Audrey %A Rappaport,Aviva Ilysse %A Fleurent-Grégoire,Chloé %A Tessier,Anne-Julie %A Brazeau,Anne-Sophie %A Chevalier,Stéphanie %+ School of Human Nutrition, McGill University, 21111 Lakeshore Rd, Sainte-Anne-de-Bellevue, QC, H9X 3V9, Canada, 1 514 398 8603, Stephanie.chevalier@mcgill.ca %K dietary intake %K dietary assessment %K food diary %K food records %K automated self-administered 24-hour recall %K ASA24 %K Keenoa %D 2022 %7 21.11.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Thorough dietary assessment is essential to obtain accurate food and nutrient intake data yet challenging because of the limitations of current methods. Image-based methods may decrease energy underreporting and increase the validity of self-reported dietary intake. Keenoa is an image-assisted food diary that integrates artificial intelligence food recognition. We hypothesized that Keenoa is as valid for dietary assessment as the automated self-administered 24-hour recall (ASA24)–Canada and better appreciated by users. Objective: We aimed to evaluate the relative validity of Keenoa against a 24-hour validated web-based food recall platform (ASA24) in both healthy individuals and those living with diabetes. Secondary objectives were to compare the proportion of under- and overreporters between tools and to assess the user’s appreciation of the tools. Methods: We used a randomized crossover design, and participants completed 4 days of Keenoa food tracking and 4 days of ASA24 food recalls. 
The System Usability Scale was used to assess perceived ease of use. Differences in reported intakes were analyzed using 2-tailed paired t tests or Wilcoxon signed-rank test and deattenuated correlations by Spearman coefficient. Agreement and bias were determined using the Bland-Altman test. Weighted Cohen κ was used for cross-classification analysis. Energy underreporting was defined as a ratio of reported energy intake to estimated resting energy expenditure <0.9. Results: A total of 136 participants were included (mean 46.1, SD 14.6 years; 49/136, 36% men; 31/136, 22.8% with diabetes). The average reported energy intakes (kcal/d) were 2171 (SD 553) in men with Keenoa and 2118 (SD 566) in men with ASA24 (P=.38) and, in women, 1804 (SD 404) with Keenoa and 1784 (SD 389) with ASA24 (P=.61). The overall mean difference (kcal/d) was −32 (95% CI −97 to 33), with limits of agreement of −789 to 725, indicating acceptable agreement between tools without bias. Mean reported macronutrient, calcium, potassium, and folate intakes did not significantly differ between tools. Reported fiber and iron intakes were higher, and sodium intake lower, with Keenoa than ASA24. Intakes in all macronutrients (r=0.48-0.73) and micronutrients analyzed (r=0.40-0.74) were correlated (all P<.05) between tools. Weighted Cohen κ scores ranged from 0.30 to 0.52 (all P<.001). The underreporting rate was 8.8% (12/136) with both tools. Mean System Usability Scale scores were higher for Keenoa than ASA24 (77/100, 77% vs 53/100, 53%; P<.001); 74.8% (101/135) of participants preferred Keenoa. Conclusions: The Keenoa app showed moderate to strong relative validity against ASA24 for energy, macronutrient, and most micronutrient intakes analyzed in healthy adults and those with diabetes. Keenoa is a new, alternative tool that may facilitate the work of dietitians and nutrition researchers. The perceived ease of use may improve food-tracking adherence over longer periods. 
%M 36409539 %R 10.2196/40449 %U https://www.jmir.org/2022/11/e40449 %U https://doi.org/10.2196/40449 %U http://www.ncbi.nlm.nih.gov/pubmed/36409539 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 11 %P e38095 %T Medical Text Simplification Using Reinforcement Learning (TESLEA): Deep Learning–Based Text Simplification Approach %A Phatak,Atharva %A Savage,David W %A Ohle,Robert %A Smith,Jonathan %A Mago,Vijay %+ Department of Computer Science, Lakehead University, 955 Oliver Road, Thunder Bay, ON, P7B 5E1, Canada, 1 8073558351, phataka@lakeheadu.ca %K medical text simplification %K reinforcement learning %K natural language processing %K manual evaluation %D 2022 %7 18.11.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: In most cases, the abstracts of articles in the medical domain are publicly available. Although these are accessible by everyone, they are hard to comprehend for a wider audience due to the complex medical vocabulary. Thus, simplifying these complex abstracts is essential to make medical research accessible to the general public. Objective: This study aims to develop a deep learning–based text simplification (TS) approach that converts complex medical text into a simpler version while maintaining the quality of the generated text. Methods: A TS approach using reinforcement learning and transformer-based language models was developed. Relevance reward, Flesch-Kincaid reward, and lexical simplicity reward were optimized to help simplify jargon-dense complex medical paragraphs to their simpler versions while retaining the quality of the text. The model was trained using 3568 complex-simple medical paragraphs and evaluated on 480 paragraphs with the help of automated metrics and human annotation. Results: The proposed method outperformed previous baselines on Flesch-Kincaid scores (11.84) and achieved comparable performance with other baselines when measured using ROUGE-1 (0.39), ROUGE-2 (0.11), and SARI scores (0.40). 
Manual evaluation showed that percentage agreement between human annotators was more than 70% when factors such as fluency, coherence, and adequacy were considered. Conclusions: A unique medical TS approach is successfully developed that leverages reinforcement learning and accurately simplifies complex medical paragraphs, thereby increasing their readability. The proposed TS approach can be applied to automatically generate simplified text for complex medical text data, which would enhance the accessibility of biomedical research to a wider audience. %M 36399375 %R 10.2196/38095 %U https://medinform.jmir.org/2022/11/e38095 %U https://doi.org/10.2196/38095 %U http://www.ncbi.nlm.nih.gov/pubmed/36399375 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 5 %N 4 %P e37590 %T Enhancing Food Intake Tracking in Long-term Care With Automated Food Imaging and Nutrient Intake Tracking (AFINI-T) Technology: Validation and Feasibility Assessment %A Pfisterer,Kaylen %A Amelard,Robert %A Boger,Jennifer %A Keller,Heather %A Chung,Audrey %A Wong,Alexander %+ Department of Systems Design Engineering, University of Waterloo, 200 University Ave W, Waterloo, ON, N2L 3G1, Canada, 1 519 888 4567, kaylen.pfisterer@uhn.ca %K long-term care %K automated nutrient intake %K convolutional neural network %K food segmentation %K food classification %K depth imaging %K deep learning %K collaborative design %K aging %K food intake %D 2022 %7 17.11.2022 %9 Original Paper %J JMIR Aging %G English %X Background: Half of long-term care (LTC) residents are malnourished, leading to increased hospitalization, mortality, and morbidity, with low quality of life. Current tracking methods are subjective and time-consuming. Objective: This paper presented the automated food imaging and nutrient intake tracking technology designed for LTC. Methods: A needs assessment was conducted with 21 participating staff across 12 LTC and retirement homes. 
We created 2 simulated LTC intake data sets comprising modified (664/1039, 63.91% plates) and regular (375/1039, 36.09% plates) texture foods. Overhead red-green-blue-depth images of plated foods were acquired, and foods were segmented using a pretrained food segmentation network. We trained a novel convolutional autoencoder food feature extractor network using an augmented UNIMIB2016 food data set. A meal-specific food classifier was appended to the feature extractor and tested on our simulated LTC food intake data sets. Food intake (percentage) was estimated as the differential volume between classified full portion and leftover plates. Results: The needs assessment yielded 13 nutrients of interest, a requirement for objectivity and repeatability, and the need to account for real-world environmental constraints. For 12 meal scenarios with up to 15 classes each, the top-1 classification accuracy was 88.9%, with mean intake error of −0.4 (SD 36.7) mL. Nutrient intake estimation by volume was strongly linearly correlated with nutrient estimates from mass (r2=0.92-0.99), with good agreement between methods (σ=−2.7 to −0.01; 0 within each of the limits of agreement). Conclusions: The automated food imaging and nutrient intake tracking approach is a deep learning–powered computational nutrient sensing system that appears to be feasible (validated accuracy against gold-standard weighed food method, positive end user engagement) and may provide a novel means for more accurate and objective tracking of LTC residents’ food intake to support malnutrition prevention strategies. 
%M 36394940 %R 10.2196/37590 %U https://aging.jmir.org/2022/4/e37590 %U https://doi.org/10.2196/37590 %U http://www.ncbi.nlm.nih.gov/pubmed/36394940 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 11 %P e37478 %T Considering Clinician Competencies for the Implementation of Artificial Intelligence–Based Tools in Health Care: Findings From a Scoping Review %A Garvey,Kim V %A Thomas Craig,Kelly Jean %A Russell,Regina %A Novak,Laurie L %A Moore,Don %A Miller,Bonnie M %+ Clinical Evidence Development, Aetna Medical Affairs, CVS Health, 151 Farmington Avenue, RC31, Hartford, CT, 06156, United States, 1 970 261 3366, craigk@aetna.com %K artificial intelligence %K competency %K clinical education %K patient %K digital health %K digital tool %K clinical tool %K health technology %K health care %K educational framework %K decision-making %K clinical decision %K health information %K physician %D 2022 %7 16.11.2022 %9 Review %J JMIR Med Inform %G English %X Background: The use of artificial intelligence (AI)–based tools in the care of individual patients and patient populations is rapidly expanding. Objective: The aim of this paper is to systematically identify research on provider competencies needed for the use of AI in clinical settings. Methods: A scoping review was conducted to identify articles published between January 1, 2009, and May 1, 2020, from MEDLINE, CINAHL, and the Cochrane Library databases, using search queries for terms related to health care professionals (eg, medical, nursing, and pharmacy) and their professional development in all phases of clinical education, AI-based tools in all settings of clinical practice, and professional education domains of competencies and performance. Limits were provided for English language, studies on humans with abstracts, and settings in the United States. Results: The searches identified 3476 records, of which 4 met the inclusion criteria. 
These studies described the use of AI in clinical practice and measured at least one aspect of clinician competence. While many studies measured the performance of the AI-based tool, only 4 measured clinician performance in terms of the knowledge, skills, or attitudes needed to understand and effectively use the new tools being tested. These 4 articles primarily focused on the ability of AI to enhance patient care and clinical decision-making by improving information flow and display, specifically for physicians. Conclusions: While many research studies were identified that investigate the potential effectiveness of using AI technologies in health care, very few address specific competencies that are needed by clinicians to use them effectively. This highlights a critical gap. %M 36318697 %R 10.2196/37478 %U https://medinform.jmir.org/2022/11/e37478 %U https://doi.org/10.2196/37478 %U http://www.ncbi.nlm.nih.gov/pubmed/36318697 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 11 %P e39536 %T Developing an Artificial Intelligence Model for Reading Chest X-rays: Protocol for a Prospective Validation Study %A Miró Catalina,Queralt %A Fuster-Casanovas,Aïna %A Solé-Casals,Jordi %A Vidal-Alaball,Josep %+ Unitat de Suport a la Recerca de la Catalunya Central, Fundació Institut Universitari per a la Recerca a l'Atenció Primària de Salut Jordi Gol i Gurina, C/ Pica d'Estats 13-15, Sant Fruitós de Bages, 08272, Spain, 34 634810263, qmiro.cc.ics@gencat.cat %K artificial intelligence %K machine learning %K chest x-ray %K radiology %K validation %D 2022 %7 16.11.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Chest x-rays are the most commonly used type of x-rays today, accounting for up to 26% of all radiographic tests performed. However, chest radiography is a complex imaging modality to interpret. Several studies have reported discrepancies in chest x-ray interpretations among emergency physicians and radiologists. 
It is of vital importance to be able to offer a fast and reliable diagnosis for this kind of x-ray, using artificial intelligence (AI) to support the clinician. Oxipit has developed an AI algorithm for reading chest x-rays, available through a web platform called ChestEye. This platform is an automatic computer-aided diagnosis system where a reading of the inserted chest x-ray is performed, and an automatic report is returned with a capacity to detect 75 pathologies, covering 90% of diagnoses. Objective: The overall objective of the study is to perform validation with prospective data of the ChestEye algorithm as a diagnostic aid. We wish to validate the algorithm for a single pathology and multiple pathologies by evaluating the accuracy, sensitivity, and specificity of the algorithm. Methods: A prospective validation study will be carried out to compare the diagnosis of the reference radiologists for the users attending the primary care center in the Osona region (Spain), with the diagnosis of the ChestEye AI algorithm. Anonymized chest x-ray images will be acquired and fed into the AI algorithm interface, which will return an automatic report. A radiologist will evaluate the same chest x-ray, and both assessments will be compared to calculate the precision, sensitivity, specificity, and accuracy of the AI algorithm. Results will be represented globally and individually for each pathology using a confusion matrix and the One-vs-All methodology. Results: Patient recruitment was conducted from February 7, 2022, and it is expected that data can be obtained in 5 to 6 months. In June 2022, more than 450 x-rays have been collected, so it is expected that 600 samples will be gathered in July 2022. We hope to obtain sufficient evidence to demonstrate that the use of AI in the reading of chest x-rays can be a good tool for diagnostic support. 
However, there is a decreasing number of radiology professionals and, therefore, it is necessary to develop and validate tools to support professionals who have to interpret these tests. Conclusions: If the results of the validation of the model are satisfactory, it could be implemented as a support tool and allow an increase in the accuracy and speed of diagnosis, patient safety, and agility in the primary care system, while reducing the cost of unnecessary tests. International Registered Report Identifier (IRRID): PRR1-10.2196/39536 %M 36383419 %R 10.2196/39536 %U https://www.researchprotocols.org/2022/11/e39536 %U https://doi.org/10.2196/39536 %U http://www.ncbi.nlm.nih.gov/pubmed/36383419 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 11 %P e41566 %T Personalized Prediction of Response to Smartphone-Delivered Meditation Training: Randomized Controlled Trial %A Webb,Christian A %A Hirshberg,Matthew J %A Davidson,Richard J %A Goldberg,Simon B %+ Department of Counseling Psychology, University of Wisconsin – Madison, 315 Education Building, 1000 Bascom Mall, Madison, WI, 53706, United States, 1 608 265 8986, sbgoldberg@wisc.edu %K precision medicine %K prediction %K machine learning %K meditation %K mobile technology %K smartphone app %K mobile phone %D 2022 %7 8.11.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Meditation apps have surged in popularity in recent years, with an increasing number of individuals turning to these apps to cope with stress, including during the COVID-19 pandemic. Meditation apps are the most commonly used mental health apps for depression and anxiety. However, little is known about who is well suited to these apps. Objective: This study aimed to develop and test a data-driven algorithm to predict which individuals are most likely to benefit from app-based meditation training. 
Methods: Using randomized controlled trial data comparing a 4-week meditation app (Healthy Minds Program [HMP]) with an assessment-only control condition in school system employees (n=662), we developed an algorithm to predict who is most likely to benefit from HMP. Baseline clinical and demographic characteristics were submitted to a machine learning model to develop a “Personalized Advantage Index” (PAI) reflecting an individual’s expected reduction in distress (primary outcome) from HMP versus control. Results: A significant group × PAI interaction emerged (t658=3.30; P=.001), indicating that PAI scores moderated group differences in outcomes. A regression model that included repetitive negative thinking as the sole baseline predictor performed comparably well. Finally, we demonstrate the translation of a predictive model into personalized recommendations of expected benefit. Conclusions: Overall, the results revealed the potential of a data-driven algorithm to inform which individuals are most likely to benefit from a meditation app. Such an algorithm could be used to objectively communicate expected benefits to individuals, allowing them to make more informed decisions about whether a meditation app is appropriate for them. 
Trial Registration: ClinicalTrials.gov NCT04426318; https://clinicaltrials.gov/ct2/show/NCT04426318 %M 36346668 %R 10.2196/41566 %U https://www.jmir.org/2022/11/e41566 %U https://doi.org/10.2196/41566 %U http://www.ncbi.nlm.nih.gov/pubmed/36346668 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 11 %P e38168 %T Developing an Automated Assessment of In-session Patient Activation for Psychological Therapy: Codevelopment Approach %A Malins,Sam %A Figueredo,Grazziela %A Jilani,Tahseen %A Long,Yunfei %A Andrews,Jacob %A Rawsthorne,Mat %A Manolescu,Cosmin %A Clos,Jeremie %A Higton,Fred %A Waldram,David %A Hunt,Daniel %A Perez Vallejos,Elvira %A Moghaddam,Nima %+ Specialist Services, Nottinghamshire Healthcare NHS Foundation Trust, Triumph Road, Nottingham, NG7 2TU, United Kingdom, 44 7811737725, sam.malins@nottingham.ac.uk %K responsible artificial intelligence %K machine learning %K cognitive behavioral therapy %K multimorbidity %K natural language processing %K mental health %D 2022 %7 8.11.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Patient activation is defined as a patient’s confidence and perceived ability to manage their own health. Patient activation has been a consistent predictor of long-term health and care costs, particularly for people with multiple long-term health conditions. However, there is currently no means of measuring patient activation from what is said in health care consultations. This may be particularly important for psychological therapy because most current methods for evaluating therapy content cannot be used routinely due to time and cost restraints. Natural language processing (NLP) has been used increasingly to classify and evaluate the contents of psychological therapy. This aims to make the routine, systematic evaluation of psychological therapy contents more accessible in terms of time and cost restraints. 
However, comparatively little attention has been paid to algorithmic trust and interpretability, with few studies in the field involving end users or stakeholders in algorithm development. Objective: This study applied a responsible design to use NLP in the development of an artificial intelligence model to automate the ratings assigned by a psychological therapy process measure: the consultation interactions coding scheme (CICS). The CICS assesses the level of patient activation observable from turn-by-turn psychological therapy interactions. Methods: With consent, 128 sessions of remotely delivered cognitive behavioral therapy from 53 participants experiencing multiple physical and mental health problems were anonymously transcribed and rated by trained human CICS coders. Using participatory methodology, a multidisciplinary team proposed candidate language features that they thought would discriminate between high and low patient activation. The team included service-user researchers, psychological therapists, applied linguists, digital research experts, artificial intelligence ethics researchers, and NLP researchers. Identified language features were extracted from the transcripts alongside demographic features, and machine learning was applied using k-nearest neighbors and bagged trees algorithms to assess whether in-session patient activation and interaction types could be accurately classified. Results: The k-nearest neighbors classifier obtained 73% accuracy (82% precision and 80% recall) in a test data set. The bagged trees classifier obtained 81% accuracy for test data (87% precision and 75% recall) in differentiating between interactions rated high in patient activation and those rated low or neutral. 
Conclusions: Coproduced language features identified through a multidisciplinary collaboration can be used to discriminate among psychological therapy session contents based on patient activation among patients experiencing multiple long-term physical and mental health conditions. %M 36346654 %R 10.2196/38168 %U https://medinform.jmir.org/2022/11/e38168 %U https://doi.org/10.2196/38168 %U http://www.ncbi.nlm.nih.gov/pubmed/36346654 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 11 %P e36933 %T Risk Assessment of COVID-19 Cases in Emergency Departments and Clinics With the Use of Real-World Data and Artificial Intelligence: Observational Study %A Logaras,Evangelos %A Billis,Antonis %A Kyparissidis Kokkinidis,Ilias %A Ketseridou,Smaranda Nafsika %A Fourlis,Alexios %A Tzotzis,Aristotelis %A Imprialos,Konstantinos %A Doumas,Michael %A Bamidis,Panagiotis %+ Laboratory of Medical Physics and Digital Innovation, School of Medicine, Aristotle University of Thessaloniki, University Campus, Building D, Gate 8, Thessaloniki, 54124, Greece, 30 2310999237, ampillis@med.auth.gr %K COVID-19 pandemic %K risk assessment %K wearable device %K respiration evaluation %K emergency department %K artificial intelligence %K real-world data %D 2022 %7 8.11.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: The recent COVID-19 pandemic has highlighted the weaknesses of health care systems around the world. In the effort to improve the monitoring of cases admitted to emergency departments, it has become increasingly necessary to adopt new innovative technological solutions in clinical practice. Currently, the continuous monitoring of vital signs is only performed in patients admitted to the intensive care unit. 
Objective: The study aimed to develop a smart system that will dynamically prioritize patients through the continuous monitoring of vital signs using a wearable biosensor device and recording of meaningful clinical records and estimate the likelihood of deterioration of each case using artificial intelligence models. Methods: The data for the study were collected from the emergency department and COVID-19 inpatient unit of the Hippokration General Hospital of Thessaloniki. The study was carried out in the framework of the COVID-X H2020 project, which was funded by the European Union. For the training of the neural network, data collection was performed from COVID-19 cases hospitalized in the respective unit. A wearable biosensor device was placed on the wrist of each patient, which recorded the primary characteristics of the visual signal related to breathing assessment. Results: A total of 157 adult patients diagnosed with COVID-19 were recruited. Lasso penalty function was used for selecting 18 out of 48 predictors and 2 random forest–based models were implemented for comparison. The high overall performance was maintained, if not improved, by feature selection, with random forest achieving accuracies of 80.9% and 82.1% when trained using all predictors and a subset of them, respectively. Preliminary results, although affected by pandemic limitations and restrictions, were promising regarding breathing pattern recognition. Conclusions: This study represents a novel approach that involves the use of machine learning methods and Edge artificial intelligence to assist the prioritization and continuous monitoring procedures of patients with COVID-19 in health departments. Although initial results appear to be promising, further studies are required to examine its actual effectiveness. 
%M 36197836 %R 10.2196/36933 %U https://formative.jmir.org/2022/11/e36933 %U https://doi.org/10.2196/36933 %U http://www.ncbi.nlm.nih.gov/pubmed/36197836 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 11 %P e40681 %T Investigating Patients' Continuance Intention Toward Conversational Agents in Outpatient Departments: Cross-sectional Field Survey %A Li,Xingyi %A Xie,Shirong %A Ye,Zhengqiang %A Ma,Shishi %A Yu,Guangjun %+ Shanghai Children’s Hospital, Shanghai Jiao Tong University School of Medicine, No 355 Luding Road, Shanghai, 200062, China, 86 18917762998, gjyu@shchildren.com.cn %K conversational agent %K continuance intention %K expectation-confirmation model %K partial least squares %K structural equation modeling %K chatbot %K virtual assistant %K cross-sectional %K field study %K optimization %K outpatient %K interview %K qualitative %K questionnaire %K satisfaction %K perceived usefulness %K intention %K adoption %K attitude %K perception %D 2022 %7 7.11.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Conversational agents (CAs) have been developed in outpatient departments to improve physician-patient communication efficiency. As end users, patients’ continuance intention is essential for the sustainable development of CAs. Objective: The aim of this study was to facilitate the successful usage of CAs by identifying key factors influencing patients’ continuance intention and proposing corresponding managerial implications. Methods: This study proposed an extended expectation-confirmation model and empirically tested the model via a cross-sectional field survey. The questionnaire included demographic characteristics, multiple-item scales, and an optional open-ended question on patients’ specific expectations for CAs. Partial least squares structural equation modeling was applied to assess the model and hypotheses. The qualitative data were analyzed via thematic analysis. 
Results: A total of 172 completed questionnaires were received, with a 100% (172/172) response rate. The proposed model explained 75.5% of the variance in continuance intention. Both satisfaction (β=.68; P<.001) and perceived usefulness (β=.221; P=.004) were significant predictors of continuance intention. Patients’ extent of confirmation significantly and positively affected both perceived usefulness (β=.817; P<.001) and satisfaction (β=.61; P<.001). Contrary to expectations, perceived ease of use had no significant impact on perceived usefulness (β=.048; P=.37), satisfaction (β=−.004; P=.63), and continuance intention (β=.026; P=.91). The following three themes were extracted from the 74 answers to the open-ended question: personalized interaction, effective utilization, and clear illustrations. Conclusions: This study identified key factors influencing patients’ continuance intention toward CAs. Satisfaction and perceived usefulness were significant predictors of continuance intention (P<.001 and P=.004, respectively) and were significantly affected by patients’ extent of confirmation (P<.001 and P<.001, respectively). Developing a better understanding of patients’ continuance intention can help administrators figure out how to facilitate the effective implementation of CAs. Efforts should be made toward improving the aspects that patients reasonably expect CAs to have, which include personalized interactions, effective utilization, and clear illustrations. 
%M 36342768 %R 10.2196/40681 %U https://www.jmir.org/2022/11/e40681 %U https://doi.org/10.2196/40681 %U http://www.ncbi.nlm.nih.gov/pubmed/36342768 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 11 %P e36553 %T Ambient Assisted Living: Scoping Review of Artificial Intelligence Models, Domains, Technology, and Concerns %A Jovanovic,Mladjan %A Mitrov,Goran %A Zdravevski,Eftim %A Lameski,Petre %A Colantonio,Sara %A Kampel,Martin %A Tellioglu,Hilda %A Florez-Revuelta,Francisco %+ Department of Computer Science, Singidunum University, Danijelova 32, Belgrade, 11000, Serbia, 381 603831844, mjovanovic@singidunum.ac.rs %K ambient assisted living %K AAL %K assisted living %K active living %K digital health %K digital well-being %K automated learning approach %K artificial intelligence algorithms %K human-centered AI %K review %K implications %K artificial intelligence %K mobile phone %D 2022 %7 4.11.2022 %9 Review %J J Med Internet Res %G English %X Background: Ambient assisted living (AAL) is a common name for various artificial intelligence (AI)–infused applications and platforms that support their users in need in multiple activities, from health to daily living. These systems use different approaches to learn about their users and make automated decisions, known as AI models, for personalizing their services and increasing outcomes. Given the numerous systems developed and deployed for people with different needs, health conditions, and dispositions toward the technology, it is critical to obtain clear and comprehensive insights concerning AI models used, along with their domains, technology, and concerns, to identify promising directions for future work. Objective: This study aimed to provide a scoping review of the literature on AI models in AAL. In particular, we analyzed specific AI models used in AAL systems, the target domains of the models, the technology using the models, and the major concerns from the end-user perspective. 
Our goal was to consolidate research on this topic and inform end users, health care professionals and providers, researchers, and practitioners in developing, deploying, and evaluating future intelligent AAL systems. Methods: This study was conducted as a scoping review to identify, analyze, and extract the relevant literature. It used a natural language processing toolkit to retrieve the article corpus for an efficient and comprehensive automated literature search. Relevant articles were then extracted from the corpus and analyzed manually. This review included 5 digital libraries: IEEE, PubMed, Springer, Elsevier, and MDPI. Results: We included a total of 108 articles. The annual distribution of relevant articles showed a growing trend for all categories from January 2010 to July 2022. The AI models mainly used unsupervised and semisupervised approaches. The leading models are deep learning, natural language processing, instance-based learning, and clustering. Activity assistance and recognition were the most common target domains of the models. Ambient sensing, mobile technology, and robotic devices mainly implemented the models. Older adults were the primary beneficiaries, followed by patients and frail persons of various ages. Availability was a top beneficiary concern. Conclusions: This study presents the analytical evidence of AI models in AAL and their domains, technologies, beneficiaries, and concerns. Future research on intelligent AAL should involve health care professionals and caregivers as designers and users, comply with health-related regulations, improve transparency and privacy, integrate with health care technological infrastructure, explain their decisions to the users, and establish evaluation metrics and design guidelines. 
Trial Registration: PROSPERO (International Prospective Register of Systematic Reviews) CRD42022347590; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42022347590 %M 36331530 %R 10.2196/36553 %U https://www.jmir.org/2022/11/e36553 %U https://doi.org/10.2196/36553 %U http://www.ncbi.nlm.nih.gov/pubmed/36331530 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 11 %P e34067 %T Predicting Emerging Themes in Rapidly Expanding COVID-19 Literature With Unsupervised Word Embeddings and Machine Learning: Evidence-Based Study %A Pal,Ridam %A Chopra,Harshita %A Awasthi,Raghav %A Bandhey,Harsh %A Nagori,Aditya %A Sethi,Tavpritesh %+ Department of Computational Biology, Indraprastha Institute of Information Technology Delhi, Third Floor, New Academic Block, Okhla Industrial Estate, Phase-III, New Delhi, 110020, India, 91 9779908630, tavpriteshsethi@iiitd.ac.in %K COVID-19 %K named entity recognition %K unsupervised word embeddings %K machine learning %K natural language preprocessing %D 2022 %7 2.11.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Evidence from peer-reviewed literature is the cornerstone for designing responses to global threats such as COVID-19. In massive and rapidly growing corpuses, such as COVID-19 publications, assimilating and synthesizing information is challenging. Leveraging a robust computational pipeline that evaluates multiple aspects, such as network topological features, communities, and their temporal trends, can make this process more efficient. Objective: We aimed to show that new knowledge can be captured and tracked using the temporal change in the underlying unsupervised word embeddings of the literature. Further, imminent themes can be predicted using machine learning on the evolving associations between words. 
Methods: Frequently occurring medical entities were extracted from the abstracts of more than 150,000 COVID-19 articles published on the World Health Organization database, collected on a monthly interval starting from February 2020. Word embeddings trained on each month’s literature were used to construct networks of entities with cosine similarities as edge weights. Topological features of the subsequent month’s network were forecasted based on prior patterns, and new links were predicted using supervised machine learning. Community detection and alluvial diagrams were used to track biomedical themes that evolved over the months. Results: We found that thromboembolic complications were detected as an emerging theme as early as August 2020. A shift toward the symptoms of long COVID complications was observed during March 2021, and neurological complications gained significance in June 2021. A prospective validation of the link prediction models achieved an area under the receiver operating characteristic curve of 0.87. Predictive modeling revealed predisposing conditions, symptoms, cross-infection, and neurological complications as dominant research themes in COVID-19 publications based on the patterns observed in previous months. Conclusions: Machine learning–based prediction of emerging links can contribute toward steering research by capturing themes represented by groups of medical entities, based on patterns of semantic relationships over time. 
%M 36040993 %R 10.2196/34067 %U https://www.jmir.org/2022/11/e34067 %U https://doi.org/10.2196/34067 %U http://www.ncbi.nlm.nih.gov/pubmed/36040993 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 6 %N 2 %P e38040 %T The Impact of Time Horizon on Classification Accuracy: Application of Machine Learning to Prediction of Incident Coronary Heart Disease %A Simon,Steven %A Mandair,Divneet %A Albakri,Abdel %A Fohner,Alison %A Simon,Noah %A Lange,Leslie %A Biggs,Mary %A Mukamal,Kenneth %A Psaty,Bruce %A Rosenberg,Michael %+ Division of Cardiology, University of Colorado School of Medicine, 13001 E 17th Pl, Aurora, CO, 80045, United States, 1 303 724 6946, steven.simon@cuanschutz.edu %K coronary heart disease %K risk prediction %K machine learning %K heart %K heart disease %K clinical %K risk %K myocardial %K gender %D 2022 %7 2.11.2022 %9 Original Paper %J JMIR Cardio %G English %X Background: Many machine learning approaches are limited to classification of outcomes rather than longitudinal prediction. One strategy to use machine learning in clinical risk prediction is to classify outcomes over a given time horizon. However, it is not well-known how to identify the optimal time horizon for risk prediction. Objective: In this study, we aim to identify an optimal time horizon for classification of incident myocardial infarction (MI) using machine learning approaches looped over outcomes with increasing time horizons. Additionally, we sought to compare the performance of these models with the traditional Framingham Heart Study (FHS) coronary heart disease gender-specific Cox proportional hazards regression model. Methods: We analyzed data from a single clinic visit of 5201 participants of a cardiovascular health study. We examined 61 variables collected from this baseline exam, including demographic and biologic data, medical history, medications, serum biomarkers, electrocardiographic, and echocardiographic data. 
We compared several machine learning methods (eg, random forest, L1 regression, gradient boosted decision tree, support vector machine, and k-nearest neighbor) trained to predict incident MI that occurred within time horizons ranging from 500-10,000 days of follow-up. Models were compared on a 20% held-out testing set using area under the receiver operating characteristic curve (AUROC). Variable importance was performed for random forest and L1 regression models across time points. We compared results with the FHS coronary heart disease gender-specific Cox proportional hazards regression functions. Results: There were 4190 participants included in the analysis, with 2522 (60.2%) female participants and an average age of 72.6 years. Over 10,000 days of follow-up, there were 813 incident MI events. The machine learning models were most predictive over moderate follow-up time horizons (ie, 1500-2500 days). Overall, the L1 (Lasso) logistic regression demonstrated the strongest classification accuracy across all time horizons. This model was most predictive at 1500 days follow-up, with an AUROC of 0.71. The most influential variables differed by follow-up time and model, with gender being the most important feature for the L1 regression and weight for the random forest model across all time frames. Compared with the Framingham Cox function, the L1 and random forest models performed better across all time frames beyond 1500 days. Conclusions: In a population free of coronary heart disease, machine learning techniques can be used to predict incident MI at varying time horizons with reasonable accuracy, with the strongest prediction accuracy in moderate follow-up periods. Validation across additional populations is needed to confirm the validity of this approach in risk prediction. 
%M 36322114 %R 10.2196/38040 %U https://cardio.jmir.org/2022/2/e38040 %U https://doi.org/10.2196/38040 %U http://www.ncbi.nlm.nih.gov/pubmed/36322114 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 11 %P e39748 %T The Impact of Artificial Intelligence on Health Equity in Oncology: Scoping Review %A Istasy,Paul %A Lee,Wen Shen %A Iansavichene,Alla %A Upshur,Ross %A Gyawali,Bishal %A Burkell,Jacquelyn %A Sadikovic,Bekim %A Lazo-Langner,Alejandro %A Chin-Yee,Benjamin %+ Division of Hematology, Department of Medicine, London Health Sciences Centre, 800 Comissioners Rd E, London, ON, N6A 5W9, Canada, 1 519 685 8475, Benjamin.Chin-Yee@lhsc.on.ca %K artificial intelligence %K eHealth %K digital health %K machine learning %K oncology %K cancer %K health equity %K health disparity %K bias %K global health %K public health %K cancer epidemiology %K epidemiology %K scoping %K review %K mobile phone %D 2022 %7 1.11.2022 %9 Review %J J Med Internet Res %G English %X Background: The field of oncology is at the forefront of advances in artificial intelligence (AI) in health care, providing an opportunity to examine the early integration of these technologies in clinical research and patient care. Hope that AI will revolutionize health care delivery and improve clinical outcomes has been accompanied by concerns about the impact of these technologies on health equity. Objective: We aimed to conduct a scoping review of the literature to address the question, “What are the current and potential impacts of AI technologies on health equity in oncology?” Methods: Following PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines for scoping reviews, we systematically searched MEDLINE and Embase electronic databases from January 2000 to August 2021 for records engaging with key concepts of AI, health equity, and oncology. We included all English-language articles that engaged with the 3 key concepts. 
Articles were analyzed qualitatively for themes pertaining to the influence of AI on health equity in oncology. Results: Of the 14,011 records, 133 (0.95%) identified from our review were included. We identified 3 general themes in the literature: the use of AI to reduce health care disparities (58/133, 43.6%), concerns surrounding AI technologies and bias (16/133, 12.1%), and the use of AI to examine biological and social determinants of health (55/133, 41.4%). A total of 3% (4/133) of articles focused on many of these themes. Conclusions: Our scoping review revealed 3 main themes on the impact of AI on health equity in oncology, which relate to AI’s ability to help address health disparities, its potential to mitigate or exacerbate bias, and its capability to help elucidate determinants of health. Gaps in the literature included a lack of discussion of ethical challenges with the application of AI technologies in low- and middle-income countries, lack of discussion of problems of bias in AI algorithms, and a lack of justification for the use of AI technologies over traditional statistical methods to address specific research questions in oncology. Our review highlights a need to address these gaps to ensure a more equitable integration of AI in cancer research and clinical practice. The limitations of our study include its exploratory nature, its focus on oncology as opposed to all health care sectors, and its analysis of solely English-language articles. 
%M 36005841 %R 10.2196/39748 %U https://www.jmir.org/2022/11/e39748 %U https://doi.org/10.2196/39748 %U http://www.ncbi.nlm.nih.gov/pubmed/36005841 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 9 %N 11 %P e41014 %T Ethical Implications of the Use of Language Analysis Technologies for the Diagnosis and Prediction of Psychiatric Disorders %A Loch,Alexandre Andrade %A Lopes-Rocha,Ana Caroline %A Ara,Anderson %A Gondim,João Medrado %A Cecchi,Guillermo A %A Corcoran,Cheryl Mary %A Mota,Natália Bezerra %A Argolo,Felipe C %+ Institute of Psychiatry, University of Sao Paulo, R. Dr. Ovidio Pires de Campos 785, 4 andar sala 4N60, Sao Paulo, 05403010, Brazil, 55 11996201213, alexandre.loch@usp.br %K at-risk mental state %K psychosis %K clinical high risk %K digital phenotyping %K machine learning %K artificial intelligence %K natural language processing %D 2022 %7 1.11.2022 %9 Viewpoint %J JMIR Ment Health %G English %X Recent developments in artificial intelligence technologies have come to a point where machine learning algorithms can infer mental status based on someone’s photos and texts posted on social media. More than that, these algorithms are able to predict, with a reasonable degree of accuracy, future mental illness. They potentially represent an important advance in mental health care for preventive and early diagnosis initiatives, and for aiding professionals in the follow-up and prognosis of their patients. However, important issues call for major caution in the use of such technologies, namely, privacy and the stigma related to mental disorders. In this paper, we discuss the bioethical implications of using such technologies to diagnose and predict future mental illness, given the current scenario of swiftly growing technologies that analyze human language and the online availability of personal information given by social media. We also suggest future directions to be taken to minimize the misuse of such important technologies. 
%M 36318266 %R 10.2196/41014 %U https://mental.jmir.org/2022/11/e41014 %U https://doi.org/10.2196/41014 %U http://www.ncbi.nlm.nih.gov/pubmed/36318266 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 10 %P e39698 %T Technical, Ethical, Legal, and Societal Challenges With Digital Twin Systems for the Management of Chronic Diseases in Children and Young People %A Drummond,David %A Coulet,Adrien %+ Department of Paediatric Pulmonology and Allergology, University Hospital Necker-Enfants Malades, AP-HP, 149 rue de Sèvres, Paris, 75015, France, 33 144494848, david.drummond@aphp.fr %K artificial intelligence %K pediatrics %K medical cyber-physical systems %K children %K digital twin %K child %K personalized %K cyber-physical %K digital health %K digital medicine %K eHealth %K ethics %K legal %K law %K young people %K youth %K ethical %K sensor %K monitor %K privacy %K data collection %K paediatric %K pediatrician %K paediatrician %K chronic disease %K medical system %D 2022 %7 31.10.2022 %9 Viewpoint %J J Med Internet Res %G English %X Advances in digital medicine now make it possible to use digital twin systems (DTS), which combine (1) extensive patient monitoring through the use of multiple sensors and (2) personalized adaptation of patient care through the use of software. After the artificial pancreas system already operational in children with type 1 diabetes, new DTS could be developed for real-time monitoring and management of children with chronic diseases. Just as providing care for children is a specific discipline—pediatrics—because of their particular characteristics and needs, providing digital care for children also presents particular challenges. 
This article reviews the technical challenges, mainly related to the problem of data collection in children; the ethical challenges, including the need to preserve the child's place in their care when using DTS; the legal challenges and the dual need to guarantee the safety of DTS for children and to ensure their access to DTS; and the societal challenges, including the needs to maintain human contact and trust between the child and the pediatrician and to limit DTS to specific uses to avoid contributing to a surveillance society and, at another level, to climate change.  %M 36315239 %R 10.2196/39698 %U https://www.jmir.org/2022/10/e39698 %U https://doi.org/10.2196/39698 %U http://www.ncbi.nlm.nih.gov/pubmed/36315239 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 9 %N 4 %P e38411 %T Clinicians’ Perceptions of an Artificial Intelligence–Based Blood Utilization Calculator: Qualitative Exploratory Study %A Choudhury,Avishek %A Asan,Onur %A Medow,Joshua E %+ Industrial and Management Systems Engineering, Benjamin M Statler College of Engineering and Mineral Resources, West Virginia University, 1306 Evansdale Drive, PO Box 6107, Morgantown, WV, 26506-6107, United States, 1 3042939431, avishek.choudhury@mail.wvu.edu %K artificial intelligence %K human factors %K decision-making %K blood transfusion %K technology acceptance %K complications %K prevention %K decision support %K transfusion overload %K risk %K support %K perception %K safety %K usability %D 2022 %7 31.10.2022 %9 Original Paper %J JMIR Hum Factors %G English %X Background: According to the US Food and Drug Administration Center for Biologics Evaluation and Research, health care systems have been experiencing blood transfusion overuse. To minimize the overuse of blood product transfusions, a proprietary artificial intelligence (AI)–based blood utilization calculator (BUC) was developed and integrated into a US hospital’s electronic health record. 
Despite the promising performance of the BUC, this technology remains underused in the clinical setting. Objective: This study aims to explore how clinicians perceived this AI-based decision support system and, consequently, understand the factors hindering BUC use. Methods: We interviewed 10 clinicians (BUC users) until the data saturation point was reached. The interviews were conducted over a web-based platform and were recorded. The audiovisual recordings were then anonymously transcribed verbatim. We used an inductive-deductive thematic analysis to analyze the transcripts, which involved applying predetermined themes to the data (deductive) and consecutively identifying new themes as they emerged in the data (inductive). Results: We identified the following two themes: (1) workload and usability and (2) clinical decision-making. Clinicians acknowledged the ease of use and usefulness of the BUC for the general inpatient population. The clinicians also found the BUC to be useful in making decisions related to blood transfusion. However, some clinicians found the technology to be confusing due to inconsistent automation across different blood work processes. Conclusions: This study highlights that analytical efficacy alone does not ensure technology use or acceptance. The overall system’s design, user perception, and users’ knowledge of the technology are equally important and necessary (limitations, functionality, purpose, and scope). Therefore, the effective integration of AI-based decision support systems, such as the BUC, mandates multidisciplinary engagement, ensuring the adequate initial and recurrent training of AI users while maintaining high analytical efficacy and validity. As a final takeaway, the design of AI systems that are made to perform specific tasks must be self-explanatory, so that the users can easily understand how and when to use the technology. 
Using any technology on a population for whom it was not initially designed will hinder user perception and the technology’s use. %M 36315238 %R 10.2196/38411 %U https://humanfactors.jmir.org/2022/4/e38411 %U https://doi.org/10.2196/38411 %U http://www.ncbi.nlm.nih.gov/pubmed/36315238 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 3 %N 1 %P e29404 %T Prediction of Antibody-Antigen Binding via Machine Learning: Development of Data Sets and Evaluation of Methods %A Ye,Chao %A Hu,Wenxing %A Gaeta,Bruno %+ School of Computer Science and Engineering, The University of New South Wales, Computer Science Building (K17), Engineering Rd, UNSW, Sydney, 2052, Australia, 61 293857213, bgaeta@unsw.edu.au %K DNA sequencing %K DNA %K DNA sequence %K sequence data %K molecular biology %K genomic %K random forest %K nearest neighbor %K immunoglobulin %K genetics %K antibody-antigen binding %K antigen %K antibody %K structural biology %K machine learning %K protein modeling %K protein %K proteomic %D 2022 %7 28.10.2022 %9 Original Paper %J JMIR Bioinform Biotech %G English %X Background: The mammalian immune system is able to generate antibodies against a huge variety of antigens, including bacteria, viruses, and toxins. The ultradeep DNA sequencing of rearranged immunoglobulin genes has considerable potential in furthering our understanding of the immune response, but it is limited by the lack of a high-throughput, sequence-based method for predicting the antigen(s) that a given immunoglobulin recognizes. Objective: As a step toward the prediction of antibody-antigen binding from sequence data alone, we aimed to compare a range of machine learning approaches that were applied to a collated data set of antibody-antigen pairs in order to predict antibody-antigen binding from sequence data. 
Methods: Data for training and testing were extracted from the Protein Data Bank and the Coronavirus Antibody Database, and additional antibody-antigen pair data were generated by using a molecular docking protocol. Several machine learning methods, including the weighted nearest neighbor method, the nearest neighbor method with the BLOSUM62 matrix, and the random forest method, were applied to the problem. Results: The final data set contained 1157 antibodies and 57 antigens that were combined in 5041 antibody-antigen pairs. The best performance for the prediction of interactions was obtained by using the nearest neighbor method with the BLOSUM62 matrix, which resulted in around 82% accuracy on the full data set. These results provide a useful frame of reference, as well as protocols and considerations, for machine learning and data set creation in the prediction of antibody-antigen binding. Conclusions: Several machine learning approaches were compared to predict antibody-antigen interaction from protein sequences. Both the data set (in CSV format) and the machine learning program (coded in Python) are freely available for download on GitHub. 
%R 10.2196/29404 %U https://bioinform.jmir.org/2022/1/e29404 %U https://doi.org/10.2196/29404 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 10 %P e39613 %T Associations Between the Severity of Obsessive-Compulsive Disorder and Vocal Features in Children and Adolescents: Protocol for a Statistical and Machine Learning Analysis %A Clemmensen,Line Katrine Harder %A Lønfeldt,Nicole Nadine %A Das,Sneha %A Lund,Nicklas Leander %A Uhre,Valdemar Funch %A Mora-Jensen,Anna-Rosa Cecilie %A Pretzmann,Linea %A Uhre,Camilla Funch %A Ritter,Melanie %A Korsbjerg,Nicoline Løcke Jepsen %A Hagstrøm,Julie %A Thoustrup,Christine Lykke %A Clemmesen,Iben Thiemer %A Plessen,Kersten Jessica %A Pagsberg,Anne Katrine %+ Department of Applied Mathematics and Computer Science, Technical University of Denmark, Richard Petersens Plads, Bygning 324, Copenhagen, 2800, Denmark, 45 45 25 37 64, lkhc@dtu.dk %K machine learning %K obsessive-compulsive disorder %K vocal features %K speech signals %K children %K teens %K adolescents %K OCD %K AI %K artificial intelligence %K tool %K mental health %K care %K speech %K data %K clinical trial %K validity %K results %D 2022 %7 28.10.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Artificial intelligence tools have the potential to objectively identify youth in need of mental health care. Speech signals have shown promise as a source for predicting various psychiatric conditions and transdiagnostic symptoms. Objective: We designed a study testing the association between obsessive-compulsive disorder (OCD) diagnosis and symptom severity on vocal features in children and adolescents. Here, we present an analysis plan and statistical report for the study to document our a priori hypotheses and increase the robustness of the findings of our planned study. Methods: Audio recordings of clinical interviews of 47 children and adolescents with OCD and 17 children and adolescents without a psychiatric diagnosis will be analyzed. 
Youths were between 8 and 17 years old. We will test the effect of OCD diagnosis on computationally derived scores of vocal activation using ANOVA. To test the effect of OCD severity classifications on the same computationally derived vocal scores, we will perform a logistic regression. Finally, we will attempt to create an improved indicator of OCD severity by refining the model with more relevant labels. Models will be adjusted for age and gender. Model validation strategies are outlined. Results: Simulated results are presented. The actual results using real data will be presented in future publications. Conclusions: A major strength of this study is that we will include age and gender in our models to increase classification accuracy. A major challenge is the suboptimal quality of the audio recordings, which are representative of in-the-wild data and a large body of recordings collected during other clinical trials. This preregistered analysis plan and statistical report will increase the validity of the interpretations of the upcoming results. 
International Registered Report Identifier (IRRID): DERR1-10.2196/39613 %M 36306153 %R 10.2196/39613 %U https://www.researchprotocols.org/2022/10/e39613 %U https://doi.org/10.2196/39613 %U http://www.ncbi.nlm.nih.gov/pubmed/36306153 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 10 %P e39157 %T COVID-Bot, an Intelligent System for COVID-19 Vaccination Screening: Design and Development %A Okonkwo,Chinedu Wilfred %A Amusa,Lateef Babatunde %A Twinomurinzi,Hossana %+ Centre for Applied Data Sciences, College of Business and Economics, University of Johannesburg, Bunting Road Campus, Auckland Park, Johannesburg, 2006, South Africa, 27 0835796557, chineduo@uj.ac.za %K chatbot %K COVID-Bot %K COVID-19 %K students %K vaccine %K exemption letter %K vaccination %K artificial intelligence %D 2022 %7 27.10.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Coronavirus continues to spread worldwide, causing various health and economic disruptions. One of the most important approaches to controlling the spread of this disease is to use an artificial intelligence (AI)–based technological intervention, such as a chatbot system. Chatbots can aid in the fight against the spread of COVID-19. Objective: This paper introduces COVID-Bot, an intelligent interactive system that can help screen students and confirm their COVID-19 vaccination status. Methods: The design and development of COVID-Bot followed the principles of the design science research (DSR) process, which is a research method for creating a new scientific artifact. COVID-Bot was developed and implemented using the SnatchBot chatbot application programming interface (API) and its predefined tools, which are driven by various natural language processing algorithms. Results: An evaluation was carried out through a survey that involved 106 university students in determining the functionality, compatibility, reliability, and usability of COVID-Bot. 
The findings indicated that 92 (86.8%) of the participants agreed that the chatbot functions well, 85 (80.2%) agreed that it fits well with their mobile devices and their lifestyle, 86 (81.1%) agreed that it has the potential to produce accurate and consistent responses, and 85 (80.2%) agreed that it is easy to use. The average obtained α was .87, indicating satisfactory reliability. Conclusions: This study demonstrates that incorporating chatbot technology into the educational system can combat the spread of COVID-19 among university students. The intelligent system does this by interacting with students to determine their vaccination status. %M 36301616 %R 10.2196/39157 %U https://formative.jmir.org/2022/10/e39157 %U https://doi.org/10.2196/39157 %U http://www.ncbi.nlm.nih.gov/pubmed/36301616 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 10 %P e38215 %T Relationships Between Blood Pressure Reduction, Weight Loss, and Engagement in a Digital App–Based Hypertension Care Program: Observational Study %A Branch,OraLee H %A Rikhy,Mohit %A Auster-Gussman,Lisa A %A Lockwood,Kimberly G %A Graham,Sarah A %+ Lark Technologies, Inc, 2570 El Camino Real, Mountain View, CA, 94040, United States, 1 650 300 1755, sarah.graham@lark.com %K high blood pressure %K obesity %K weight loss %K conversational artificial intelligence %K lifestyle coaching %D 2022 %7 27.10.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Home blood pressure (BP) monitoring is recommended for people with hypertension; however, meta-analyses have demonstrated that BP improvements are related to additional coaching support in combination with self-monitoring, with little or no effect of self-monitoring alone. High-contact coaching requires substantial resources and may be difficult to deliver via human coaching models. 
Objective: This observational study assessed changes in BP and body weight following participation in a fully digital program called Lark Hypertension Care with coaching powered by artificial intelligence (AI). Methods: Participants (N=864) had a baseline systolic BP (SBP) ≥120 mm Hg, provided their baseline body weight, and had reached at least their third month in the program. The primary outcome was the change in SBP at 3 and 6 months, with secondary outcomes of change in body weight and associations of changes in SBP and body weight with participant demographics, characteristics, and program engagement. Results: By month 3, there was a significant drop of –5.4 mm Hg (95% CI –6.5 to –4.3; P<.001) in mean SBP from baseline. BP did not change significantly (ie, the SBP drop maintained) from 3 to 6 months for participants who provided readings at both time points (P=.49). Half of the participants achieved a clinically meaningful drop of ≥5 mm Hg by month 3 (178/349, 51.0%) and month 6 (98/199, 49.2%). The magnitude of the drop depended on starting SBP. Participants classified as hypertension stage 2 had the largest mean drop in SBP of –12.4 mm Hg (SE 1.2 mm Hg) by month 3 and –13.0 mm Hg (SE 1.6 mm Hg) by month 6; participants classified as hypertension stage 1 lowered by –5.2 mm Hg (SE 0.8) mm Hg by month 3 and –7.3 mm Hg (SE 1.3 mm Hg) by month 6; participants classified as elevated lowered by –1.1 mm Hg (SE 0.7 mm Hg) by month 3 but did not drop by month 6. Starting SBP (β=.11; P<.001), percent weight change (β=–.36; P=.02), and initial BMI (β=–.56; P<.001) were significantly associated with the likelihood of lowering SBP ≥5 mm Hg by month 3. Percent weight change acted as a mediator of the relationship between program engagement and drop in SBP. The bootstrapped unstandardized indirect effect was –0.0024 (95% CI –0.0052 to 0; P=.002). 
Conclusions: A hypertension care program with coaching powered by AI was associated with a clinically meaningful reduction in SBP following 3 and 6 months of program participation. Percent weight change was significantly associated with the likelihood of achieving a ≥5 mm Hg drop in SBP. An AI-powered solution may offer a scalable approach to helping individuals with hypertension achieve clinically meaningful reductions in their BP and associated risk of cardiovascular disease and other serious adverse outcomes via healthy lifestyle changes such as weight loss. %M 36301618 %R 10.2196/38215 %U https://formative.jmir.org/2022/10/e38215 %U https://doi.org/10.2196/38215 %U http://www.ncbi.nlm.nih.gov/pubmed/36301618 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 8 %N 4 %P e38325 %T Perceptions of US Medical Students on Artificial Intelligence in Medicine: Mixed Methods Survey Study %A Liu,David Shalom %A Sawyer,Jake %A Luna,Alexander %A Aoun,Jihad %A Wang,Janet %A Boachie,Lord %A Halabi,Safwan %A Joe,Bina %+ Department of Physiology and Pharmacology, College of Medicine and Life Sciences, University of Toledo, 3000 Arlington Ave, Toledo, OH, 43614, United States, 1 419 383 4144, bina.joe@utoledo.edu %K artificial intelligence %K eHealth %K digital health %K integration %K medical education %K medical curriculum %K education %K medical student %K medical school %K elective course %D 2022 %7 21.10.2022 %9 Original Paper %J JMIR Med Educ %G English %X Background: Given the rapidity with which artificial intelligence is gaining momentum in clinical medicine, current physician leaders have called for more incorporation of artificial intelligence topics into undergraduate medical education. This is to prepare future physicians to better work together with artificial intelligence technology. However, the first step in curriculum development is to survey the needs of end users. 
There has not been a study to determine which media and which topics are most preferred by US medical students to learn about the topic of artificial intelligence in medicine. Objective: We aimed to survey US medical students on the need to incorporate artificial intelligence in undergraduate medical education and their preferred means to do so to assist with future education initiatives. Methods: A mixed methods survey comprising both specific questions and a write-in response section was sent through Qualtrics to US medical students in May 2021. Likert scale questions were used to first assess various perceptions of artificial intelligence in medicine. Specific questions were posed regarding learning format and topics in artificial intelligence. Results: We surveyed 390 US medical students with an average age of 26 (SD 3) years from 17 different medical programs (the estimated response rate was 3.5%). A majority (355/388, 91.5%) of respondents agreed that training in artificial intelligence concepts during medical school would be useful for their future. While 79.4% (308/388) were excited to use artificial intelligence technologies, 91.2% (353/387) either reported that their medical schools did not offer resources or were unsure if they did so. Short lectures (264/378, 69.8%), formal electives (180/378, 47.6%), and Q and A panels (167/378, 44.2%) were identified as preferred formats, while fundamental concepts of artificial intelligence (247/379, 65.2%), when to use artificial intelligence in medicine (227/379, 59.9%), and pros and cons of using artificial intelligence (224/379, 59.1%) were the most preferred topics for enhancing their training. Conclusions: The results of this study indicate that current US medical students recognize the importance of artificial intelligence in medicine and acknowledge that current formal education and resources to study artificial intelligence–related topics are limited in most US medical schools. 
Respondents also indicated that a hybrid formal/flexible format would be most appropriate for incorporating artificial intelligence as a topic in US medical schools. Based on these data, we conclude that there is a definitive knowledge gap in artificial intelligence education within current medical education in the US. Further, the results suggest there is a disparity in opinions on the specific format and topics to be introduced. %M 36269641 %R 10.2196/38325 %U https://mededu.jmir.org/2022/4/e38325 %U https://doi.org/10.2196/38325 %U http://www.ncbi.nlm.nih.gov/pubmed/36269641 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 10 %P e40567 %T Automatic Assessment of Intelligibility in Noise in Parkinson Disease: Validation Study %A Moya-Galé,Gemma %A Walsh,Stephen J %A Goudarzi,Alireza %+ Department of Communication Sciences & Disorders, Long Island University, 1 University Plaza, Brooklyn, NY, 11201, United States, 1 718 780 4125, gemma.moya-gale@liu.edu %K automatic speech recognition %K Parkinson disease %K intelligibility %K dysarthria %K digital health %K artificial intelligence %D 2022 %7 20.10.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Most individuals with Parkinson disease (PD) experience a degradation in their speech intelligibility. Research on the use of automatic speech recognition (ASR) to assess intelligibility is still sparse, especially when trying to replicate communication challenges in real-life conditions (ie, noisy backgrounds). Developing technologies to automatically measure intelligibility in noise can ultimately assist patients in self-managing their voice changes due to the disease. Objective: The goal of this study was to pilot-test and validate the use of a customized web-based app to assess speech intelligibility in noise in individuals with dysarthria associated with PD. 
Methods: In total, 20 individuals with dysarthria associated with PD and 20 healthy controls (HCs) recorded a set of sentences using their phones. The Google Cloud ASR API was used to automatically transcribe the speakers’ sentences. An algorithm was created to embed speakers’ sentences in +6-dB signal-to-noise multitalker babble. Results from ASR performance were compared to those from 30 listeners who orthographically transcribed the same set of sentences. Data were reduced into a single event, defined as a success if the artificial intelligence (AI) system transcribed a random speaker or sentence as well or better than the average of 3 randomly chosen human listeners. These data were further analyzed by logistic regression to assess whether AI success differed by speaker group (HCs or speakers with dysarthria) or was affected by sentence length. A discriminant analysis was conducted on the human listener data and AI transcriber data independently to compare the ability of each data set to discriminate between HCs and speakers with dysarthria. Results: The data analysis indicated a 0.8 probability (95% CI 0.65-0.91) that AI performance would be as good or better than the average human listener. AI transcriber success probability was not found to be dependent on speaker group. AI transcriber success was found to decrease with sentence length, losing an estimated 0.03 probability of transcribing as well as the average human listener for each word increase in sentence length. The AI transcriber data were found to offer the same discrimination of speakers into categories (HCs and speakers with dysarthria) as the human listener data. Conclusions: ASR has the potential to assess intelligibility in noise in speakers with dysarthria associated with PD. Our results hold promise for the use of AI with this clinical population, although a full range of speech severity needs to be evaluated in future work, as well as the effect of different speaking tasks on ASR. 
%M 36264608 %R 10.2196/40567 %U https://www.jmir.org/2022/10/e40567 %U https://doi.org/10.2196/40567 %U http://www.ncbi.nlm.nih.gov/pubmed/36264608 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 10 %P e39565 %T Physical Therapists’ Knowledge and Attitudes Regarding Artificial Intelligence Applications in Health Care and Rehabilitation: Cross-sectional Study %A Alsobhi,Mashael %A Khan,Fayaz %A Chevidikunnan,Mohamed Faisal %A Basuodan,Reem %A Shawli,Lama %A Neamatallah,Ziyad %+ Department of Physical Therapy, Faculty of Medical Rehabilitation Sciences, King Abdulaziz University, Administration Street, Jeddah, 22254, Saudi Arabia, 966 533034058, fayazrkhan@gmail.com %K artificial intelligence %K physical therapy %K clinicians’ attitudes %K health care %K rehabilitation %K digital health %K machine learning %K survey %D 2022 %7 20.10.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: The use of artificial intelligence (AI) in the field of rehabilitation is growing rapidly. Therefore, there is a need to understand how physical therapists (PTs) perceive AI technologies in clinical practice. Objective: This study aimed to investigate the knowledge and attitude of PTs regarding AI applications in rehabilitation based on multiple explanatory factors. Methods: A web-based Google Form survey, which was divided into 4 sections, was used to collect the data. A total of 317 PTs participated voluntarily in the study. Results: The PTs’ knowledge about AI applications in rehabilitation was lower than their knowledge about AI in general. We found a statistically significant difference in the PTs’ knowledge regarding AI applications in the rehabilitation field based on sex (odds ratio [OR] 2.43, 95% CI 1.53-3.87; P<.001). In addition, experience (OR 1.79, 95% CI 1.11-2.87; P=.02) and educational qualification (OR 1.68, 95% CI 1.05-2.70; P=.03) were found to be significant predictors of knowledge about AI applications. 
PTs who work in the nonacademic sector and who had <10 years of experience had positive attitudes regarding AI. Conclusions: AI technologies have been integrated into many physical therapy practices through the automation of clinical tasks. Therefore, PTs are encouraged to take advantage of the widespread development of AI technologies and enrich their knowledge about, and enhance their practice with, AI applications. %M 36264614 %R 10.2196/39565 %U https://www.jmir.org/2022/10/e39565 %U https://doi.org/10.2196/39565 %U http://www.ncbi.nlm.nih.gov/pubmed/36264614 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 10 %P e38472 %T The Effectiveness of Supervised Machine Learning in Screening and Diagnosing Voice Disorders: Systematic Review and Meta-analysis %A Al-Hussain,Ghada %A Shuweihdi,Farag %A Alali,Haitham %A Househ,Mowafa %A Abd-alrazaq,Alaa %+ AI Center for Precision Health, Weill Cornell Medicine, Education City, Qatar Foundation, PO Box 24144, Doha, Qatar, 974 55708549, alaa_alzoubi88@yahoo.com %K machine learning %K voice disorders %K systematic review %K meta-analysis %K diagnose %K screening %K mobile phone %D 2022 %7 14.10.2022 %9 Review %J J Med Internet Res %G English %X Background: When investigating voice disorders, a series of processes is used, including voice screening and diagnosis. Both methods have limited standardized tests, which are affected by the clinician’s experience and subjective judgment. Machine learning (ML) algorithms have been used as an objective tool in screening or diagnosing voice disorders. However, the effectiveness of ML algorithms in assessing and diagnosing voice disorders has not received sufficient scholarly attention. Objective: This systematic review aimed to assess the effectiveness of ML algorithms in screening and diagnosing voice disorders. Methods: An electronic search was conducted in 5 databases. 
Studies that examined the performance (accuracy, sensitivity, and specificity) of any ML algorithm in detecting pathological voice samples were included. Two reviewers independently selected the studies, extracted data from the included studies, and assessed the risk of bias. The methodological quality of each study was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 tool via RevMan 5 software (Cochrane Library). The characteristics of studies, population, and index tests were extracted, and meta-analyses were conducted to pool the accuracy, sensitivity, and specificity of ML techniques. The issue of heterogeneity was addressed by discussing possible sources and excluding studies when necessary. Results: Of the 1409 records retrieved, 13 studies and 4079 participants were included in this review. A total of 13 ML techniques were used in the included studies, with the most common technique being least squares support vector machine. The pooled accuracy, sensitivity, and specificity of ML techniques in screening voice disorders were 93%, 96%, and 93%, respectively. Least squares support vector machine had the highest accuracy (99%), while the K-nearest neighbor algorithm had the highest sensitivity (98%) and specificity (98%). Quadratic discriminant analysis achieved the lowest accuracy (91%), sensitivity (89%), and specificity (89%). Conclusions: ML showed promising findings in the screening of voice disorders. However, the findings were not conclusive in diagnosing voice disorders owing to the limited number of studies that used ML for diagnostic purposes; thus, more investigations are needed. While it might not be possible to use ML alone as a substitute for current diagnostic tools, it may be used as a decision support tool for clinicians to assess their patients, which could improve the assessment and management process. 
Trial Registration: PROSPERO CRD42020214438; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=214438 %M 36239999 %R 10.2196/38472 %U https://www.jmir.org/2022/10/e38472 %U https://doi.org/10.2196/38472 %U http://www.ncbi.nlm.nih.gov/pubmed/36239999 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 1 %N 1 %P e41940 %T Provider Perspectives on Artificial Intelligence–Guided Screening for Low Ejection Fraction in Primary Care: Qualitative Study %A Barry,Barbara %A Zhu,Xuan %A Behnken,Emma %A Inselman,Jonathan %A Schaepe,Karen %A McCoy,Rozalina %A Rushlow,David %A Noseworthy,Peter %A Richardson,Jordan %A Curtis,Susan %A Sharp,Richard %A Misra,Artika %A Akfaly,Abdulla %A Molling,Paul %A Bernard,Matthew %A Yao,Xiaoxi %+ Division of Health Care Delivery Research, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, United States, 1 507 255 5123, barry.barbara@mayo.edu %K artificial intelligence %K AI %K machine learning %K human-AI interaction %K health informatics %K primary care %K cardiology %K pragmatic clinical trial %K AI-enabled clinical decision support %K human-computer interaction %K health care delivery %K clinical decision support %K health care %K AI tools %D 2022 %7 14.10.2022 %9 Original Paper %J JMIR AI %G English %X Background: The promise of artificial intelligence (AI) to transform health care is threatened by a tangle of challenges that emerge as new AI tools are introduced into clinical practice. AI tools with high accuracy, especially those that detect asymptomatic cases, may be hindered by barriers to adoption. Understanding provider needs and concerns is critical to inform implementation strategies that improve provider buy-in and adoption of AI tools in medicine. Objective: This study aimed to describe provider perspectives on the adoption of an AI-enabled screening tool in primary care to inform effective integration and sustained use. 
Methods: A qualitative study was conducted between December 2019 and February 2020 as part of a pragmatic randomized controlled trial at a large academic medical center in the United States. In all, 29 primary care providers were purposively sampled using a positive deviance approach for participation in semistructured focus groups after their use of the AI tool in the randomized controlled trial was complete. Focus group data were analyzed using a grounded theory approach; iterative analysis was conducted to identify codes and themes, which were synthesized into findings. Results: Our findings revealed that providers understood the purpose and functionality of the AI tool and saw potential value for more accurate and faster diagnoses. However, successful adoption into routine patient care requires the smooth integration of the tool with clinical decision-making and existing workflow to address provider needs and preferences during implementation. To fulfill the AI tool’s promise of clinical value, providers identified areas for improvement including integration with clinical decision-making, cost-effectiveness and resource allocation, provider training, workflow integration, care pathway coordination, and provider-patient communication. Conclusions: The implementation of AI-enabled tools in medicine can benefit from sensitivity to the nuanced context of care and provider needs to enable the useful adoption of AI tools at the point of care. 
Trial Registration: ClinicalTrials.gov NCT04000087; https://clinicaltrials.gov/ct2/show/NCT04000087 %M 38875550 %R 10.2196/41940 %U https://ai.jmir.org/2022/1/e41940 %U https://doi.org/10.2196/41940 %U http://www.ncbi.nlm.nih.gov/pubmed/38875550 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 10 %P e37704 %T An Artificial Intelligence–Driven Digital Health Solution to Support Clinical Management of Patients With Long COVID-19: Protocol for a Prospective Multicenter Observational Study %A Fuster-Casanovas,Aïna %A Fernandez-Luque,Luis %A Nuñez-Benjumea,Francisco J %A Moreno Conde,Alberto %A Luque-Romero,Luis G %A Bilionis,Ioannis %A Rubio Escudero,Cristina %A Chicchi Giglioli,Irene Alice %A Vidal-Alaball,Josep %+ Adhera Health Inc, 1001 Page Mill Rd Building One, Suite 200, Palo Alto, CA, 94304, United States, 34 656930901, luis@adherahealth.com %K COVID-19 syndrome %K artificial intelligence %K AI %K primary health care %K Postacute COVID-19 syndrome %K COVID-19 %K health system %K health care %K health care resource %K public health policy %K long COVID-19 %K mHealth %K digital health solution %K patient %K clinical information %K clinical decision support %D 2022 %7 14.10.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: The COVID-19 pandemic has revealed the weaknesses of most health systems around the world, collapsing them and depleting their available health care resources. Fortunately, the development and enforcement of specific public health policies, such as vaccination, mask wearing, and social distancing, among others, have reduced the prevalence and complications associated with COVID-19 in its acute phase. However, the aftermath of the global pandemic has called for an efficient approach to manage patients with long COVID-19. 
This is a great opportunity to leverage innovative digital health solutions to provide exhausted health care systems with the most cost-effective and efficient tools available to support the clinical management of this population. In this context, the SENSING-AI project focuses on research toward the implementation of an artificial intelligence–driven digital health solution that supports both the adaptive self-management of people living with long COVID-19 and the health care staff in charge of the management and follow-up of this population. Objective: The objective of this protocol is the prospective collection of psychometric and biometric data from 10 patients for training algorithms and prediction models to complement the SENSING-AI cohort. Methods: Publicly available health and lifestyle data registries will be consulted and complemented with a retrospective cohort of anonymized data collected from clinical information of patients diagnosed with long COVID-19. Furthermore, a prospective patient-generated data set will be captured using wearable devices and validated patient-reported outcomes questionnaires to complement the retrospective cohort. Finally, the ‘Findability, Accessibility, Interoperability, and Reuse’ guiding principles for scientific data management and stewardship will be applied to the resulting data set to encourage the continuous process of discovery, evaluation, and reuse of information for the research community at large. Results: The SENSING-AI cohort is expected to be completed during 2022. It is expected that sufficient data will be obtained to generate artificial intelligence models based on behavior change and mental well-being techniques to improve patients’ self-management, while providing useful and timely clinical decision support services to health care professionals based on risk stratification models and early detection of exacerbations. 
Conclusions: SENSING-AI focuses on obtaining high-quality data from patients with long COVID-19 during their daily life. Supporting these patients is of paramount importance in the current pandemic situation, as is supporting their health care professionals in the cost-effective and efficient management of long COVID-19. Trial Registration: ClinicalTrials.gov NCT05204615; https://clinicaltrials.gov/ct2/show/NCT05204615 International Registered Report Identifier (IRRID): DERR1-10.2196/37704 %M 36166648 %R 10.2196/37704 %U https://www.researchprotocols.org/2022/10/e37704 %U https://doi.org/10.2196/37704 %U http://www.ncbi.nlm.nih.gov/pubmed/36166648 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 10 %P e39187 %T A Recurrent Neural Network Model for Predicting Activated Partial Thromboplastin Time After Treatment With Heparin: Retrospective Study %A Boie,Sebastian Daniel %A Engelhardt,Lilian Jo %A Coenen,Nicolas %A Giesa,Niklas %A Rubarth,Kerstin %A Menk,Mario %A Balzer,Felix %+ Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, Berlin, 10117, Germany, 49 30 450580877, Sebastian-Daniel.Boie@charite.de %K machine learning %K health care %K recurrent neural network %K heparin %K activated partial thromboplastin time (aPTT) %K deep learning %K ICU %K critical care %D 2022 %7 13.10.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Anticoagulation therapy with heparin is a frequent treatment in intensive care units and is monitored by activated partial thromboplastin clotting time (aPTT). It has been demonstrated that reaching an established anticoagulation target within 24 hours is associated with favorable outcomes. However, patients respond to heparin differently and reaching the anticoagulation target can be challenging. Machine learning algorithms may potentially support clinicians with improved dosing recommendations. 
Objective: This study evaluates the capability of a range of machine learning algorithms to predict patients’ response to heparin treatment. In this analysis, we apply, for the first time, a model that considers time series. Methods: We extracted patient demographics, laboratory values, dialysis and extracorporeal membrane oxygenation treatments, and scores from the hospital information system. We predicted the numerical aPTT laboratory values 24 hours after continuous heparin infusion and evaluated 7 different machine learning models. The best-performing model was compared to recently published models on a classification task. We considered all data before and within the first 12 hours of continuous heparin infusion as features and predicted the aPTT value after 24 hours. Results: The distribution of aPTT in our cohort of 5926 hospital admissions was highly skewed. Most patients showed aPTT values below 75 s, while some outliers showed much higher aPTT values. A recurrent neural network that consumes a time series of features showed the highest performance on the test set. Conclusions: A recurrent neural network that uses time series of features instead of only static and aggregated features showed the highest performance in predicting aPTT after heparin treatment. 
%M 36227653 %R 10.2196/39187 %U https://medinform.jmir.org/2022/10/e39187 %U https://doi.org/10.2196/39187 %U http://www.ncbi.nlm.nih.gov/pubmed/36227653 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 10 %P e40344 %T Successful Integration of EN/ISO 13606–Standardized Extracts From a Patient Mobile App Into an Electronic Health Record: Description of a Methodology %A Frid,Santiago %A Fuentes Expósito,Maria Angeles %A Grau-Corral,Inmaculada %A Amat-Fernandez,Clara %A Muñoz Mateu,Montserrat %A Pastor Duran,Xavier %A Lozano-Rubí,Raimundo %+ Medical Informatics Unit, Hospital Clínic de Barcelona, Villarroel 170, Barcelona, 08036, Spain, 34 932 27 54 00 ext 3344, santifrid@gmail.com %K health information interoperability %K mobile app %K health information standards %K artificial intelligence %K electronic health records %K machine learning %D 2022 %7 12.10.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: There is an increasing need to integrate patient-generated health data (PGHD) into health information systems (HISs). The use of health information standards based on the dual model allows the achievement of semantic interoperability among systems. Although there is evidence in the use of the Substitutable Medical Applications and Reusable Technologies on Fast Healthcare Interoperability Resources (SMART on FHIR) framework for standardized communication between mobile apps and electronic health records (EHRs), the use of European Norm/International Organization for Standardization (EN/ISO) 13606 has not been explored yet, despite some advantages over FHIR in terms of modeling and formalization of clinical knowledge, as well as flexibility in the creation of new concepts. Objective: This study aims to design and implement a methodology based on the dual-model paradigm to communicate clinical information between a patient mobile app (Xemio Research) and an institutional ontology-based clinical repository (OntoCR) without loss of meaning. 
Methods: This paper is framed within Artificial intelligence Supporting CAncer Patients across Europe (ASCAPE), a project that aims to use artificial intelligence (AI)/machine learning (ML) mechanisms to support cancer patients’ health status and quality of life (QoL). First, the variables “side effect” and “daily steps” were defined and represented with EN/ISO 13606 archetypes. Next, ontologies that model archetyped concepts and map them to the standard were created and uploaded to OntoCR, where they were ready to receive instantiated patient data. Xemio Research used a conversion module in the ASCAPE Local Edge to transform data entered into the app to create EN/ISO 13606 extracts, which were sent to an Application Programming Interface (API) in OntoCR that maps each element in the normalized XML files to its corresponding location in the ontology. This way, instantiated data of patients are stored in the clinical repository. Results: Between December 22, 2020, and April 4, 2022, 1100 extracts of 47 patients were successfully communicated (234/1100, 21.3%, extracts of side effects and 866/1100, 78.7%, extracts of daily activity). Furthermore, the creation of EN/ISO 13606–standardized archetypes allows the reuse of clinical information regarding daily activity and side effects, while with the creation of ontologies, we extended the knowledge representation of our clinical repository. Conclusions: Health information interoperability is one of the requirements for continuity of health care. The dual model allows the separation of knowledge and information in HISs. EN/ISO 13606 was chosen for this project because of the operational mechanisms it offers for data exchange, as well as its flexibility for modeling knowledge and creating new concepts. 
To the best of our knowledge, this is the first experience reported in the literature of effective communication of EN/ISO 13606 EHR extracts between a patient mobile app and an institutional clinical repository using a scalable standard-agnostic methodology that can be applied to other projects, data sources, and institutions. %M 36222792 %R 10.2196/40344 %U https://medinform.jmir.org/2022/10/e40344 %U https://doi.org/10.2196/40344 %U http://www.ncbi.nlm.nih.gov/pubmed/36222792 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 9 %N 4 %P e39102 %T Answering Hospital Caregivers’ Questions at Any Time: Proof-of-Concept Study of an Artificial Intelligence–Based Chatbot in a French Hospital %A Daniel,Thomas %A de Chevigny,Alix %A Champrigaud,Adeline %A Valette,Julie %A Sitbon,Marine %A Jardin,Meryam %A Chevalier,Delphine %A Renet,Sophie %+ Department of Pharmacy, Paris Saint-Joseph Hospital Group, 185 Raymond Losserand Street, Paris, 75014, France, 33 144127191, srenet@ghpsj.fr %K chatbot %K artificial intelligence %K pharmacy %K hospital %K health care %K drugs %K medication %K information quality %K health information %K caregiver %K healthcare staff %K digital health tool %K COVID-19 %K information technology %D 2022 %7 11.10.2022 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Access to accurate information in health care is a key point for caregivers to avoid medication errors, especially with the reorganization of staff and drug circuits during health crises such as the COVID‑19 pandemic. It is, therefore, the role of the hospital pharmacy to answer caregivers’ questions. Some may require the expertise of a pharmacist, some should be answered by pharmacy technicians, but others are simple and redundant, and automated responses may be provided. Objective: We aimed at developing and implementing a chatbot to answer questions from hospital caregivers about drugs and pharmacy organization 24 hours a day and to evaluate this tool. 
Methods: The ADDIE (Analysis, Design, Development, Implementation, and Evaluation) model was used by a multiprofessional team composed of 3 hospital pharmacists, 2 members of the Innovation and Transformation Department, and the IT service provider. Based on an analysis of the caregivers’ needs about drugs and pharmacy organization, we designed and developed a chatbot. The tool was then evaluated before its implementation into the hospital intranet. Its relevance and conversations with testers were monitored via the IT provider’s back office. Results: A needs analysis with 5 hospital pharmacists and 33 caregivers from 5 health services allowed us to identify 7 themes about drugs and pharmacy organization (such as opening hours and specific prescriptions). After a year of chatbot design and development, the test version obtained good evaluation scores: its speed was rated 8.2 out of 10, usability 8.1 out of 10, and appearance 7.5 out of 10. Testers were generally satisfied (70%) and were hoping for the content to be enhanced. Conclusions: The chatbot seems to be a relevant tool for hospital caregivers, helping them obtain reliable and verified information they need on drugs and pharmacy organization. In the context of significant mobility of nursing staff during the health crisis due to the COVID-19 pandemic, the chatbot could be a suitable tool for transmitting relevant information related to drug circuits or specific procedures. To our knowledge, this is the first time that such a tool has been designed for caregivers. Its development continued through tests with other users, such as pharmacy technicians, and the integration of additional data before implementation at the 2 hospital sites. 
%M 35930555 %R 10.2196/39102 %U https://humanfactors.jmir.org/2022/4/e39102 %U https://doi.org/10.2196/39102 %U http://www.ncbi.nlm.nih.gov/pubmed/35930555 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 10 %P e39905 %T Methodological Frameworks and Dimensions to Be Taken Into Consideration in Digital Health Technology Assessment: Protocol for a Scoping Review %A Segur-Ferrer,Joan %A Moltó-Puigmartí,Carolina %A Pastells-Peiró,Roland %A Vivanco-Hidalgo,Rosa Maria %+ Health Technology Assessment and Quality of Care Area, Agency for Health Quality and Assessment of Catalonia, Carrer de Roc Boronat, 81-95 (segona planta), Barcelona, 08005, Spain, 34 935 513 900, joan.segur@gencat.cat %K digital health %K eHealth %K mobile health %K artificial intelligence %K framework %K health technology assessment %D 2022 %7 11.10.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Health technology assessment (HTA) is one of the main tools that health systems have to appraise evidence and determine the value of a given health technology. Although the existing HTA frameworks are useful tools for the evaluation of a wide range of health technologies, more and more experts, organizations across the world, and HTA agencies are highlighting the need to update or develop specific methodological frameworks for the evaluation of digital health technologies in order to take into account additional domains that cover these technologies’ intrinsic characteristics. Objective: The purpose of our scoping review is to identify the methodological frameworks that are used worldwide for the assessment of digital health technologies; determine what dimensions and aspects are being considered; and generate, through a thematic analysis, a proposal for a methodological framework that is based on the most frequently described dimensions in the literature. 
Methods: The scoping review will be performed in accordance with the guidelines established in the updated statement of the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). We will search for peer-reviewed and grey literature published between 2011 and the date of the search execution. The retrieved references will be reviewed in a single-blind manner by 2 independent authors, and their quality will be assessed by using the Critical Appraisal Skills Program tool. The ATLAS.ti software (Scientific Software Development GmbH) will be used for data extraction and to perform the thematic analysis. Results: The scoping review is currently (May 2022) in progress. It is expected to be completed in October 2022, and the final results of the research will be presented and published by November 2022. Conclusions: To our knowledge, no studies have been published to date that identify the existing methodological frameworks for digital HTA, determine which dimensions must be evaluated for correct decision-making, and serve as a basis for the development of a methodological framework of reference that health care systems can use to carry out this kind of assessment. This work is intended to address this knowledge gap of key relevance for the field of HTA. 
International Registered Report Identifier (IRRID): DERR1-10.2196/39905 %M 36222788 %R 10.2196/39905 %U https://www.researchprotocols.org/2022/10/e39905 %U https://doi.org/10.2196/39905 %U http://www.ncbi.nlm.nih.gov/pubmed/36222788 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 5 %N 4 %P e38464 %T Evolving Hybrid Partial Genetic Algorithm Classification Model for Cost-effective Frailty Screening: Investigative Study %A Oates,John %A Shafiabady,Niusha %A Ambagtsheer,Rachel %A Beilby,Justin %A Seiboth,Chris %A Dent,Elsa %+ College of Engineering, Information Technology and Environment, Charles Darwin University, 815 George Street, Haymarket, NSW, 2000, Australia, 61 80474147, niusha.shafiabady@cdu.edu.au %K machine learning %K frailty screening %K partial genetic algorithms %K SVM %K KNN %K decision trees %K frailty %K algorithm %K cost %K model %K index %K database %K ai %K ageing %K adults %K older people %K screening %K tool %D 2022 %7 7.10.2022 %9 Original Paper %J JMIR Aging %G English %X Background: A commonly used method for measuring frailty is the accumulation of deficits expressed as a frailty index (FI). FIs can be readily adapted to many databases, as the parameters to use are not prescribed but rather reflect a subset of extracted features (variables). Unfortunately, the structure of many databases does not permit the direct extraction of a suitable subset, requiring additional effort to determine and verify the value of features for each record and thus significantly increasing cost. Objective: Our objective is to describe how an artificial intelligence (AI) optimization technique called partial genetic algorithms can be used to refine the subset of features used to calculate an FI and favor features that have the least cost of acquisition. Methods: This is a secondary analysis of a residential care database compiled from 10 facilities in Queensland, Australia. 
The database comprises routinely collected administrative data and unstructured patient notes for 592 residents aged 75 years and over. The primary study derived an electronic frailty index (eFI) calculated from 36 suitable features. We then structurally modified a genetic algorithm to find an optimal predictor of the calculated eFI (0.21 threshold) from 2 sets of features. Partial genetic algorithms were used to optimize 4 underlying classification models: logistic regression, decision trees, random forest, and support vector machines. Results: Among the underlying models, logistic regression was found to produce the best models in almost all scenarios and feature set sizes. The best models were built using all the low-cost features and as few as 10 high-cost features, and they performed well enough (sensitivity 89%, specificity 87%) to be considered candidates for a low-cost frailty screening test. Conclusions: In this study, a systematic approach for selecting an optimal set of features with a low cost of acquisition and performance comparable to the eFI for detecting frailty was demonstrated on an aged care database. Partial genetic algorithms have proven useful in offering a trade-off between cost and accuracy to systematically identify frailty. 
%M 36206042 %R 10.2196/38464 %U https://aging.jmir.org/2022/4/e38464 %U https://doi.org/10.2196/38464 %U http://www.ncbi.nlm.nih.gov/pubmed/36206042 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 10 %P e42055 %T Formative Evaluation of the Acceptance of HIV Prevention Artificial Intelligence Chatbots By Men Who Have Sex With Men in Malaysia: Focus Group Study %A Peng,Mary L %A Wickersham,Jeffrey A %A Altice,Frederick L %A Shrestha,Roman %A Azwa,Iskandar %A Zhou,Xin %A Halim,Mohd Akbar Ab %A Ikhtiaruddin,Wan Mohd %A Tee,Vincent %A Kamarulzaman,Adeeba %A Ni,Zhao %+ School of Nursing, Yale University, 400 West Campus Drive, New Haven, CT, 06477, United States, 1 203 737 3039, zhao.ni@yale.edu %K artificial intelligence %K chatbot %K HIV prevention %K implementation science %K men who have sex with men %K MSM %K mobile health design %K mHealth design %K unified theory of acceptance and use of technology %K mobile phone %D 2022 %7 6.10.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Mobile technologies are being increasingly developed to support the practice of medicine, nursing, and public health, including HIV testing and prevention. Chatbots using artificial intelligence (AI) are novel mobile health strategies that can promote HIV testing and prevention among men who have sex with men (MSM) in Malaysia, a hard-to-reach population at elevated risk of HIV, yet little is known about the features that are important to this key population. Objective: The aim of this study was to identify the barriers to and facilitators of Malaysian MSM’s acceptance of an AI chatbot designed to assist in HIV testing and prevention in relation to its perceived benefits, limitations, and preferred features among potential users. Methods: We conducted 5 structured web-based focus group interviews with 31 MSM in Malaysia between July 2021 and September 2021. 
The interviews were first recorded, transcribed, coded, and thematically analyzed using NVivo (version 9; QSR International). Subsequently, the unified theory of acceptance and use of technology was used to guide data analysis to map emerging themes related to the barriers to and facilitators of chatbot acceptance onto its 4 domains: performance expectancy, effort expectancy, facilitating conditions, and social influence. Results: Multiple barriers and facilitators influencing MSM’s acceptance of an AI chatbot were identified for each domain. Performance expectancy (ie, the perceived usefulness of the AI chatbot) was influenced by MSM’s concerns about the AI chatbot’s ability to deliver accurate information, its effectiveness in information dissemination and problem-solving, and its ability to provide emotional support and raise health awareness. Convenience, cost, and technical errors influenced the AI chatbot’s effort expectancy (ie, the perceived ease of use). Efficient linkage to health care professionals and HIV self-testing was reported as a facilitating condition of MSM’s receptiveness to using an AI chatbot to access HIV testing. Participants stated that social influence (ie, sociopolitical climate) factors influencing the acceptance of mobile technology that addressed HIV in Malaysia included privacy concerns, pervasive stigma against homosexuality, and the criminalization of same-sex sexual behaviors. Key design strategies that could enhance MSM’s acceptance of an HIV prevention AI chatbot included an anonymous user setting; embedding the chatbot in MSM-friendly web-based platforms; and providing user-guiding questions and options related to HIV testing, prevention, and treatment. 
Conclusions: This study provides important insights into key features and potential implementation strategies central to designing an AI chatbot as a culturally sensitive digital health tool to prevent stigmatized health conditions in vulnerable and systematically marginalized populations. Such features not only are crucial to designing effective user-centered and culturally situated mobile health interventions for MSM in Malaysia but also illuminate the importance of incorporating social stigma considerations into health technology implementation strategies. %M 36201390 %R 10.2196/42055 %U https://formative.jmir.org/2022/10/e42055 %U https://doi.org/10.2196/42055 %U http://www.ncbi.nlm.nih.gov/pubmed/36201390 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 10 %P e40238 %T Artificial Intelligence Applications in Health Care Practice: Scoping Review %A Sharma,Malvika %A Savage,Carl %A Nair,Monika %A Larsson,Ingrid %A Svedberg,Petra %A Nygren,Jens M %+ School of Health and Welfare, Halmstad University, Box 823, Halmstad, 30118, Sweden, 46 035167100, jens.nygren@hh.se %K artificial intelligence %K health care %K implementation %K scoping review %K technology adoption %D 2022 %7 5.10.2022 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) is often heralded as a potential disruptor that will transform the practice of medicine. The amount of data collected and available in health care, coupled with advances in computational power, has contributed to advances in AI and an exponential growth of publications. However, the development of AI applications does not guarantee their adoption into routine practice. There is a risk that despite the resources invested, benefits for patients, staff, and society will not be realized if AI implementation is not better understood. 
Objective: The aim of this study was to explore how the implementation of AI in health care practice has been described and researched in the literature by answering 3 questions: What are the characteristics of research on implementation of AI in practice? What types and applications of AI systems are described? What characteristics of the implementation process for AI systems are discernible? Methods: A scoping review was conducted of MEDLINE (PubMed), Scopus, Web of Science, CINAHL, and PsycINFO databases to identify empirical studies of AI implementation in health care since 2011, in addition to snowball sampling of selected reference lists. Using Rayyan software, we screened titles and abstracts and selected full-text articles. Data from the included articles were charted and summarized. Results: Of the 9218 records retrieved, 45 (0.49%) articles were included. The articles cover diverse clinical settings and disciplines; most (32/45, 71%) were published recently, were from high-income countries (33/45, 73%), and were intended for care providers (25/45, 56%). AI systems are predominantly intended for clinical care, particularly clinical care pertaining to patient-provider encounters. More than half (24/45, 53%) possess no action autonomy but rather support human decision-making. The focus of most research was on establishing the effectiveness of interventions (16/45, 35%) or related to technical and computational aspects of AI systems (11/45, 24%). Focus on the specifics of implementation processes does not yet seem to be a priority in research, and the use of frameworks to guide implementation is rare. Conclusions: Our current empirical knowledge derives from implementations of AI systems with low action autonomy and approaches common to implementations of other types of information systems. 
To develop a specific and empirically based implementation framework, further research is needed on the more disruptive types of AI systems being implemented in routine care and on aspects unique to AI implementation in health care, such as building trust, addressing transparency issues, developing explainable and interpretable solutions, and addressing ethical concerns around privacy and data protection. %M 36197712 %R 10.2196/40238 %U https://www.jmir.org/2022/10/e40238 %U https://doi.org/10.2196/40238 %U http://www.ncbi.nlm.nih.gov/pubmed/36197712 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 3 %N 1 %P e36660 %T Multiple-Inputs Convolutional Neural Network for COVID-19 Classification and Critical Region Screening From Chest X-ray Radiographs: Model Development and Performance Evaluation %A Li,Zhongqiang %A Li,Zheng %A Yao,Luke %A Chen,Qing %A Zhang,Jian %A Li,Xin %A Feng,Ji-Ming %A Li,Yanping %A Xu,Jian %+ Division of Electrical and Computer Engineering, College of Engineering, Louisiana State University, Patrick F Taylor Hall, 3304 S Quad Dr, Baton Rouge, LA, 70803, United States, 1 (225) 578 4483, jianxu1@lsu.edu %K COVID-19 %K chest X-ray radiography %K multiple-inputs convolutional neural network %K screening critical COVID regions %D 2022 %7 4.10.2022 %9 Original Paper %J JMIR Bioinform Biotech %G English %X Background: The COVID-19 pandemic is becoming one of the largest, unprecedented health crises, and chest X-ray radiography (CXR) plays a vital role in diagnosing COVID-19. However, extracting and finding useful image features from CXRs demand a heavy workload for radiologists. Objective: The aim of this study was to design a novel multiple-inputs (MI) convolutional neural network (CNN) for the classification of COVID-19 and extraction of critical regions from CXRs. We also investigated the effect of the number of inputs on the performance of our new MI-CNN model. 
Methods: A total of 6205 CXR images (including 3021 COVID-19 CXRs and 3184 normal CXRs) were used to test our MI-CNN models. CXRs could be evenly segmented into different numbers (2, 4, and 16) of individual regions. Each region could individually serve as one of the MI-CNN inputs. The CNN features of these MI-CNN inputs would then be fused for COVID-19 classification. More importantly, the contributions of each CXR region could be evaluated through assessing the number of images that were accurately classified by their corresponding regions in the testing data sets. Results: In both the whole-image and left- and right-lung region of interest (LR-ROI) data sets, MI-CNNs demonstrated good efficiency for COVID-19 classification. In particular, MI-CNNs with more inputs (2-, 4-, and 16-input MI-CNNs) had better efficiency in recognizing COVID-19 CXRs than the 1-input CNN. Compared to the whole-image data sets, the efficiency of LR-ROI data sets showed approximately 4% lower accuracy, sensitivity, specificity, and precision (over 91%). In considering the contributions of each region, one of the possible reasons for this reduced performance was that nonlung regions (eg, region 16) provided false-positive contributions to COVID-19 classification. The MI-CNN with the LR-ROI data set could provide a more accurate evaluation of the contribution of each region and COVID-19 classification. Additionally, the right-lung regions had higher contributions to the classification of COVID-19 CXRs, whereas the left-lung regions had higher contributions to identifying normal CXRs. Conclusions: Overall, MI-CNNs could achieve higher accuracy with an increasing number of inputs (eg, 16-input MI-CNN). This approach could assist radiologists in identifying COVID-19 CXRs and in screening the critical regions related to COVID-19 classifications. 
%M 36277075 %R 10.2196/36660 %U https://bioinform.jmir.org/2022/1/e36660 %U https://doi.org/10.2196/36660 %U http://www.ncbi.nlm.nih.gov/pubmed/36277075 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 9 %N 4 %P e38876 %T Assessing the Topics and Motivating Factors Behind Human-Social Chatbot Interactions: Thematic Analysis of User Experiences %A Ta-Johnson,Vivian P %A Boatfield,Carolynn %A Wang,Xinyu %A DeCero,Esther %A Krupica,Isabel C %A Rasof,Sophie D %A Motzer,Amelie %A Pedryc,Wiktoria M %+ Department of Psychology, Lake Forest College, 555 N Sheridan Road, Lake Forest, IL, 60045, United States, 1 847 735 5258, ta@lakeforest.edu %K social chatbots %K Replika %K emotional chatbots %K artificial intelligence %K thematic analysis %K human-chatbot interactions %K chatbot %K usability %K interaction %K human factors %K motivation %K topics %K AI %K perception %K usage %D 2022 %7 3.10.2022 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Although social chatbot usage is expected to increase as language models and artificial intelligence improve, very little is known about the dynamics of human-social chatbot interactions. Specifically, there is a paucity of research examining why human-social chatbot interactions are initiated and the topics that are discussed. Objective: We sought to identify the motivating factors behind initiating contact with Replika, a popular social chatbot, and the topics discussed in these interactions. Methods: A sample of Replika users completed a survey that included open-ended questions pertaining to the reasons why they initiated contact with Replika and the topics they typically discuss. Thematic analyses were then used to extract themes and subthemes regarding the motivational factors behind Replika use and the types of discussions that take place in conversations with Replika. 
Results: Users initiated contact with Replika out of interest, in search of social support, and to cope with mental and physical health conditions. Users engaged in a wide variety of discussion topics with their Replika, including intellectual topics, life and work, recreation, mental health, connection, Replika, current events, and other people. Conclusions: Given the wide range of motivational factors and discussion topics that were reported, our results imply that multifaceted support can be provided by a single social chatbot. While previous research already established that social chatbots can effectively help address mental and physical health issues, these capabilities have been dispersed across several different social chatbots instead of deriving from a single one. Our results also highlight a motivating factor of human-social chatbot usage that has received less attention than other motivating factors: interest. Users most frequently reported using Replika out of interest and sought to explore its capabilities and learn more about artificial intelligence. Thus, while developers and researchers study human-social chatbot interactions with the efficacy of the social chatbot and its targeted user base in mind, it is equally important to consider how its usage can shape public perceptions and support for social chatbots and artificial agents in general. 
%M 36190745 %R 10.2196/38876 %U https://humanfactors.jmir.org/2022/4/e38876 %U https://doi.org/10.2196/38876 %U http://www.ncbi.nlm.nih.gov/pubmed/36190745 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 9 %P e39452 %T Enabling Early Obstructive Sleep Apnea Diagnosis With Machine Learning: Systematic Review %A Ferreira-Santos,Daniela %A Amorim,Pedro %A Silva Martins,Tiago %A Monteiro-Soares,Matilde %A Pereira Rodrigues,Pedro %+ Department of Community Medicine, Information and Decision Sciences, Faculty of Medicine, University of Porto, Rua Dr Plácido da Costa, s/n, Porto, 4200-450, Portugal, 351 937710766, danielasantos@med.up.pt %K machine learning %K obstructive sleep apnea %K systematic review %K polysomnography %D 2022 %7 30.9.2022 %9 Review %J J Med Internet Res %G English %X Background: American Academy of Sleep Medicine guidelines suggest that clinical prediction algorithms can be used to screen patients with obstructive sleep apnea (OSA) without replacing polysomnography, the gold standard. Objective: We aimed to identify, gather, and analyze existing machine learning approaches that are being used for disease screening in adult patients with suspected OSA. Methods: We searched the MEDLINE, Scopus, and ISI Web of Knowledge databases to evaluate the validity of different machine learning techniques, with polysomnography as the gold standard outcome measure and used the Prediction Model Risk of Bias Assessment Tool (Kleijnen Systematic Reviews Ltd) to assess risk of bias and applicability of each included study. Results: Our search retrieved 5479 articles, of which 63 (1.15%) articles were included. We found 23 studies performing diagnostic model development alone, 26 with added internal validation, and 14 applying the clinical prediction algorithm to an independent sample (although not all reporting the most common discrimination metrics, sensitivity or specificity). 
Logistic regression was applied in 35 studies, linear regression in 16, support vector machine in 9, neural networks in 8, decision trees in 6, and Bayesian networks in 4. Random forest, discriminant analysis, classification and regression tree, and nomogram were each performed in 2 studies, whereas Pearson correlation, adaptive neuro-fuzzy inference system, artificial immune recognition system, genetic algorithm, supersparse linear integer models, and k-nearest neighbors algorithm were each performed in 1 study. The best area under the receiver operating characteristic curve was 0.98 (0.96-0.99) for age, waist circumference, Epworth Sleepiness Scale score, and oxygen saturation as predictors in a logistic regression. Conclusions: Although high values were obtained, they still lacked external validation results in large cohorts and a standard OSA criteria definition. Trial Registration: PROSPERO CRD42021221339; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=221339 %M 36178720 %R 10.2196/39452 %U https://www.jmir.org/2022/9/e39452 %U https://doi.org/10.2196/39452 %U http://www.ncbi.nlm.nih.gov/pubmed/36178720 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 9 %N 3 %P e39234 %T Electronic Diagnostic Support in Emergency Physician Triage: Qualitative Study With Thematic Analysis of Interviews %A Sibbald,Matthew %A Abdulla,Bashayer %A Keuhl,Amy %A Norman,Geoffrey %A Monteiro,Sandra %A Sherbino,Jonathan %+ McMaster Education Research, Innovation & Theory (MERIT) Program, Department of Medicine, McMaster University, 100 Main Street West, Hamilton, ON, L8P 1H6, Canada, 1 905 921 2101 ext 44477, matthew.sibbald@medportal.ca %K electronic differential diagnostic support %K clinical reasoning %K natural language processing %K triage %K diagnostic error %K human factors %K diagnosis %K diagnostic %K emergency %K artificial intelligence %K adoption %K attitude %K support system %K automation %D 2022 %7 30.9.2022 %9 Original Paper %J JMIR Hum Factors %G English %X 
Background: Not thinking of a diagnosis is a leading cause of diagnostic error in the emergency department, resulting in delayed treatment, morbidity, and excess mortality. Electronic differential diagnostic support (EDS) results in small but significant reductions in diagnostic error. However, the uptake of EDS by clinicians is limited. Objective: We sought to understand physician perceptions and barriers to the uptake of EDS within the emergency department triage process. Methods: We conducted a qualitative study using a research associate to rapidly prototype an embedded EDS into the emergency department triage process. Physicians involved in the triage assessment of a busy emergency department were provided the output of an EDS based on the triage complaint by an embedded researcher to simulate an automated system that would draw from the electronic medical record. Physicians were interviewed immediately after their experience. Verbatim transcripts were analyzed by a team using open and axial coding, informed by direct content analysis. Results: In all, 4 themes emerged from 14 interviews: (1) the quality of the EDS was inferred from the scope and prioritization of the diagnoses present in the EDS differential; (2) the trust of the EDS was linked to varied beliefs around the diagnostic process and potential for bias; (3) clinicians foresaw more benefit to EDS use for colleagues and trainees rather than themselves; and (4) clinicians felt strongly that EDS output should not be included in the patient record. Conclusions: The adoption of an EDS into an emergency department triage process will require a system that provides diagnostic suggestions appropriate for the scope and context of the emergency department triage process, transparency of system design, and affordances for clinician beliefs about the diagnostic process and addresses clinician concern around including EDS output in the patient record. 
%M 36178728 %R 10.2196/39234 %U https://humanfactors.jmir.org/2022/3/e39234 %U https://doi.org/10.2196/39234 %U http://www.ncbi.nlm.nih.gov/pubmed/36178728 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 9 %P e30113 %T Thinking Aloud or Screaming Inside: Exploratory Study of Sentiment Around Work %A Hoque Tania,Marzia %A Hossain,Md Razon %A Jahanara,Nuzhat %A Andreev,Ilya %A Clifton,David A %+ Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, CHI Lab, Old Road Research Campus, Oxford, OX3 7DQ, United Kingdom, 44 1865617670, marzia.hoquetania@eng.ox.ac.uk %K work-related mental health %K sentiment analysis %K natural language processing %K occupational health %K Bayesian inference %K machine learning %K artificial intelligence %K mobile phone %D 2022 %7 30.9.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Millions of workers experience work-related ill health every year. The loss of working days often accounts for poor well-being because of discomfort and stress caused by the workplace. The ongoing pandemic and postpandemic shift in socioeconomic and work culture can continue to contribute to adverse work-related sentiments. Critically investigating state-of-the-art technologies, this study identifies the research gaps in recognizing workers’ need for well-being support, and we aspire to understand how such evidence can be collected to transform the workforce and workplace. Objective: Building on recent advances in sentiment analysis, this study aims to closely examine the potential of social media as a tool to assess workers’ emotions toward the workplace. Methods: This study collected a large Twitter data set comprising both pandemic and prepandemic tweets facilitated through a human-in-the-loop approach in combination with unsupervised learning and meta-heuristic optimization algorithms. 
The raw data preprocessed through natural language processing techniques were assessed using a generative statistical model and a lexicon-assisted rule-based model, mapping lexical features to emotion intensities. This study also assigned human annotations and performed work-related sentiment analysis. Results: A mixed methods approach, including topic modeling using latent Dirichlet allocation, identified the top topics from the corpus to understand how Twitter users engage with discussions on work-related sentiments. The sorted aspects were portrayed through overlapped clusters and low intertopic distances. However, further analysis comprising the Valence Aware Dictionary for Sentiment Reasoner suggested a smaller number of negative polarities among diverse subjects. By contrast, the human-annotated data set created for this study contained more negative sentiments. In this study, sentimental juxtaposition revealed through the labeled data set was supported by the n-gram analysis as well. Conclusions: The developed data set demonstrates that work-related sentiments are projected onto social media, which offers an opportunity to better support workers. The infrastructure of the workplace, the nature of the work, the culture within the industry and the particular organization, employers, colleagues, person-specific habits, and upbringing all play a part in the health and well-being of any working adult who contributes to the productivity of the organization. Therefore, understanding the origin and influence of the complex underlying factors both qualitatively and quantitatively can inform the next generation of workplaces to drive positive change by relying on empirically grounded evidence. Therefore, this study outlines a comprehensive approach to capture deeper insights into work-related health. 
%M 36178712 %R 10.2196/30113 %U https://formative.jmir.org/2022/9/e30113 %U https://doi.org/10.2196/30113 %U http://www.ncbi.nlm.nih.gov/pubmed/36178712 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 9 %P e41241 %T Comparison Between QT and Corrected QT Interval Assessment by an Apple Watch With the AccurBeat Platform and by a 12‑Lead Electrocardiogram With Manual Annotation: Prospective Observational Study %A Chokshi,Sara %A Tologonova,Gulzhan %A Calixte,Rose %A Yadav,Vandana %A Razvi,Naveed %A Lazar,Jason %A Kachnowski,Stan %+ Healthcare Innovation and Technology Lab, 3960 Broadway, New York, NY, 10032, United States, 1 212 543 0100, schokshi@hitlab.org %K artificial intelligence ECG %K AI ECG %K AI wearables %K big data %K cardiovascular medicine %K digital health %K machine learning %D 2022 %7 28.9.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Abnormal prolongation or shortening of the QT interval is associated with increased risk for ventricular arrhythmias and sudden cardiac death. For continuous monitoring, widespread use, and prevention of cardiac events, advanced wearable technologies are emerging as promising surrogates for conventional 12‑lead electrocardiogram (ECG) QT interval assessment. Previous studies have shown a good agreement between QT and corrected QT (QTc) intervals measured on a smartwatch ECG and a 12-lead ECG, but the clinical accuracy of computerized algorithms for QT and QTc interval measurement from smartwatch ECGs is unclear. Objective: The prospective observational study compared the smartwatch-recorded QT and QTc assessed using AccurKardia’s AccurBeat platform with the conventional 12‑lead ECG annotated manually by a cardiologist. Methods: ECGs were collected from healthy participants (without any known cardiovascular disease) aged >22 years. Two consecutive 30-second ECG readings followed by (within 15 minutes) a 10-second standard 12-lead ECG were recorded for each participant. 
Characteristics of the participants were compared by sex using a 2-sample t test and Wilcoxon rank sum test. Statistical comparisons of heart rate (HR), QT interval, and QTc interval between the platform and the 12-lead ECG, ECG lead I, and ECG lead II were done using the Wilcoxon signed rank test. Linear regression was used to predict QTc and QT intervals from the ECG based on the platform’s QTc/QT intervals with adjustment for age, sex, and difference in HR measurement. The Bland-Altman method was used to check agreement between various QT and QTc interval measurements. Results: A total of 50 participants (32 female, mean age 46 years, SD 1 year) were included in the study. The result of the regression model using the platform measurements to predict the 12-lead ECG measurements indicated that, in univariate analysis, QT/QTc intervals from the platform significantly predicted QT/QTc intervals from the 12-lead ECG, ECG lead I, and ECG lead II, and this remained significant after adjustment for sex, age, and change in HR. The Bland-Altman plot results found that 96% of the average QTc interval measurements between the platform and QTc intervals from the 12-lead ECG were within the 95% confidence limit of the average difference between the two measurements, with a mean difference of –10.5 (95% limits of agreement –71.43, 50.43). A total of 94% of the average QT interval measurements between the platform and the 12-lead ECG were within the 95% CI of the average difference between the two measurements, with a mean difference of –6.3 (95% limits of agreement –54.54, 41.94). Conclusions: QT and QTc intervals obtained by a smartwatch coupled with the platform’s assessment were comparable to those from a 12-lead ECG. Accordingly, with further refinements, remote monitoring using this technology holds promise for the identification of QT interval prolongation. 
%M 36169999 %R 10.2196/41241 %U https://formative.jmir.org/2022/9/e41241 %U https://doi.org/10.2196/41241 %U http://www.ncbi.nlm.nih.gov/pubmed/36169999 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 9 %P e36118 %T Detection of Depression Severity Using Bengali Social Media Posts on Mental Health: Study Using Natural Language Processing Techniques %A Kabir,Muhammad Khubayeeb %A Islam,Maisha %A Kabir,Anika Nahian Binte %A Haque,Adiba %A Rhaman,Md Khalilur %+ Department of Computer Science, Brac University, 66 Mohakhali, Dhaka, 1212, Bangladesh, 880 1708812609, muhammad.khubayeeb.kabir@g.bracu.ac.bd %K mental health forums %K natural language processing %K severity %K major depressive disorder %K deep learning %K machine learning %K multiclass text classification %D 2022 %7 28.9.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: There are a myriad of language cues that indicate depression in written texts, and natural language processing (NLP) researchers have proven the ability of machine learning and deep learning approaches to detect these cues. However, to date, these approaches bridging NLP and the domain of mental health for Bengali literature are not comprehensive. The Bengali-speaking population can express emotions in their native language in greater detail. Objective: Our goal is to detect the severity of depression using Bengali texts by generating a novel Bengali corpus of depressive posts. We collaborated with mental health experts to generate a clinically sound labeling scheme and an annotated corpus to train machine learning and deep learning models. Methods: We conducted a study using Bengali text-based data from blogs and open source platforms. We constructed a procedure for annotated corpus generation and extraction of textual information from Bengali literature for predictive analysis. 
We developed our own structured data set and designed a clinically sound labeling scheme with the help of mental health professionals, adhering to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) during the process. We used 5 machine learning models for detecting the severity of depression: kernel support vector machine (SVM), random forest, logistic regression, K-nearest neighbor (KNN), and complement naive Bayes (NB). For the deep learning approach, we used long short-term memory (LSTM) units and gated recurrent units (GRUs) coupled with convolutional blocks or self-attention layers. Finally, we aimed for enhanced outcomes by using state-of-the-art pretrained language models. Results: The independent recurrent neural network (RNN) models yielded the highest accuracies and weighted F1 scores. GRUs, in particular, produced 81% accuracy. The hybrid architectures could not surpass the RNNs in terms of performance. Kernel SVM with term frequency–inverse document frequency (TF-IDF) embeddings generated 78% accuracy on test data. We used validation and training loss curves to observe and report the performance of our architectures. Overall, the limited amount of available data remained the main limitation of our experiment. Conclusions: The findings from our experimental setup indicate that machine learning and deep learning models are fairly capable of assessing the severity of mental health issues from texts. For the future, we suggest more research endeavors to increase the volume of Bengali text data, in particular, so that modern architectures reach improved generalization capability. 
%M 36169989 %R 10.2196/36118 %U https://formative.jmir.org/2022/9/e36118 %U https://doi.org/10.2196/36118 %U http://www.ncbi.nlm.nih.gov/pubmed/36169989 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 9 %P e40317 %T The Impact of a Digital Artificial Intelligence System on the Monitoring and Self-management of Nonmotor Symptoms in People With Parkinson Disease: Proposal for a Phase 1 Implementation Study %A Meinert,Edward %A Milne-Ives,Madison %A Chaudhuri,K Ray %A Harding,Tracey %A Whipps,John %A Whipps,Susan %A Carroll,Camille %+ Centre for Health Technology, University of Plymouth, 6 Kirkby Place, Room 2, Plymouth, PL4 6DN, United Kingdom, 44 175 260 0600, edward.meinert@plymouth.ac.uk %K Parkinson disease %K self-management %K telemedicine %K artificial intelligence %D 2022 %7 26.9.2022 %9 Proposal %J JMIR Res Protoc %G English %X Background: Nonmotor symptoms of Parkinson disease are a major factor of disease burden but are often underreported in clinical appointments. A digital tool has been developed to support the monitoring and management of nonmotor symptoms. Objective: The aim of this study is to establish evidence of the impact of the system on patient confidence, knowledge, and skills for self-management of nonmotor symptoms, symptom burden, and quality of life of people with Parkinson and their care partners. It will also evaluate the usability, acceptability, and potential for adoption of the system for people with Parkinson, care partners, and health care professionals. Methods: A mixed methods implementation and feasibility study based on the nonadoption, abandonment, scale-up, spread, and sustainability framework will be conducted with 60 person with Parkinson–care partner dyads and their associated health care professionals. Participants will be recruited from outpatient clinics at the University Hospitals Plymouth NHS Trust Parkinson service. 
The primary outcome, patient activation, will be measured over the 12-month intervention period; secondary outcomes include the system’s impact on health and well-being outcomes, safety, usability, acceptability, engagement, and costs. Semistructured interviews with a subset of participants will gather a more in-depth understanding of user perspectives and experiences with the system. Repeated measures analysis of variance will analyze change over time and thematic analysis will be conducted on qualitative data. The study was peer reviewed by the Parkinson’s UK Non-Drug Approaches grant board and is pending ethical approval. Results: The study won funding in August 2021; data collection is expected to begin in December 2022. Conclusions: The study’s success criteria will be affirming evidence regarding the system’s feasibility, usability and acceptability, no serious safety risks identified, and an observed positive impact on patient activation. Results will be disseminated in academic peer-reviewed journals and in platforms and formats that are accessible to the general public, guided by patient and public collaborators. 
Trial Registration: ClinicalTrials.gov NCT05414071; https://clinicaltrials.gov/ct2/show/NCT05414071 International Registered Report Identifier (IRRID): PRR1-10.2196/40317 %M 36155396 %R 10.2196/40317 %U https://www.researchprotocols.org/2022/9/e40317 %U https://doi.org/10.2196/40317 %U http://www.ncbi.nlm.nih.gov/pubmed/36155396 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 3 %N 1 %P e37951 %T Treatment Discontinuation Prediction in Patients With Diabetes Using a Ranking Model: Machine Learning Model Development %A Kurasawa,Hisashi %A Waki,Kayo %A Chiba,Akihiro %A Seki,Tomohisa %A Hayashi,Katsuyoshi %A Fujino,Akinori %A Haga,Tsuneyuki %A Noguchi,Takashi %A Ohe,Kazuhiko %+ Department of Healthcare Information Management, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan, 81 3 5800 9077, kwaki-tky@m.u-tokyo.ac.jp %K machine learning %K machine-learned ranking model %K treatment discontinuation %K diabetes %K prediction %K electronic health record %K EHR %K big data %K ranking %K algorithm %D 2022 %7 23.9.2022 %9 Original Paper %J JMIR Bioinform Biotech %G English %X Background: Treatment discontinuation (TD) is one of the major prognostic issues in diabetes care, and several models have been proposed to predict a missed appointment that may lead to TD in patients with diabetes by using binary classification models for the early detection of TD and for providing intervention support for patients. However, as binary classification models output the probability of a missed appointment occurring within a predetermined period, they are limited in their ability to estimate the magnitude of TD risk in patients with inconsistent intervals between appointments, making it difficult to prioritize patients for whom intervention support should be provided. 
Objective: This study aimed to develop a machine-learned prediction model that can output a TD risk score defined by the length of time until TD and prioritize patients for intervention according to their TD risk. Methods: This model included patients with diagnostic codes indicative of diabetes at the University of Tokyo Hospital between September 3, 2012, and May 17, 2014. The model was internally validated with patients from the same hospital from May 18, 2014, to January 29, 2016. The data used in this study included 7551 patients who visited the hospital after January 1, 2004, and had diagnostic codes indicative of diabetes. In particular, data that were recorded in the electronic medical records between September 3, 2012, and January 29, 2016, were used. The main outcome was the TD of a patient, which was defined as missing a scheduled clinical appointment and having no hospital visits within 3 times the average number of days between the visits of the patient and within 60 days. The TD risk score was calculated by using the parameters derived from the machine-learned ranking model. The prediction capacity was evaluated by using test data with the C-index for the performance of ranking patients, area under the receiver operating characteristic curve, and area under the precision-recall curve for discrimination, in addition to a calibration plot. Results: The means (95% confidence limits) of the C-index, area under the receiver operating characteristic curve, and area under the precision-recall curve for the TD risk score were 0.749 (0.655, 0.823), 0.758 (0.649, 0.857), and 0.713 (0.554, 0.841), respectively. The observed and predicted probabilities were correlated with the calibration plots. Conclusions: A TD risk score was developed for patients with diabetes by combining a machine-learned method with electronic medical records. 
The score calculation can be integrated into medical records to identify patients at high risk of TD, which would be useful in supporting diabetes care and preventing TD. %R 10.2196/37951 %U https://bioinform.jmir.org/2022/1/e37951 %U https://doi.org/10.2196/37951 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 9 %P e40249 %T Medical Staff and Resident Preferences for Using Deep Learning in Eye Disease Screening: Discrete Choice Experiment %A Lin,Senlin %A Li,Liping %A Zou,Haidong %A Xu,Yi %A Lu,Lina %+ Shanghai Eye Disease Prevention and Treatment Center, Shanghai Eye Hospital, No. 1440, Hongqiao Road, Shanghai, 200336, China, 86 02162539696, lulina781019@qq.com %K discrete choice experiment %K preference %K artificial intelligence %K AI %K vision health %K screening %D 2022 %7 20.9.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Deep learning–assisted eye disease diagnosis technology is increasingly applied in eye disease screening. However, no research has identified the prerequisites under which health care service providers and residents are willing to use it. Objective: The aim of this paper is to reveal the preferences of health care service providers and residents for using artificial intelligence (AI) in community-based eye disease screening, particularly their preference for accuracy. Methods: Discrete choice experiments for health care providers and residents were conducted in Shanghai, China. In total, 34 medical institutions with adequate AI-assisted screening experience participated. A total of 39 medical staff and 318 residents were asked to answer the questionnaire and make a trade-off among alternative screening strategies with different attributes, including missed diagnosis rate, overdiagnosis rate, screening result feedback efficiency, level of ophthalmologist involvement, organizational form, cost, and screening result feedback form. 
Conditional logit models with the stepwise selection method were used to estimate the preferences. Results: Medical staff preferred high accuracy: The specificity of deep learning models should be more than 90% (odds ratio [OR]=0.61 for 10% overdiagnosis; P<.001), which was much higher than the Food and Drug Administration standards. However, accuracy was not the residents’ preference. Rather, they preferred to have the doctors involved in the screening process. In addition, when compared with a fully manual diagnosis, AI technology was more favored by the medical staff (OR=2.08 for semiautomated AI model and OR=2.39 for fully automated AI model; P<.001), while the residents were in disfavor of the AI technology without doctors’ supervision (OR=0.24; P<.001). Conclusions: Deep learning model under doctors’ supervision is strongly recommended, and the specificity of the model should be more than 90%. In addition, digital transformation should help medical staff move away from heavy and repetitive work and spend more time on communicating with residents. %M 36125854 %R 10.2196/40249 %U https://www.jmir.org/2022/9/e40249 %U https://doi.org/10.2196/40249 %U http://www.ncbi.nlm.nih.gov/pubmed/36125854 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 1 %N 1 %P e42046 %T Introducing JMIR AI %A El Emam,Khaled %A Malin,Bradley %+ School of Epidemiology and Public Health, University of Ottawa, 401 Smyth Road, Ottawa, ON, K1H 8L1, Canada, 1 613 737 7600, kelemam@ehealthinformation.ca %K artificial intelligence %K AI %K machine learning %K methodology %D 2022 %7 20.9.2022 %9 Editorial %J JMIR AI %G English %X JMIR AI is a new journal with a focus on publishing applied artificial intelligence and machine learning research. This editorial provides an overview of the primary objectives, the focus areas of the journal, and the types of articles that are within scope. 
%M 38875542 %R 10.2196/42046 %U https://ai.jmir.org/2022/1/e42046 %U https://doi.org/10.2196/42046 %U http://www.ncbi.nlm.nih.gov/pubmed/38875542 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 5 %N 3 %P e39547 %T Automatically Identifying Twitter Users for Interventions to Support Dementia Family Caregivers: Annotated Data Set and Benchmark Classification Models %A Klein,Ari Z %A Magge,Arjun %A O'Connor,Karen %A Gonzalez-Hernandez,Graciela %+ Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Blockley Hall, 4th Fl., 423 Guardian Dr., Philadelphia, PA, 19104, United States, 1 310 423 3521, ariklein@pennmedicine.upenn.edu %K natural language processing %K social media %K data mining %K dementia %K Alzheimer disease %K caregivers %D 2022 %7 16.9.2022 %9 Short Paper %J JMIR Aging %G English %X Background: More than 6 million people in the United States have Alzheimer disease and related dementias, receiving help from more than 11 million family or other informal caregivers. A range of traditional interventions has been developed to support family caregivers; however, most of them have not been implemented in practice and remain largely inaccessible. While recent studies have shown that family caregivers of people with dementia use Twitter to discuss their experiences, methods have not been developed to enable the use of Twitter for interventions. Objective: The objective of this study is to develop an annotated data set and benchmark classification models for automatically identifying a cohort of Twitter users who have a family member with dementia. Methods: Between May 4 and May 20, 2021, we collected 10,733 tweets, posted by 8846 users, that mention a dementia-related keyword, a linguistic marker that potentially indicates a diagnosis, and a select familial relationship. 
Three annotators annotated 1 random tweet per user to distinguish those that indicate having a family member with dementia from those that do not. Interannotator agreement was 0.82 (Fleiss kappa). We used the annotated tweets to train and evaluate support vector machine and deep neural network classifiers. To assess the scalability of our approach, we then deployed automatic classification on unlabeled tweets that were continuously collected between May 4, 2021, and March 9, 2022. Results: A deep neural network classifier based on a BERT (bidirectional encoder representations from transformers) model pretrained on tweets achieved the highest F1-score of 0.962 (precision=0.946 and recall=0.979) for the class of tweets indicating that the user has a family member with dementia. The classifier detected 128,838 tweets that indicate having a family member with dementia, posted by 74,290 users between May 4, 2021, and March 9, 2022—that is, approximately 7500 users per month. Conclusions: Our annotated data set can be used to automatically identify Twitter users who have a family member with dementia, enabling the use of Twitter on a large scale to not only explore family caregivers’ experiences but also directly target interventions at these users. 
%M 36112408 %R 10.2196/39547 %U https://aging.jmir.org/2022/3/e39547 %U https://doi.org/10.2196/39547 %U http://www.ncbi.nlm.nih.gov/pubmed/36112408 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 3 %N 1 %P e37701 %T Diagnosis of a Single-Nucleotide Variant in Whole-Exome Sequencing Data for Patients With Inherited Diseases: Machine Learning Study Using Artificial Intelligence Variant Prioritization %A Huang,Yu-Shan %A Hsu,Ching %A Chune,Yu-Chang %A Liao,I-Cheng %A Wang,Hsin %A Lin,Yi-Lin %A Hwu,Wuh-Liang %A Lee,Ni-Chung %A Lai,Feipei %+ Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Number 1, Roosevelt Road, Section 4, Taipei City, 106319, Taiwan, 886 2 33664924, flai@ntu.edu.tw %K next-generation sequencing %K genetic variation analysis %K machine learning %K artificial intelligence %K whole-exome sequencing %D 2022 %7 15.9.2022 %9 Original Paper %J JMIR Bioinform Biotech %G English %X Background: In recent years, thanks to the rapid development of next-generation sequencing (NGS) technology, an entire human genome can be sequenced in a short period. As a result, NGS technology is now being widely introduced into clinical diagnosis practice, especially for diagnosis of hereditary disorders. Although the exome data of single-nucleotide variant (SNV) can be generated using these approaches, processing the DNA sequence data of a patient requires multiple tools and complex bioinformatics pipelines. Objective: This study aims to assist physicians to automatically interpret the genetic variation information generated by NGS in a short period. To determine the true causal variants of a patient with genetic disease, currently, physicians often need to view numerous features on every variant manually and search for literature in different databases to understand the effect of genetic variation. Methods: We constructed a machine learning model for predicting disease-causing variants in exome data. 
We collected sequencing data from whole-exome sequencing (WES) and gene panels as the training set, and then integrated variant annotations from multiple genetic databases for model training. The trained model ranked SNVs and output the most probable disease-causing candidates. For model testing, we collected WES data from 108 patients with rare genetic disorders at National Taiwan University Hospital. We applied sequencing data and phenotypic information automatically extracted by a keyword extraction tool from patients’ electronic medical records into our machine learning model. Results: We succeeded in locating 92.5% (124/134) of the causative variants in the top 10 ranking list among an average of 741 candidate variants per person after filtering. AI Variant Prioritizer was able to assign the target gene to the top rank for around 61.1% (66/108) of the patients, followed by Variant Prioritizer, which assigned it for 44.4% (48/108) of the patients. The cumulative rank result revealed that our AI Variant Prioritizer has the highest accuracy at ranks 1, 5, 10, and 20. It also shows that AI Variant Prioritizer presents better performance than other tools. After adopting the Human Phenotype Ontology (HPO) terms by looking up the databases, the top 10 ranking list can be increased to 93.5% (101/108). Conclusions: We successfully applied sequencing data from WES and free-text phenotypic information of patients’ diseases automatically extracted by the keyword extraction tool for model training and testing. By interpreting our model, we identified which features of variants are important. In addition, we achieved a satisfactory result on finding the target variant in our testing data set. After adopting the HPO terms by looking up the databases, the top 10 ranking list can be increased to 93.5% (101/108). The performance of the model is similar to that of manual analysis, and it has been used to help National Taiwan University Hospital with a genetic diagnosis. 
%R 10.2196/37701 %U https://bioinform.jmir.org/2022/1/e37701 %U https://doi.org/10.2196/37701 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 9 %P e35675 %T Dynamic Digital Twin: Diagnosis, Treatment, Prediction, and Prevention of Disease During the Life Course %A Mulder,Skander Tahar %A Omidvari,Amir-Houshang %A Rueten-Budde,Anja J %A Huang,Pei-Hua %A Kim,Ki-Hun %A Bais,Babette %A Rousian,Melek %A Hai,Rihan %A Akgun,Can %A van Lennep,Jeanine Roeters %A Willemsen,Sten %A Rijnbeek,Peter R %A Tax,David MJ %A Reinders,Marcel %A Boersma,Eric %A Rizopoulos,Dimitris %A Visch,Valentijn %A Steegers-Theunissen,Régine %+ Obstetrics and Gynaecology, Erasmus Medical Center, Dr. Molewaterplein 40, Rotterdam, 3015GD, Netherlands, 31 10 7038256, r.steegers@erasmusmc.nl %K digital health %K digital twin %K machine learning %K artificial intelligence %K obstetrics %K cardiovascular %K disease %K health %D 2022 %7 14.9.2022 %9 Viewpoint %J J Med Internet Res %G English %X A digital twin (DT), originally defined as a virtual representation of a physical asset, system, or process, is a new concept in health care. A DT in health care is not a single technology but a domain-adapted multimodal modeling approach incorporating the acquisition, management, analysis, prediction, and interpretation of data, aiming to improve medical decision-making. However, there are many challenges and barriers that must be overcome before a DT can be used in health care. In this viewpoint paper, we build on the current literature, address these challenges, and describe a dynamic DT in health care for optimizing individual patient health care journeys, specifically for women at risk for cardiovascular complications in the preconception and pregnancy periods and across the life course. We describe how we can commit multiple domains to developing this DT. 
With our cross-domain definition of the DT, we aim to define future goals, trade-offs, and methods that will guide the development of the dynamic DT and implementation strategies in health care. %M 36103220 %R 10.2196/35675 %U https://www.jmir.org/2022/9/e35675 %U https://doi.org/10.2196/35675 %U http://www.ncbi.nlm.nih.gov/pubmed/36103220 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 9 %P e36130 %T Predicting Depression in Patients With Knee Osteoarthritis Using Machine Learning: Model Development and Validation Study %A Nowinka,Zuzanna %A Alagha,M Abdulhadi %A Mahmoud,Khadija %A Jones,Gareth G %+ MSk Lab, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, South Kensington Campus, London, SW7 2AZ, United Kingdom, 44 020 7589 5111, h.alagha@imperial.ac.uk %K knee osteoarthritis %K depression %K machine learning %K predictive modeling %D 2022 %7 13.9.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Knee osteoarthritis (OA) is the most common form of OA and a leading cause of disability worldwide. Chronic pain and functional loss secondary to knee OA put patients at risk of developing depression, which can also impair their treatment response. However, no tools exist to assist clinicians in identifying patients at risk. Machine learning (ML) predictive models may offer a solution. We investigated whether ML models could predict the development of depression in patients with knee OA and examined which features are the most predictive. Objective: The primary aim of this study was to develop and test an ML model to predict depression in patients with knee OA at 2 years and to validate the models using an external data set. The secondary aim was to identify the most important predictive features used by the ML algorithms. Methods: Osteoarthritis Initiative Study (OAI) data were used for model development and external validation was performed using Multicenter Osteoarthritis Study (MOST) data. 
Forty-two features were selected, which denoted routinely collected demographic and clinical data such as patient demographics, past medical history, knee OA history, baseline examination findings, and patient-reported outcome measures. Six different ML classification models were trained (logistic regression, least absolute shrinkage and selection operator [LASSO], ridge regression, decision tree, random forest, and gradient boosting machine). The primary outcome was to predict depression at 2 years following study enrollment. The presence of depression was defined using the Center for Epidemiological Studies Depression Scale. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and F1 score. The most important features were extracted from the best-performing model on external validation. Results: A total of 5947 patients were included in this study, with 2969 in the training set, 742 in the test set, and 2236 in the external validation set. For the test set, the AUC ranged from 0.673 (95% CI 0.604-0.742) to 0.869 (95% CI 0.824-0.913), with an F1 score of 0.435 to 0.490. On external validation, the AUC varied from 0.720 (95% CI 0.685-0.755) to 0.876 (95% CI 0.853-0.899), with an F1 score of 0.456 to 0.563. LASSO modeling offered the highest predictive performance. Blood pressure, baseline depression score, knee pain and stiffness, and quality of life were the most predictive features. Conclusions: To our knowledge, this is the first study to apply ML classification models to predict depression in patients with knee OA. Our study showed that ML models can deliver a clinically acceptable level of performance (AUC>0.7) in predicting the development of depression using routinely available demographic and clinical data. Further work is required to address the class imbalance in the training data and to evaluate the clinical utility of the models in facilitating early intervention and improved outcomes. 
%M 36099008 %R 10.2196/36130 %U https://formative.jmir.org/2022/9/e36130 %U https://doi.org/10.2196/36130 %U http://www.ncbi.nlm.nih.gov/pubmed/36099008 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 5 %N 3 %P e35150 %T Assessing the Generalizability of Deep Learning Models Trained on Standardized and Nonstandardized Images and Their Performance Against Teledermatologists: Retrospective Comparative Study %A Oloruntoba,Ayooluwatomiwa I %A Vestergaard,Tine %A Nguyen,Toan D %A Yu,Zhen %A Sashindranath,Maithili %A Betz-Stablein,Brigid %A Soyer,H Peter %A Ge,Zongyuan %A Mar,Victoria %+ School of Public Health and Preventive Medicine, Monash University, 553 St Kilda Road, Melbourne, Victoria, 3004, Australia, 1 0403040994, Victoria.Mar@monash.edu %K artificial intelligence %K AI %K convolutional neural network %K CNN %K teledermatology %K standardized image %K nonstandardized image %K machine learning %K skin cancer %K cancer %D 2022 %7 12.9.2022 %9 Original Paper %J JMIR Dermatol %G English %X Background: Convolutional neural networks (CNNs) are a type of artificial intelligence that shows promise as a diagnostic aid for skin cancer. However, the majority are trained using retrospective image data sets with varying image capture standardization. Objective: The aim of our study was to use CNN models with the same architecture—trained on image sets acquired with either the same image capture device and technique (standardized) or with varied devices and capture techniques (nonstandardized)—and test variability in performance when classifying skin cancer images in different populations. Methods: In all, 3 CNNs with the same architecture were trained. CNN nonstandardized (CNN-NS) was trained on 25,331 images taken from the International Skin Imaging Collaboration (ISIC) using different image capture devices. 
CNN standardized (CNN-S) was trained on 177,475 MoleMap images taken with the same capture device, and CNN standardized number 2 (CNN-S2) was trained on a subset of 25,331 standardized MoleMap images (matched for number and classes of training images to CNN-NS). These 3 models were then tested on 3 external test sets: 569 Danish images, the publicly available ISIC 2020 data set consisting of 33,126 images, and The University of Queensland (UQ) data set of 422 images. Primary outcome measures were sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). Teledermatology assessments available for the Danish data set were used to determine model performance compared to teledermatologists. Results: When tested on the 569 Danish images, CNN-S achieved an AUROC of 0.861 (95% CI 0.830-0.889) and CNN-S2 achieved an AUROC of 0.831 (95% CI 0.798-0.861; standardized models), with both outperforming CNN-NS (nonstandardized model; P=.001 and P=.009, respectively), which achieved an AUROC of 0.759 (95% CI 0.722-0.794). When tested on 2 additional data sets (ISIC 2020 and UQ), CNN-S (P<.001 and P<.001, respectively) and CNN-S2 (P=.08 and P=.35, respectively) still outperformed CNN-NS. When the CNNs were matched to the mean sensitivity and specificity of the teledermatologists on the Danish data set, the models’ resultant sensitivities and specificities were surpassed by the teledermatologists. However, when compared to CNN-S, the differences were not statistically significant (sensitivity: P=.10; specificity: P=.053). Performance across all CNN models as well as teledermatologists was influenced by image quality. Conclusions: CNNs trained on standardized images had improved performance and, therefore, greater generalizability in skin cancer classification when applied to unseen data sets. This finding is an important consideration for future algorithm development, regulation, and approval. 
%M 39475778 %R 10.2196/35150 %U https://derma.jmir.org/2022/3/e35150 %U https://doi.org/10.2196/35150 %U http://www.ncbi.nlm.nih.gov/pubmed/39475778 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 9 %P e40387 %T Real-world Implementation of an eHealth System Based on Artificial Intelligence Designed to Predict and Reduce Emergency Department Visits by Older Adults: Pragmatic Trial %A Belmin,Joël %A Villani,Patrick %A Gay,Mathias %A Fabries,Stéphane %A Havreng-Théry,Charlotte %A Malvoisin,Stéphanie %A Denis,Fabrice %A Veyron,Jacques-Henri %+ Presage, 72 boulevard de Sébastopol, Paris, 75003, France, 33 149964242, jhveyron@presage.care %K emergency department visits %K home care aides %K community-dwelling older adults %K smartphone %K mobile phone %K predictive tool %K health intervention %K machine learning %K predict %K risk %K algorithm %K model %K user experience %K alert %K monitoring %D 2022 %7 8.9.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Frail older people use emergency services extensively, and digital systems that monitor health remotely could be useful in reducing these visits by earlier detection of worsening health conditions. Objective: We aimed to implement a system that produces alerts when the machine learning algorithm identifies a short-term risk for an emergency department (ED) visit and examine health interventions delivered after these alerts and users’ experience. This study highlights the feasibility of the general system and its performance in reducing ED visits. It also evaluates the accuracy of alerts’ prediction. Methods: An uncontrolled multicenter trial was conducted in community-dwelling older adults receiving assistance from home aides (HAs). We implemented an eHealth system that produces an alert for a high risk of ED visits. 
After each home visit, the HAs completed a questionnaire on participants’ functional status, using a smartphone app, and the information was processed in real time by a previously developed machine learning algorithm that identifies patients at risk of an ED visit within 14 days. In case of risk, the eHealth system alerted a coordinating nurse who could then inform the family carer and the patient’s nurses or general practitioner. The primary outcomes were the rate of ED visits and the number of deaths after alert-triggered health interventions (ATHIs) and users’ experience with the eHealth system; the secondary outcome was the accuracy of the eHealth system in predicting ED visits. Results: We included 206 patients (mean age 85, SD 8 years; 161/206, 78% women) who received aid from 109 HAs, and the mean follow-up period was 10 months. The HAs monitored 2656 visits, which resulted in 405 alerts. Two ED visits were recorded following 131 alerts with an ATHI (2/131, 1.5%), whereas 36 ED visits were recorded following 274 alerts that did not result in an ATHI (36/274, 13.4%), corresponding to an odds ratio of 0.10 (95% CI 0.02-0.43; P<.001). Five patients died during the study. All had alerts, 4 did not have an ATHI and were hospitalized, and 1 had an ATHI (P=.04). In terms of overall usability, the digital system was easy to use for 90% (98/109) of HAs, and response time was acceptable for 89% (98/109) of them. Conclusions: The eHealth system has been successfully implemented, was appreciated by users, and produced relevant alerts. ATHIs were associated with a lower rate of ED visits, suggesting that the eHealth system might be effective in lowering the number of ED visits in this population. Trial Registration: clinicaltrials.gov NCT05221697; https://clinicaltrials.gov/ct2/show/NCT05221697. 
%M 35921685 %R 10.2196/40387 %U https://www.jmir.org/2022/9/e40387 %U https://doi.org/10.2196/40387 %U http://www.ncbi.nlm.nih.gov/pubmed/35921685 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 9 %P e36986 %T Use of Social Media Data to Diagnose and Monitor Psychotic Disorders: Systematic Review %A Lejeune,Alban %A Robaglia,Benoit-Marie %A Walter,Michel %A Berrouiguet,Sofian %A Lemey,Christophe %+ Unité de Recherche Clinique Intersectorielle, Hôpital de Bohars, Centre Hospitalier Régional Universitaire de Brest, Route de Ploudalmézeau, Bohars, 29820, France, 33 6389910008, alban.lejeune@gmail.com %K schizophrenia %K psychotic disorders %K psychiatric disorders %K artificial intelligence %K AI %K machine learning %K neural network %K social media %D 2022 %7 6.9.2022 %9 Review %J J Med Internet Res %G English %X Background: Schizophrenia is a disease associated with high burden, and improvement in care is necessary. Artificial intelligence (AI) has been used to diagnose several medical conditions as well as psychiatric disorders. However, this technology requires large amounts of data to be efficient. Social media data could be used to improve diagnostic capabilities. Objective: The objective of our study is to analyze the current capabilities of AI to use social media data as a diagnostic tool for psychotic disorders. Methods: A systematic review of the literature was conducted using several databases (PubMed, Embase, Cochrane, PsycInfo, and IEEE Xplore) using relevant keywords to search for articles published as of November 12, 2021. We used the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) criteria to identify, select, and critically assess the quality of the relevant studies while minimizing bias. We critically analyzed the methodology of the studies to detect any bias and presented the results. Results: Among the 93 studies identified, 7 studies were included for analyses. 
The included studies presented encouraging results. Social media data could be used in several ways to care for patients with schizophrenia, including the monitoring of patients after the first episode of psychosis. We identified several limitations in the included studies, mainly lack of access to clinical diagnostic data, small sample size, and heterogeneity in study quality. We recommend using state-of-the-art natural language processing neural networks, called language models, to model social media activity. Combined with the synthetic minority oversampling technique, language models can tackle the imbalanced data set limitation, which is a necessary constraint to train unbiased classifiers. Furthermore, language models can be easily adapted to the classification task with a procedure called “fine-tuning.” Conclusions: The use of social media data for the diagnosis of psychotic disorders is promising. However, most of the included studies had significant biases; we therefore could not draw conclusions about accuracy in clinical situations. Future studies need to use more accurate methodologies to obtain unbiased results. 
%M 36066938 %R 10.2196/36986 %U https://www.jmir.org/2022/9/e36986 %U https://doi.org/10.2196/36986 %U http://www.ncbi.nlm.nih.gov/pubmed/36066938 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 9 %N 9 %P e39556 %T The Use of Automated Machine Translation to Translate Figurative Language in a Clinical Setting: Analysis of a Convenience Sample of Patients Drawn From a Randomized Controlled Trial %A Tougas,Hailee %A Chan,Steven %A Shahrvini,Tara %A Gonzalez,Alvaro %A Chun Reyes,Ruth %A Burke Parish,Michelle %A Yellowlees,Peter %+ Department of Psychiatry and Behavioral Sciences, University of California, Davis, 2230 Stockton Blvd, Sacramento, CA, 95817, United States, 1 916 734 3574, htougs@gmail.com %K telepsychiatry %K automated machine translation %K language barriers %K psychiatry %K assessment %K automated translation %K automated %K translation %K artificial intelligence %K AI %K speech recognition %K limited English proficiency %K LEP %K asynchronous telepsychiatry %K ATP %K automated speech recognition %K ASR %K AMT %K figurative language device %K FLD %K language concordant %K language discordant %K AI interpretation %D 2022 %7 6.9.2022 %9 Original Paper %J JMIR Ment Health %G English %X Background: Patients with limited English proficiency frequently receive substandard health care. Asynchronous telepsychiatry (ATP) has been established as a clinically valid method for psychiatric assessments. The addition of automated speech recognition (ASR) and automated machine translation (AMT) technologies to asynchronous telepsychiatry may be a viable artificial intelligence (AI)–language interpretation option. Objective: This project measures the frequency and accuracy of the translation of figurative language devices (FLDs) and patient word count per minute, in a subset of psychiatric interviews from a larger trial, as an approximation to patient speech complexity and quantity in clinical encounters that require interpretation. 
Methods: A total of 6 patients were selected from the original trial, in which each had undergone 2 assessments: once by an English-speaking psychiatrist through a Spanish-speaking human interpreter and once in Spanish by a trained mental health interviewer-researcher with AI interpretation. Of the 6 selected patients, 3 (50%) were interviewed via videoconferencing because of the COVID-19 pandemic. Interview transcripts were created by automated speech recognition with manual corrections for transcriptional accuracy and assessment for translational accuracy of FLDs. Results: AI-interpreted interviews were found to have a significant increase in the use of FLDs and patient word count per minute. Both human- and AI-interpreted FLDs were frequently translated inaccurately; however, FLD translation may be more accurate over videoconferencing. Conclusions: AI interpretation is currently not sufficiently accurate for use in clinical settings. However, this study suggests that alternatives to human interpretation are needed to circumvent modifications to patients’ speech. While AI interpretation technologies are being further developed, using videoconferencing for human interpreting may be more accurate than in-person interpreting. 
Trial Registration: ClinicalTrials.gov NCT03538860; https://clinicaltrials.gov/ct2/show/NCT03538860 %M 36066959 %R 10.2196/39556 %U https://mental.jmir.org/2022/9/e39556 %U https://doi.org/10.2196/39556 %U http://www.ncbi.nlm.nih.gov/pubmed/36066959 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 9 %P e37374 %T Developing Clinical Artificial Intelligence for Obstetric Ultrasound to Improve Access in Underserved Regions: Protocol for a Computer-Assisted Low-Cost Point-of-Care UltraSound (CALOPUS) Study %A Self,Alice %A Chen,Qingchao %A Desiraju,Bapu Koundinya %A Dhariwal,Sumeet %A Gleed,Alexander D %A Mishra,Divyanshu %A Thiruvengadam,Ramachandran %A Chandramohan,Varun %A Craik,Rachel %A Wilden,Elizabeth %A Khurana,Ashok %A , %A Bhatnagar,Shinjini %A Papageorghiou,Aris T %A Noble,J Alison %+ Nuffield Department of Women's and Reproductive Health, University of Oxford, Level 3, Women's Centre, John Radcliffe Hospital, Headley Way, Oxford, OX3 9DU, United Kingdom, 44 1865221004, aris.papageorghiou@wrh.ox.ac.uk %K ultrasound %K obstetrics %K artificial intelligence %K machine learning %K data annotation %D 2022 %7 1.9.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: The World Health Organization recommends a package of pregnancy care that includes obstetric ultrasound scans. There are significant barriers to universal access to antenatal ultrasound, particularly because of the cost and need for maintenance of ultrasound equipment and a lack of trained personnel. As low-cost, handheld ultrasound devices have become widely available, the current roadblock is the global shortage of health care providers trained in obstetric scanning. Objective: The aim of this study is to improve pregnancy and risk assessment for women in underserved regions. Therefore, we are undertaking the Computer-Assisted Low-Cost Point-of-Care UltraSound (CALOPUS) project, bringing together experts in machine learning and clinical obstetric ultrasound. 
Methods: In this prospective study conducted in 2 clinical centers (United Kingdom and India), participating pregnant women were scanned and full-length ultrasounds were performed. Each woman underwent 2 consecutive ultrasound scans. The first was a series of simple, standardized ultrasound sweeps (the CALOPUS protocol), immediately followed by a routine, full clinical ultrasound examination that served as the comparator. We describe the development of a simple-to-use clinical protocol designed for nonexpert users to assess fetal viability, detect the presence of multiple pregnancies, evaluate placental location, assess amniotic fluid volume, determine fetal presentation, and perform basic fetal biometry. The CALOPUS protocol was designed using the smallest number of steps to minimize redundant information, while maximizing diagnostic information. Here, we describe how ultrasound videos and annotations are captured for machine learning. Results: More than 5571 scans have been acquired, yielding 1,541,751 label annotations. An adapted protocol, including a low pelvic brim sweep and a well-filled maternal bladder, improved visualization of the cervix from 28% to 91% and classification of placental location from 82% to 94%. Excellent levels of intra- and interannotator agreement are achievable following training and standardization. Conclusions: The CALOPUS study is a unique study that uses obstetric ultrasound videos and annotations from pregnancies dated from 11 weeks and followed up until birth using novel ultrasound and annotation protocols. The data from this study are being used to develop and test several different machine learning algorithms to address key clinical diagnostic questions pertaining to obstetric risk management. We also highlight some of the challenges and potential solutions to interdisciplinary multinational imaging collaboration. 
International Registered Report Identifier (IRRID): RR1-10.2196/37374 %M 36048518 %R 10.2196/37374 %U https://www.researchprotocols.org/2022/9/e37374 %U https://doi.org/10.2196/37374 %U http://www.ncbi.nlm.nih.gov/pubmed/36048518 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 8 %P e37531 %T Using Artificial Intelligence as a Diagnostic Decision Support Tool in Skin Disease: Protocol for an Observational Prospective Cohort Study %A Escalé-Besa,Anna %A Fuster-Casanovas,Aïna %A Börve,Alexander %A Yélamos,Oriol %A Fustà-Novell,Xavier %A Esquius Rafat,Mireia %A Marin-Gomez,Francesc X %A Vidal-Alaball,Josep %+ Unitat de Suport a la Recerca de la Catalunya Central, Fundació Institut Universitari per a la Recerca a l'Atenció Primària de Salut Jordi Gol i Gurina, C/Pica d'Estats 13-15, Sant Fruitós de Bages, 08272, Spain, 34 93 693 0040, afuster.cc.ics@gencat.cat %K machine learning %K artificial intelligence %K data accuracy %K computer-assisted diagnosis %K neural network computer %K support tool %K skin disease %K cohort study %K dermatology %D 2022 %7 31.8.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Dermatological conditions are a relevant health problem. Each person has an average of 1.6 skin diseases per year, and consultations for skin pathology represent 20% of the total annual visits to primary care and around 35% are referred to a dermatology specialist. Machine learning (ML) models can be a good tool to help primary care professionals, as it can analyze and optimize complex sets of data. In addition, ML models are increasingly being applied to dermatology as a diagnostic decision support tool using image analysis, especially for skin cancer detection and classification. Objective: This study aims to perform a prospective validation of an image analysis ML model as a diagnostic decision support tool for the diagnosis of dermatological conditions. 
Methods: In this prospective study, 100 consecutive patients who visited a participating general practitioner (GP) with a skin problem in central Catalonia were recruited. Data collection was planned to last 7 months. Anonymized pictures of skin diseases were taken and introduced to the ML model interface (capable of screening for 44 different skin diseases), which returned the top 5 diagnoses by probability. The same image was also sent as a teledermatology consultation following the currently established workflow. The assessments of the GP, the ML model, and the dermatologist will be compared to calculate the precision, sensitivity, specificity, and accuracy of the ML model. The results will be represented globally and individually for each skin disease class using a confusion matrix and one-versus-all methodology. The time taken to make the diagnosis will also be taken into consideration. Results: Patient recruitment began in June 2021 and lasted for 5 months. Currently, all patients have been recruited and the images have been shown to the GPs and dermatologists. The analysis of the results has already started. Conclusions: This study will provide information about ML models’ effectiveness and limitations. External testing is essential for regulating these diagnostic systems to deploy ML models in a primary care practice setting. 
%M 36044249 %R 10.2196/37531 %U https://www.researchprotocols.org/2022/8/e37531 %U https://doi.org/10.2196/37531 %U http://www.ncbi.nlm.nih.gov/pubmed/36044249 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 8 %P e37578 %T Predicting Readmission Charges Billed by Hospitals: Machine Learning Approach %A Gopukumar,Deepika %A Ghoshal,Abhijeet %A Zhao,Huimin %+ Department of Health and Clinical Outcomes Research, School of Medicine, Saint Louis University, SALUS Center, 3545 Lafayette Ave., 4th floor, Room 409 B, St.Louis, MO, 63110, United States, 1 3149779300, deepika.gopukumar@health.slu.edu %K readmission charges %K readmission analytics %K predictive models %K machine learning %K readmissions %K predictive analytics %D 2022 %7 30.8.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: The Centers for Medicare and Medicaid Services projects that health care costs will continue to grow over the next few years. Rising readmission costs contribute significantly to increasing health care costs. Multiple areas of health care, including readmissions, have benefited from the application of various machine learning algorithms in several ways. Objective: We aimed to identify suitable models for predicting readmission charges billed by hospitals. Our literature review revealed that this application of machine learning is underexplored. We used various predictive methods, ranging from glass-box models (such as regularization techniques) to black-box models (such as deep learning–based models). Methods: We defined readmissions as readmission with the same major diagnostic category (RSDC) and all-cause readmission category (RADC). For these readmission categories, 576,701 and 1,091,580 individuals, respectively, were identified from the Nationwide Readmission Database of the Healthcare Cost and Utilization Project by the Agency for Healthcare Research and Quality for 2013. 
Linear regression, lasso regression, elastic net, ridge regression, eXtreme gradient boosting (XGBoost), and a deep learning model based on multilayer perceptron (MLP) were the 6 machine learning algorithms we tested for RSDC and RADC through 10-fold cross-validation. Results: Our preliminary analysis using a data-driven approach revealed that within RADC, the subsequent readmission charge billed per patient was higher than the previous charge for 541,090 individuals, and this number was 319,233 for RSDC. The top 3 major diagnostic categories (MDCs) for such instances were the same for RADC and RSDC. The average readmission charge billed was higher than the previous charge for 21 of the MDCs in the case of RSDC, whereas this was the case for only 13 of the MDCs in RADC. We recommend XGBoost and the deep learning model based on MLP for predicting readmission charges. The following performance metrics were obtained for XGBoost: (1) RADC (mean absolute percentage error [MAPE]=3.121%; root mean squared error [RMSE]=0.414; mean absolute error [MAE]=0.317; root relative squared error [RRSE]=0.410; relative absolute error [RAE]=0.399; normalized RMSE [NRMSE]=0.040; mean absolute deviation [MAD]=0.031) and (2) RSDC (MAPE=3.171%; RMSE=0.421; MAE=0.321; RRSE=0.407; RAE=0.393; NRMSE=0.041; MAD=0.031). The performance obtained for the MLP-based deep neural network is as follows: (1) RADC (MAPE=3.103%; RMSE=0.413; MAE=0.316; RRSE=0.410; RAE=0.397; NRMSE=0.040; MAD=0.031) and (2) RSDC (MAPE=3.202%; RMSE=0.427; MAE=0.326; RRSE=0.413; RAE=0.399; NRMSE=0.041; MAD=0.032). Repeated measures ANOVA revealed that the mean RMSE differed significantly across models (P<.001). Post hoc tests using the Bonferroni correction method indicated that the mean RMSE of the deep learning and XGBoost models was statistically significantly (P<.001) lower than that of all other models, namely, linear regression, elastic net, lasso, and ridge regression. 
Conclusions: Models built using XGBoost and MLP are suitable for predicting readmission charges billed by hospitals. The MDCs allow models to accurately predict hospital readmission charges. %M 35896038 %R 10.2196/37578 %U https://medinform.jmir.org/2022/8/e37578 %U https://doi.org/10.2196/37578 %U http://www.ncbi.nlm.nih.gov/pubmed/35896038 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 8 %P e35563 %T Analyzing Suicide Risk From Linguistic Features in Social Media: Evaluation Study %A Lao,Cecilia %A Lane,Jo %A Suominen,Hanna %+ School of Computing, College of Engineering and Computer Science, The Australian National University, 145 Science Road, Canberra, ACT, 2600, Australia, 61 416236920, cecilia.lao@anu.edu.au %K evaluation study %K interdisciplinary research %K linguistics %K machine learning %K mental health %K natural language processing %K social media %K suicide risk %D 2022 %7 30.8.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Effective suicide risk assessments and interventions are vital for suicide prevention. Although assessing such risks is best done by health care professionals, people experiencing suicidal ideation may not seek help. Hence, machine learning (ML) and computational linguistics can provide analytical tools for understanding and analyzing risks. This, therefore, facilitates suicide intervention and prevention. Objective: This study aims to explore, using statistical analyses and ML, whether computerized language analysis could be applied to assess and better understand a person’s suicide risk on social media. Methods: We used the University of Maryland Suicidality Dataset comprising text posts written by users (N=866) of mental health–related forums on Reddit. Each user was classified with a suicide risk rating (no, low, moderate, or severe) by either medical experts or crowdsourced annotators, denoting their estimated likelihood of dying by suicide. 
In language analysis, the Linguistic Inquiry and Word Count lexicon assessed sentiment, thinking styles, and part of speech, whereas readability was explored using the TextStat library. The Mann-Whitney U test identified differences between at-risk (low, moderate, and severe risk) and no-risk users. Meanwhile, the Kruskal-Wallis test and Spearman correlation coefficient were used for granular analysis between risk levels and to identify redundancy, respectively. In the ML experiments, gradient boost, random forest, and support vector machine models were trained using 10-fold cross-validation. The area under the receiver operating characteristic curve and F1-score were the primary measures. Finally, permutation importance uncovered the features that contributed the most to each model’s decision-making. Results: Statistically significant differences (P<.05) were identified between the at-risk (671/866, 77.5%) and no-risk groups (195/866, 22.5%). This was true for both the crowd- and expert-annotated samples. Overall, at-risk users had higher median values for most variables (authenticity, first-person pronouns, and negation), with a notable exception of clout, which indicated that at-risk users were less likely to engage in social posturing. A high positive correlation (ρ>0.84) was present between the part of speech variables, which implied redundancy and demonstrated the utility of aggregate features. All ML models performed similarly in their area under the curve (0.66-0.68); however, the random forest and gradient boost models were noticeably better in their F1-score (0.65 and 0.62) than the support vector machine (0.52). The features that contributed the most to the ML models were authenticity, clout, and negative emotions. Conclusions: In summary, our statistical analyses found linguistic features associated with suicide risk, such as social posturing (eg, authenticity and clout), first-person singular pronouns, and negation. 
This increased our understanding of the behavioral and thought patterns of social media users and provided insights into the mechanisms behind ML models. We also demonstrated the applicative potential of ML in assisting health care professionals to assess and manage individuals experiencing suicide risk. %M 36040781 %R 10.2196/35563 %U https://formative.jmir.org/2022/8/e35563 %U https://doi.org/10.2196/35563 %U http://www.ncbi.nlm.nih.gov/pubmed/36040781 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 8 %P e36823 %T Guidelines for Artificial Intelligence in Medicine: Literature Review and Content Analysis of Frameworks %A Crossnohere,Norah L %A Elsaid,Mohamed %A Paskett,Jonathan %A Bose-Brill,Seuli %A Bridges,John F P %+ Department of Biomedical Informatics, The Ohio State University College of Medicine, 1800 Cannon Drive, Columbus, OH, 43210, United States, 1 3476286314, norah.crossnohere@osumc.edu %K artificial intelligence %K translational science %K translational research %K ethics %K engagement %K reproducibility %K transparency %K effectiveness %K medicine %K health care %K AI %D 2022 %7 25.8.2022 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) is rapidly expanding in medicine despite a lack of consensus on its application and evaluation. Objective: We sought to identify current frameworks guiding the application and evaluation of AI for predictive analytics in medicine and to describe the content of these frameworks. We also assessed what stages along the AI translational spectrum (ie, AI development, reporting, evaluation, implementation, and surveillance) the content of each framework has been discussed. Methods: We performed a literature review of frameworks regarding the oversight of AI in medicine. The search included key topics such as “artificial intelligence,” “machine learning,” “guidance as topic,” and “translational science,” and spanned the time period 2014-2022. 
Documents were included if they provided generalizable guidance regarding the use or evaluation of AI in medicine. Included frameworks were summarized descriptively and subjected to content analysis. A novel evaluation matrix was developed and applied to appraise the frameworks’ coverage of content areas across translational stages. Results: In total, 14 frameworks are featured in the review, including 6 that provide descriptive guidance and 8 that provide reporting checklists for medical applications of AI. Content analysis revealed 5 considerations related to the oversight of AI in medicine across frameworks: transparency, reproducibility, ethics, effectiveness, and engagement. All frameworks include discussions regarding transparency, reproducibility, ethics, and effectiveness, while only half of the frameworks discuss engagement. The evaluation matrix revealed that frameworks were most likely to report AI considerations for the translational stage of development and were least likely to report considerations for the translational stage of surveillance. Conclusions: Existing frameworks for the application and evaluation of AI in medicine notably offer less input on the role of engagement in oversight and regarding the translational stage of surveillance. Identifying and optimizing strategies for engagement are essential to ensure that AI can meaningfully benefit patients and other end users. 
%M 36006692 %R 10.2196/36823 %U https://www.jmir.org/2022/8/e36823 %U https://doi.org/10.2196/36823 %U http://www.ncbi.nlm.nih.gov/pubmed/36006692 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 8 %P e37188 %T Randomized Controlled Trials of Artificial Intelligence in Clinical Practice: Systematic Review %A Lam,Thomas Y T %A Cheung,Max F K %A Munro,Yasmin L %A Lim,Kong Meng %A Shung,Dennis %A Sung,Joseph J Y %+ Lee Kong Chian School of Medicine, Nanyang Technological University, Lee Kong Chian School of Medicine, Nanyang Technological University, 11 Mandalay Road, Singapore, 308232, Singapore, 65 65138886, josephsung@ntu.edu.sg %K artificial intelligence %K randomized controlled trial %K systematic review %K clinical %K gastroenterology %K clinical informatics %K mobile phone %D 2022 %7 25.8.2022 %9 Review %J J Med Internet Res %G English %X Background: The number of artificial intelligence (AI) studies in medicine has exponentially increased recently. However, there is no clear quantification of the clinical benefits of implementing AI-assisted tools in patient care. Objective: This study aims to systematically review all published randomized controlled trials (RCTs) of AI-assisted tools to characterize their performance in clinical practice. Methods: CINAHL, Cochrane Central, Embase, MEDLINE, and PubMed were searched to identify relevant RCTs published up to July 2021 and comparing the performance of AI-assisted tools with conventional clinical management without AI assistance. We evaluated the primary end points of each study to determine their clinical relevance. This systematic review was conducted following the updated PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 guidelines. Results: Among the 11,839 articles retrieved, only 39 (0.33%) RCTs were included. These RCTs were conducted in an approximately equal distribution from North America, Europe, and Asia. 
AI-assisted tools were implemented in 13 different clinical specialties. Most RCTs were published in the field of gastroenterology, with 15 studies on AI-assisted endoscopy. Most RCTs studied biosignal-based AI-assisted tools, and a minority of RCTs studied AI-assisted tools drawn from clinical data. In 77% (30/39) of the RCTs, AI-assisted interventions outperformed usual clinical care, and clinically relevant outcomes improved with AI-assisted intervention in 70% (21/30) of the studies. Small sample size and single-center design limited the generalizability of these studies. Conclusions: There is growing evidence supporting the implementation of AI-assisted tools in daily clinical practice; however, the number of available RCTs is limited and heterogeneous. More RCTs of AI-assisted tools integrated into clinical practice are needed to advance the role of AI in medicine. Trial Registration: PROSPERO CRD42021286539; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=286539 %M 35904087 %R 10.2196/37188 %U https://www.jmir.org/2022/8/e37188 %U https://doi.org/10.2196/37188 %U http://www.ncbi.nlm.nih.gov/pubmed/35904087 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 8 %P e37611 %T The Adoption of Artificial Intelligence in Health Care and Social Services in Australia: Findings From a Methodologically Innovative National Survey of Values and Attitudes (the AVA-AI Study) %A Isbanner,Sebastian %A O’Shaughnessy,Pauline %A Steel,David %A Wilcock,Scarlet %A Carter,Stacy %+ Australian Centre for Health Engagement Evidence and Values, Faculty of the Arts, Social Sciences and Humanities, University of Wollongong, Northfields Ave, Wollongong, 2522, Australia, 61 2 4221 3243, stacyc@uow.edu.au %K artificial intelligence %K surveys and questionnaires %K consumer health informatics %K social welfare %K bioethics %K social values %D 2022 %7 22.8.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) for use in 
health care and social services is rapidly developing, but this has significant ethical, legal, and social implications. Theoretical and conceptual research in AI ethics needs to be complemented with empirical research to understand the values and judgments of members of the public, who will be the ultimate recipients of AI-enabled services. Objective: The aim of the Australian Values and Attitudes on AI (AVA-AI) study was to assess and compare Australians’ general and particular judgments regarding the use of AI, compare Australians’ judgments regarding different health care and social service applications of AI, and determine the attributes of health care and social service AI systems that Australians consider most important. Methods: We conducted a survey of the Australian population using an innovative sampling and weighting methodology involving 2 sample components: one from an omnibus survey using a sample selected using scientific probability sampling methods and one from a nonprobability-sampled web-based panel. The web-based panel sample was calibrated to the omnibus survey sample using behavioral, lifestyle, and sociodemographic variables. Univariate and bivariate analyses were performed. Results: We included weighted responses from 1950 Australians in the web-based panel along with a further 2498 responses from the omnibus survey for a subset of questions. Both weighted samples were sociodemographically well spread. An estimated 60% of Australians support the development of AI in general but, in specific health care scenarios, this diminishes to between 27% and 43% and, for social service scenarios, between 31% and 39%. Although all ethical and social dimensions of AI presented were rated as important, accuracy was consistently the most important and reducing costs the least important. Speed was also consistently lower in importance. 
In total, 4 in 5 Australians valued continued human contact and discretion in service provision more than any speed, accuracy, or convenience that AI systems might provide. Conclusions: The ethical and social dimensions of AI systems matter to Australians. Most think AI systems should augment rather than replace humans in the provision of both health care and social services. Although expressing broad support for AI, people made finely tuned judgments about the acceptability of particular AI applications with different potential benefits and downsides. Further qualitative research is needed to understand the reasons underpinning these judgments. The participation of ethicists, social scientists, and the public can help guide AI development and implementation, particularly in sensitive and value-laden domains such as health care and social services. %M 35994331 %R 10.2196/37611 %U https://www.jmir.org/2022/8/e37611 %U https://doi.org/10.2196/37611 %U http://www.ncbi.nlm.nih.gov/pubmed/35994331 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 8 %P e37284 %T Interactive Medical Image Labeling Tool to Construct a Robust Convolutional Neural Network Training Data Set: Development and Validation Study %A Reifs,David %A Reig-Bolaño,Ramon %A Casals,Marta %A Grau-Carrion,Sergi %+ Digital Care Research Group, Centre for Health and Social Care, Universitat of Vic-Central University of Catalonia, Carrer de la Sagrada Família, 7, Vic, 08500, Spain, 34 938861222, david.reifs@uvic.cat %K wound assessment %K pressure ulcers %K wound tissue classification %K labeling %K machine learning %D 2022 %7 22.8.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Skin ulcers are an important cause of morbidity and mortality everywhere in the world and occur due to several causes, including diabetes mellitus, peripheral neuropathy, immobility, pressure, arteriosclerosis, infections, and venous insufficiency. 
Ulcers are lesions that fail to undergo an orderly healing process and restore functional and anatomical integrity in the expected time. In most cases, the methods of analysis used nowadays are rudimentary, which leads to errors and the use of invasive and uncomfortable techniques on patients. There are many studies that use a convolutional neural network to classify the different tissues in a wound. To obtain good results, the network must be trained with a data set correctly labeled by an expert in wound assessment. Typically, it is difficult to label pixel by pixel using professional photo editing software, as this requires extensive time and effort from a health professional. Objective: The aim of this paper is to implement a new, fast, and accurate method of labeling wound samples for training a neural network to classify different tissues. Methods: We developed a support tool and evaluated its accuracy and reliability. We also compared the support tool classification with a digital gold standard (labeling the data with image editing software). Results: The agreement obtained between the gold standard and the proposed method was 0.9789 for background, 0.9842 for intact skin, 0.8426 for granulation tissue, 0.9309 for slough, and 0.9871 for necrotic tissue. The labeling speed obtained was, on average, 2.6 times that of an advanced image editing user. Conclusions: This method increases tagging speed on average compared to an advanced image editing user. This increase is greater with untrained users. The samples obtained with the new system are indistinguishable from the samples made with the gold standard. 
%M 35994311 %R 10.2196/37284 %U https://medinform.jmir.org/2022/8/e37284 %U https://doi.org/10.2196/37284 %U http://www.ncbi.nlm.nih.gov/pubmed/35994311 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 8 %P e38440 %T Exploiting Missing Value Patterns for a Backdoor Attack on Machine Learning Models of Electronic Health Records: Development and Validation Study %A Joe,Byunggill %A Park,Yonghyeon %A Hamm,Jihun %A Shin,Insik %A Lee,Jiyeon %+ School of AI Convergence, Soongsil University, Mobility Intelligence & Computing Systems Laboratory, 369 Sangdo-ro, Dongjak-gu, Seoul, 06978, Republic of Korea, 82 2 820 0950, jylee.cs@ssu.ac.kr %K medical machine learning %K neural network %K mortality prediction %K backdoor attack %K electronic health record data %K Medical Information Mart for Intensive Care-III %K missing value %K mask %K meta-information %K variational autoencoder %D 2022 %7 19.8.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: A backdoor attack controls the output of a machine learning model in 2 stages. First, the attacker poisons the training data set, introducing a back door into the victim’s trained model. Second, during test time, the attacker adds an imperceptible pattern called a trigger to the input values, which forces the victim’s model to output the attacker’s intended values instead of true predictions or decisions. While backdoor attacks pose a serious threat to the reliability of machine learning–based medical diagnostics, existing backdoor attacks that directly change the input values are detectable relatively easily. Objective: The goal of this study was to propose and study a robust backdoor attack on mortality-prediction machine learning models that use electronic health records. 
We showed that our backdoor attack grants attackers full control over classification outcomes for safety-critical tasks such as mortality prediction, highlighting the importance of undertaking safe artificial intelligence research in the medical field. Methods: We present a trigger generation method based on missing patterns in electronic health record data. Compared to existing approaches, which introduce noise into the medical record, the proposed backdoor attack makes it simple to construct backdoor triggers without prior knowledge. To effectively avoid detection by manual inspectors, we employ variational autoencoders to learn the missing patterns in normal electronic health record data and produce trigger data that appears similar to this data. Results: We experimented with the proposed backdoor attack on 4 machine learning models (linear regression, multilayer perceptron, long short-term memory, and gated recurrent units) that predict in-hospital mortality using a public electronic health record data set. The results showed that the proposed technique achieved a significant drop in the victim’s discrimination performance (reducing the area under the precision-recall curve by at most 0.45), with a low poisoning rate (2%) in the training data set. In addition, the impact of the attack on general classification performance was negligible (it reduced the area under the precision-recall curve by an average of 0.01025), which makes it difficult to detect the presence of poison. Conclusions: To the best of our knowledge, this is the first study to propose a backdoor attack that uses missing information from tabular data as a trigger. Through extensive experiments, we demonstrated that our backdoor attack can inflict severe damage on medical machine learning classifiers in practice. 
%M 35984701 %R 10.2196/38440 %U https://medinform.jmir.org/2022/8/e38440 %U https://doi.org/10.2196/38440 %U http://www.ncbi.nlm.nih.gov/pubmed/35984701 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 5 %N 3 %P e39143 %T Improving Skin Color Diversity in Cancer Detection: Deep Learning Approach %A Rezk,Eman %A Eltorki,Mohamed %A El-Dakhakhni,Wael %+ School of Computational Science and Engineering, McMaster University, 1280 Main Street West, Hamilton, ON, L8S 4L8, Canada, 1 905 525 9140, rezke@mcmaster.ca %K deep learning %K neural network %K machine learning %K algorithm %K artificial intelligence %K skin tone diversity %K data augmentation %K skin cancer diagnosis %K generalizability %K skin %K cancer %K diagnosis %K diagnostic %K imaging %K dermatology %K digital health %K image generation %K generated image %K computer-generated %K lesion %D 2022 %7 19.8.2022 %9 Original Paper %J JMIR Dermatol %G English %X Background: The lack of dark skin images in pathologic skin lesions in dermatology resources hinders the accurate diagnosis of skin lesions in people of color. Artificial intelligence applications have further disadvantaged people of color because those applications are mainly trained with light skin color images. Objective: The aim of this study is to develop a deep learning approach that generates realistic images of darker skin colors to improve dermatology data diversity for various malignant and benign lesions. Methods: We collected skin clinical images for common malignant and benign skin conditions from DermNet NZ, the International Skin Imaging Collaboration, and Dermatology Atlas. Two deep learning methods, style transfer (ST) and deep blending (DB), were utilized to generate images with darker skin colors using the lighter skin images. The generated images were evaluated quantitively and qualitatively. 
Furthermore, a convolutional neural network (CNN) was trained using the generated images to assess the latter’s effect on skin lesion classification accuracy. Results: Image quality assessment showed that the ST method outperformed DB, as the former achieved a lower loss-of-realism score of 0.23 (95% CI 0.19-0.27) compared to 0.63 (95% CI 0.59-0.67) for the DB method. In addition, ST achieved a higher disease presentation similarity score of 0.44 (95% CI 0.40-0.49) compared to 0.17 (95% CI 0.14-0.21) for the DB method. The qualitative assessment, completed by masked participants, indicated that ST-generated images exhibited high realism, whereby 62.2% (1511/2430) of the votes classified the generated images as real. Eight dermatologists correctly diagnosed the lesions in the generated images at an average rate of 0.75 (360 correct diagnoses out of 480) across several malignant and benign lesions. Finally, the classification accuracy and the area under the curve (AUC) of the model trained with the generated images were 0.76 (95% CI 0.72-0.79) and 0.72 (95% CI 0.67-0.77), respectively, compared to an accuracy of 0.56 (95% CI 0.52-0.60) and an AUC of 0.63 (95% CI 0.58-0.68) for the model trained without the generated images. Conclusions: Deep learning approaches can generate realistic skin lesion images that improve the skin color diversity of dermatology atlases. The diversified image bank, utilized herein to train a CNN, demonstrates the potential of developing generalizable artificial intelligence skin cancer diagnosis applications. 
International Registered Report Identifier (IRRID): RR2-10.2196/34896 %M 39475773 %R 10.2196/39143 %U https://derma.jmir.org/2022/3/e39143 %U https://doi.org/10.2196/39143 %U http://www.ncbi.nlm.nih.gov/pubmed/39475773 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 3 %N 1 %P e38226 %T The Application of Machine Learning in Predicting Mortality Risk in Patients With Severe Femoral Neck Fractures: Prediction Model Development Study %A Xu,Lingxiao %A Liu,Jun %A Han,Chunxia %A Ai,Zisheng %+ Department of Medical Statistics, Tongji University, No1239 Siping Road, Shanghai, 200092, China, 86 13774380743, azs1966@126.com %K machine learning %K femoral neck fracture %K hospital mortality %K hip %K fracture %K mortality %K prediction %K intensive care unit %K ICU %K decision-making %K risk %K assessment %K prognosis %D 2022 %7 19.8.2022 %9 Original Paper %J JMIR Bioinform Biotech %G English %X Background: Femoral neck fracture (FNF) accounts for approximately 3.58% of all fractures in the body, exhibiting an increasing trend each year. According to a survey, in 1990, the total number of hip fractures in men and women worldwide was approximately 338,000 and 917,000, respectively. In China, FNFs account for 48.22% of hip fractures. Currently, many studies have been conducted on postdischarge mortality and mortality risk in patients with FNF. However, there have been no definitive studies on in-hospital mortality or its influencing factors in patients with severe FNF admitted to the intensive care unit. Objective: In this paper, 3 machine learning methods were used to construct a nosocomial death prediction model for patients admitted to intensive care units to assist clinicians in early clinical decision-making. Methods: A retrospective analysis was conducted using data on patients with FNF from the Medical Information Mart for Intensive Care III. 
After balancing the data set using the Synthetic Minority Oversampling Technique algorithm, patients were randomly separated into a 70% training set and a 30% testing set for the development and validation, respectively, of the prediction model. Random forest, extreme gradient boosting, and backpropagation neural network prediction models were constructed with nosocomial death as the outcome. Model performance was assessed using the area under the receiver operating characteristic curve, accuracy, precision, sensitivity, and specificity. The predictive value of the models was verified in comparison to the traditional logistic model. Results: A total of 366 patients with FNFs were selected, including 48 cases (13.1%) of in-hospital death. Data from 636 patients were obtained by balancing the data set to a 1:1 ratio of the in-hospital death group to the survival group. The 3 machine learning models exhibited high predictive accuracy: the areas under the receiver operating characteristic curve of the random forest, extreme gradient boosting, and backpropagation neural network models were 0.98, 0.97, and 0.95, respectively, all higher than that of the traditional logistic regression model. Ranking the importance of the feature variables, the top 10 features meaningful for predicting the risk of in-hospital death were the Simplified Acute Physiology Score II, lactate, creatinine, gender, vitamin D, calcium, creatine kinase, creatine kinase isoenzyme, white blood cell count, and age. Conclusions: Death risk assessment models constructed using machine learning are valuable for predicting the in-hospital mortality of patients with severe disease and provide a valid basis for reducing in-hospital mortality and improving patient prognosis. 
%R 10.2196/38226 %U https://bioinform.jmir.org/2022/1/e38226 %U https://doi.org/10.2196/38226 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 11 %N 2 %P e37584 %T Addressing Medicine’s Dark Matter %A Rose,Christian %A Díaz,Mark %A Díaz,Tomás %+ Department of Emergency Medicine, School of Medicine, Stanford University, 900 Welch Road, Suite 350, Palo Alto, CA, 94304, United States, 1 4159159585, ccrose@stanford.edu %K big data %K AI %K artificial intelligence %K equity %K data collection %K health care %K prediction %K model %K predict %K representative %K unrepresented %D 2022 %7 17.8.2022 %9 Viewpoint %J Interact J Med Res %G English %X In the 20th century, the models used to predict the motion of heavenly bodies did not match observation. Investigating this incongruity led to the discovery of dark matter—the most abundant form of matter in the universe. In medicine, despite years of using a data-hungry approach, our models have been limited in their ability to predict population health outcomes—that is, our observations also do not meet our expectations. We believe this phenomenon represents medicine’s “dark matter”—the features that we cannot yet directly observe but that have a tremendous effect on clinical outcomes. Advancing the information science of health care systems will thus require unique solutions and a humble approach that acknowledges its limitations. Dark matter changed the way the scientific community understood the universe; what might medicine learn from what it cannot yet see? 
%M 35976194 %R 10.2196/37584 %U https://www.i-jmr.org/2022/2/e37584 %U https://doi.org/10.2196/37584 %U http://www.ncbi.nlm.nih.gov/pubmed/35976194 %0 Journal Article %@ 2291-9279 %I JMIR Publications %V 10 %N 3 %P e39186 %T Breathing as an Input Modality in a Gameful Breathing Training App (Breeze 2): Development and Evaluation Study %A Lukic,Yanick Xavier %A Teepe,Gisbert Wilhelm %A Fleisch,Elgar %A Kowatsch,Tobias %+ Centre for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Weinbergstrasse 56/58, Zurich, , Switzerland, 41 446328638, ylukic@ethz.ch %K breathing training %K serious game %K biofeedback %K digital health %K mobile health %K mHealth %K mobile phone %K machine learning %K deep learning %K transfer learning %K neural networks %D 2022 %7 16.8.2022 %9 Original Paper %J JMIR Serious Games %G English %X Background: Slow-paced breathing training can have positive effects on physiological and psychological well-being. Unfortunately, use statistics indicate that adherence to breathing training apps is low. Recent work suggests that gameful breathing training may help overcome this challenge. Objective: This study aimed to introduce and evaluate the gameful breathing training app Breeze 2 and its novel real-time breathing detection algorithm that enables the interactive components of the app. Methods: We developed the breathing detection algorithm by using deep transfer learning to detect inhalation, exhalation, and nonbreathing sounds (including silence). An additional heuristic prolongs detected exhalations to stabilize the algorithm’s predictions. We evaluated Breeze 2 with 30 participants (women: n=14, 47%; age: mean 29.77, SD 7.33 years). Participants performed breathing training with Breeze 2 in 2 sessions with and without headphones. 
They answered questions regarding user engagement (User Engagement Scale Short Form [UES-SF]), perceived effectiveness (PE), perceived relaxation effectiveness, and perceived breathing detection accuracy. We used Wilcoxon signed-rank tests to compare the UES-SF, PE, and perceived relaxation effectiveness scores with neutral scores. Furthermore, we correlated perceived breathing detection accuracy with actual multi-class balanced accuracy to determine whether participants could perceive the actual breathing detection performance. We also conducted a repeated-measures ANOVA to investigate breathing detection differences in balanced accuracy with and without the heuristic and when classifying data captured from headphones and smartphone microphones. The analysis controlled for potential between-subject effects of the participants’ sex. Results: Our results show scores that were significantly higher than neutral scores for the UES-SF (W=459; P<.001), PE (W=465; P<.001), and perceived relaxation effectiveness (W=358; P<.001). Perceived breathing detection accuracy correlated significantly with the actual multi-class balanced accuracy (r=0.51; P<.001). Furthermore, we found that the heuristic significantly improved the breathing detection balanced accuracy (F1,25=6.23; P=.02) and that detection performed better on data captured from smartphone microphones than on data from headphones (F1,25=17.61; P<.001). We did not observe any significant between-subject effects of sex. Breathing detection without the heuristic reached a multi-class balanced accuracy of 74% on the collected audio recordings. Conclusions: Most participants (28/30, 93%) perceived Breeze 2 as engaging and effective. Furthermore, breathing detection worked well for most participants, as indicated by the perceived detection accuracy and actual detection accuracy. In future work, we aim to use the collected breathing sounds to improve breathing detection with regard to its stability and performance. 
We also plan to use Breeze 2 as an intervention tool in various studies targeting the prevention and management of noncommunicable diseases. %M 35972793 %R 10.2196/39186 %U https://games.jmir.org/2022/3/e39186 %U https://doi.org/10.2196/39186 %U http://www.ncbi.nlm.nih.gov/pubmed/35972793 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 8 %P e39846 %T A Comparison Between Clinical Guidelines and Real-World Treatment Data in Examining the Use of Session Summaries: Retrospective Study %A Sadeh-Sharvit,Shiri %A Rego,Simon A %A Jefroykin,Samuel %A Peretz,Gal %A Kupershmidt,Tomer %+ Eleos Health, 260 Charles St, Waltham, MA, 02453, United States, 1 5109848132, shiri@eleos.health %K Empirically based practices %K natural language processing %K psychotherapy %K behavioral therapy %K adherence %K treatment fidelity %K clinical training %K real-world data %K real-world study %D 2022 %7 16.8.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Although behavioral interventions have been found to be efficacious and effective in randomized clinical trials for most mental illnesses, the quality and efficacy of mental health care delivery remains inadequate in real-world settings, partly owing to suboptimal treatment fidelity. This “therapist drift” is an ongoing issue that ultimately reduces the effectiveness of treatments; however, until recently, there have been limited opportunities to assess adherence beyond large randomized controlled trials. Objective: This study explored therapists’ use of a standard component that is pertinent across most behavioral treatments—prompting clients to summarize their treatment session as a means for consolidating and augmenting their understanding of the session and the treatment plan. Methods: The data set for this study comprised 17,607 behavioral treatment sessions administered by 322 therapists to 3519 patients in 37 behavioral health care programs across the United States. 
Sessions were captured by a therapy-specific artificial intelligence (AI) platform, and an automatic speech recognition system transcribed the treatment meeting and separated the data into therapist and client utterances. A search for possible session summary prompts was then conducted, with 2 psychologists validating the text that emerged. Results: We found that despite clinical recommendations, only 54 (0.30%) sessions included a summary. Exploratory analyses indicated that session summaries mostly addressed relationships (n=27), work (n=20), change (n=6), and alcohol (n=5). Sessions with meeting summaries were also characterized by more therapist interventions and greater use of validation, complex reflections, and proactive problem-solving techniques. Conclusions: To the best of our knowledge, this is the first study to assess a large, diverse data set of real-world treatment practices. Our findings provide evidence that fidelity to the core components of empirically designed psychological interventions is a challenge in real-world settings. The results of this study can inform the development of machine learning and AI algorithms that offer nuanced, timely feedback to providers, thereby improving the delivery of evidence-based practices and the quality of mental health care services and facilitating better clinical outcomes in real-world settings. 
%M 35972782 %R 10.2196/39846 %U https://formative.jmir.org/2022/8/e39846 %U https://doi.org/10.2196/39846 %U http://www.ncbi.nlm.nih.gov/pubmed/35972782 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 8 %P e34304 %T Tempering Expectations on the Medical Artificial Intelligence Revolution: The Medical Trainee Viewpoint %A Hu,Zoe %A Hu,Ricky %A Yau,Olivia %A Teng,Minnie %A Wang,Patrick %A Hu,Grace %A Singla,Rohit %+ School of Medicine, Queen's University, 166 Brock Street, Kingston, ON, K7L5G2, Canada, 1 6132042952, zhu@qmed.ca %K medical education %K artificial intelligence %K health care trainees %K AI %K health care workers %D 2022 %7 15.8.2022 %9 Viewpoint %J JMIR Med Inform %G English %X The rapid development of artificial intelligence (AI) in medicine has resulted in an increased number of applications deployed in clinical trials. AI tools have been developed with the goals of improving diagnostic accuracy, workflow efficiency through automation, and discovery of novel features in clinical data. There is subsequent concern about the role of AI in taking over tasks traditionally entrusted to physicians. This has implications for medical trainees, who may make decisions based on their perception of how disruptive AI may be to their future careers. This commentary discusses current barriers to AI adoption to moderate concerns about the role of AI in the clinical setting, particularly as a standalone tool that replaces physicians. Technical limitations of AI include generalizability of performance and deficits in existing infrastructure to accommodate data, both of which are less obvious in pilot studies, where high performance is achieved in a controlled data processing environment. Economic limitations include rigorous regulatory requirements to deploy medical devices safely, particularly if AI is to replace human decision-making. 
Ethical guidelines are also required to assign responsibility, in the event of dysfunction, among the developer of the tool, the health care authority, and the patient. These consequences are apparent in the scope of existing AI tools, most of which aim to assist physicians rather than replace them. Together, these limitations will delay the advent of ubiquitous AI tools that perform standalone clinical tasks. The role of the physician will likely remain paramount to clinical decision-making in the near future. %M 35969464 %R 10.2196/34304 %U https://medinform.jmir.org/2022/8/e34304 %U https://doi.org/10.2196/34304 %U http://www.ncbi.nlm.nih.gov/pubmed/35969464 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 9 %N 8 %P e39807 %T Using Voice Biomarkers to Classify Suicide Risk in Adult Telehealth Callers: Retrospective Observational Study %A Iyer,Ravi %A Nedeljkovic,Maja %A Meyer,Denny %+ Centre for Mental Health, Swinburne University of Technology, 34 Wakefield Street, Hawthorn, 3122, Australia, 61 454565575, raviiyer@swin.edu.au %K voice biometrics %K suicide prevention %K machine learning %K telehealth %K suicide %K telehealth %K risk prediction %K prediction model %K voice biomarker %K mental health %D 2022 %7 15.8.2022 %9 Original Paper %J JMIR Ment Health %G English %X Background: Artificial intelligence has the potential to innovate current practices used to detect the imminent risk of suicide and to address shortcomings in traditional assessment methods. Objective: In this paper, we sought to automatically classify short segments (40 milliseconds) of speech according to low versus imminent risk of suicide in a large number (n=281) of telephone calls made to 2 telehealth counselling services in Australia. Methods: A total of 281 help line telephone call recordings sourced from On The Line, Australia (n=266, 94.7%) and 000 Emergency services, Canberra (n=15, 5.3%) were included in this study. 
Imminent risk of suicide was coded for when callers affirmed intent, plan, and the availability of means; level of risk was assessed by the responding counsellor and reassessed by a team of clinical researchers using the Columbia Suicide Severity Rating Scale (rating 5/6). Low risk of suicide was coded for in the absence of intent, plan, and means and via Columbia Suicide Severity Rating Scale ratings (rating 1/2). Preprocessing involved normalization and pre-emphasis of voice signals, while voice biometrics were extracted using the statistical language R. Candidate predictors were identified using Lasso regression. Each voice biomarker was assessed as a predictor of suicide risk using a generalized additive mixed effects model with splines to account for nonlinearity. Finally, a component-wise gradient boosting model was used to classify each call recording based on precoded suicide risk ratings. Results: A total of 77 imminent-risk calls were compared with 204 low-risk calls. Moreover, 36 voice biomarkers were extracted from each speech frame. Caller sex was a significant moderating factor (β=–.84, 95% CI –0.85, –0.84; t=6.59, P<.001). Candidate biomarkers were reduced to 11 primary markers, with distinct models developed for men and women. Using leave-one-out cross-validation, ensuring that the speech frames of no single caller featured in both training and test data sets simultaneously, an area under the precision-recall curve of 0.985 was achieved (95% CI 0.97, 1.0). The gamboost classification model correctly classified 469,332/470,032 (99.85%) speech frames. Conclusions: This study demonstrates an objective, efficient, and economical assessment of imminent suicide risk in an ecologically valid setting with potential applications to real-time assessment and response. 
Trial Registration: Australian New Zealand Clinical Trials Registry ACTRN12622000486729; https://www.anzctr.org.au/ACTRN12622000486729.aspx %M 35969444 %R 10.2196/39807 %U https://mental.jmir.org/2022/8/e39807 %U https://doi.org/10.2196/39807 %U http://www.ncbi.nlm.nih.gov/pubmed/35969444 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 8 %P e37054 %T Feasibility of Conducting Long-term Health and Behaviors Follow-up in Adolescents: Longitudinal Observational Study %A Cucchiaro,Giovanni %A Ahumada,Luis %A Gray,Geoffrey %A Fierstein,Jamie %A Yates,Hannah %A Householder,Kym %A Frye,William %A Rehman,Mohamed %+ Johns Hopkins All Children's Hospital, 601 5th Street South, St. Petersburg, FL, 33701, United States, 1 6266168290, gcucchi1@jhmi.edu %K Fitbit %K wearables %K health tracker %K survey %K adolescents %K psychosocial %K long term %K follow-up %K feasibility %K artificial intelligence %K machine learning %K posterior spine fusion %K operation %K surgery %D 2022 %7 15.8.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Machine learning uses algorithms that improve automatically through experience. This statistical learning approach is a natural extension of traditional statistical methods and can offer potential advantages for certain problems. The feasibility of using machine learning techniques in health care is predicated on access to a sufficient volume of data in a problem space. Objective: This study aimed to assess the feasibility of data collection from an adolescent population before and after a posterior spine fusion operation. Methods: Both physical and psychosocial data were collected. Adolescents scheduled for a posterior spine fusion operation were approached when they were scheduled for the surgery. The study collected repeated measures of patient data, including at least 2 weeks prior to the operation and 6 months after the patients were discharged from the hospital. 
Patients were provided with a Fitbit Charge 4 (consumer-grade health tracker) and instructed to wear it as often as possible. A third-party web-based portal was used to collect and store the Fitbit data, and patients were trained on how to download and sync their personal device data on step counts, sleep time, and heart rate onto the web-based portal. Demographic and physiologic data recorded in the electronic medical record were retrieved from the hospital data warehouse. We evaluated changes in the patients’ psychological profile over time using several validated questionnaires (ie, Pain Catastrophizing Scale, Patient Health Questionnaire, Generalized Anxiety Disorder Scale, and Pediatric Quality of Life Inventory). Questionnaires were administered to patients using Qualtrics software. Patients received the questionnaire prior to and during the hospitalization and again at 3 and 6 months postsurgery. We administered paper-based questionnaires for the self-report of daily pain scores and the use of analgesic medications. Results: There were several challenges to data collection from the study population. Only 38% (32/84) of the patients we approached met eligibility criteria, and 50% (16/32) of the enrolled patients dropped out during the follow-up period—on average 17.6 weeks into the study. Of those who completed the study, 69% (9/13) reliably wore the Fitbit and downloaded data into the web-based portal. These patients also had a high response rate to the psychosocial surveys. However, none of the patients who finished the study completed the paper-based pain diary. There were no difficulties accessing the demographic and clinical data stored in the hospital data warehouse. Conclusions: This study identifies several challenges to long-term medical follow-up in adolescents, including willingness to participate in these types of studies and compliance with the various data collection approaches. 
Several of these challenges—insufficient incentives and personal contact between researchers and patients—should be addressed in future studies. %M 35969442 %R 10.2196/37054 %U https://formative.jmir.org/2022/8/e37054 %U https://doi.org/10.2196/37054 %U http://www.ncbi.nlm.nih.gov/pubmed/35969442 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 8 %P e38092 %T Design and Formative Evaluation of a Virtual Voice-Based Coach for Problem-solving Treatment: Observational Study %A Kannampallil,Thomas %A Ronneberg,Corina R %A Wittels,Nancy E %A Kumar,Vikas %A Lv,Nan %A Smyth,Joshua M %A Gerber,Ben S %A Kringle,Emily A %A Johnson,Jillian A %A Yu,Philip %A Steinman,Lesley E %A Ajilore,Olu A %A Ma,Jun %+ University of Illinois at Chicago, 1747 W. Roosevelt Rd, Room 466 (MC 275), Chicago, IL, 60608, United States, 1 (312) 413 9830, maj2015@uic.edu %K voice assistants %K behavioral therapy %K problem-solving therapy %K mental health %K artificial intelligence %K user evaluation %D 2022 %7 12.8.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Artificial intelligence has provided new opportunities for human interactions with technology for the practice of medicine. Among the recent artificial intelligence innovations, personal voice assistants have been broadly adopted. This highlights their potential for health care–related applications such as behavioral counseling to promote healthy lifestyle habits and emotional well-being. However, the use of voice-based applications for behavioral therapy has not been previously evaluated. Objective: This study aimed to conduct a formative user evaluation of Lumen, a virtual voice-based coach developed as an Alexa skill that delivers evidence-based, problem-solving treatment for patients with mild to moderate depression and/or anxiety. Methods: A total of 26 participants completed 2 therapy sessions—an introductory (session 1) and a problem-solving (session 2)—with Lumen. 
Following each session with Lumen, participants completed user experience, task-related workload, and work alliance surveys. They also participated in semistructured interviews addressing the benefits, challenges and barriers to Lumen use, and design recommendations. We evaluated the differences in user experience, task load, and work alliance between sessions using 2-tailed paired t tests. Interview transcripts were coded using an inductive thematic analysis to characterize the participants’ perspectives regarding Lumen use. Results: Participants found Lumen to provide high pragmatic usability and favorable user experience, with marginal task load during interactions for both Lumen sessions. However, participants experienced a higher temporal workload during the problem-solving session, suggesting a feeling of being rushed during their communicative interactions. On the basis of the qualitative analysis, the following themes were identified: Lumen’s on-demand accessibility and the delivery of a complex problem-solving treatment task with a simplistic structure for achieving therapy goals; themes related to Lumen improvements included streamlining and improved personalization of conversations, slower pacing of conversations, and providing additional context during therapy sessions. Conclusions: On the basis of an in-depth formative evaluation, we found that Lumen supported the ability to conduct cognitively plausible interactions for the delivery of behavioral therapy. Several design suggestions identified from the study including reducing temporal and cognitive load during conversational interactions, developing more natural conversations, and expanding privacy and security features were incorporated in the revised version of Lumen. Although further research is needed, the promising findings from this study highlight the potential for using Lumen to deliver personalized and accessible mental health care, filling a gap in traditional mental health services. 
%M 35969431 %R 10.2196/38092 %U https://formative.jmir.org/2022/8/e38092 %U https://doi.org/10.2196/38092 %U http://www.ncbi.nlm.nih.gov/pubmed/35969431 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 8 %P e36443 %T Qualitative Evaluation of an Artificial Intelligence–Based Clinical Decision Support System to Guide Rhythm Management of Atrial Fibrillation: Survey Study %A Stacy,John %A Kim,Rachel %A Barrett,Christopher %A Sekar,Balaviknesh %A Simon,Steven %A Banaei-Kashani,Farnoush %A Rosenberg,Michael A %+ Department of Medicine, University of Colorado, 12631 E 17th Ave, Mailbox B177, Aurora, CO, 80045, United States, 1 303 724 1785, john.stacy@cuanschutz.edu %K Clinical decision support system %K machine learning %K supervised learning %K reinforcement learning %K atrial fibrillation %K rhythm strategy %D 2022 %7 11.8.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Despite the numerous studies evaluating various rhythm control strategies for atrial fibrillation (AF), determination of the optimal strategy in a single patient is often based on trial and error, with no one-size-fits-all approach based on international guidelines/recommendations. The decision, therefore, remains personal and lends itself well to help from a clinical decision support system, specifically one guided by artificial intelligence (AI). QRhythm utilizes a 2-stage machine learning (ML) model to identify the optimal rhythm management strategy in a given patient based on a set of clinical factors, in which the model first uses supervised learning to predict the actions of an expert clinician and identifies the best strategy through reinforcement learning to obtain the best clinical outcome—a composite of symptomatic recurrence, hospitalization, and stroke. 
Objective: We qualitatively evaluated a novel, AI-based, clinical decision support system (CDSS) for AF rhythm management, called QRhythm, which uses both supervised and reinforcement learning to recommend either a rate control or one of 3 types of rhythm control strategies—external cardioversion, antiarrhythmic medication, or ablation—based on individual patient characteristics. Methods: Thirty-three clinicians, including cardiology attendings and fellows and internal medicine attendings and residents, performed an assessment of QRhythm, followed by a survey to assess relative comfort with automated CDSS in rhythm management and to examine areas for future development. Results: The 33 providers were surveyed with training levels ranging from resident to fellow to attending. Of the characteristics of the app surveyed, safety was most important to providers, with an average importance rating of 4.7 out of 5 (SD 0.72). This priority was followed by clinical integrity (a desire for the advice provided to make clinical sense; importance rating 4.5, SD 0.9), backward interpretability (transparency in the population used to create the algorithm; importance rating 4.3, SD 0.65), transparency of the algorithm (reasoning underlying the decisions made; importance rating 4.3, SD 0.88), and provider autonomy (the ability to challenge the decisions made by the model; importance rating 3.85, SD 0.83). Providers who used the app ranked the integrity of recommendations as their highest concern with ongoing clinical use of the model, followed by efficacy of the application and patient data security. Trust in the app varied; 1 (17%) provider responded that they somewhat disagreed with the statement, “I trust the recommendations provided by the QRhythm app,” 2 (33%) providers responded with neutrality to the statement, and 3 (50%) somewhat agreed with the statement. 
Conclusions: Safety of ML applications was the highest priority of the providers surveyed, and trust of such models remains varied. Widespread clinical acceptance of ML in health care is dependent on how much providers trust the algorithms. Building this trust involves ensuring transparency and interpretability of the model. %M 35969422 %R 10.2196/36443 %U https://formative.jmir.org/2022/8/e36443 %U https://doi.org/10.2196/36443 %U http://www.ncbi.nlm.nih.gov/pubmed/35969422 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 8 %P e36010 %T Overview of Artificial Intelligence–Driven Wearable Devices for Diabetes: Scoping Review %A Ahmed,Arfan %A Aziz,Sarah %A Abd-alrazaq,Alaa %A Farooq,Faisal %A Sheikh,Javaid %+ AI Center for Precision Health, Weill Cornell Medicine-Qatar, Education City, Qatar Foundation, PO Box 24144, Doha, Qatar, 974 4492 8826, ara4013@qatar-med.cornell.edu %K diabetes %K artificial intelligence %K wearable devices %K machine learning %K mobile phone %D 2022 %7 9.8.2022 %9 Review %J J Med Internet Res %G English %X Background: Prevalence of diabetes has steadily increased over the last few decades with 1.5 million deaths reported in 2012 alone. Traditionally, analyzing patients with diabetes has remained a largely invasive approach. Wearable devices (WDs) make use of sensors historically reserved for hospital settings. WDs coupled with artificial intelligence (AI) algorithms show promise to help understand and conclude meaningful information from the gathered data and provide advanced and clinically meaningful analytics. Objective: This review aimed to provide an overview of AI-driven WD features for diabetes and their use in monitoring diabetes-related parameters. Methods: We searched 7 of the most popular bibliographic databases using 3 groups of search terms related to diabetes, WDs, and AI. A 2-stage process was followed for study selection: reading abstracts and titles followed by full-text screening. 
Two reviewers independently performed study selection and data extraction, and disagreements were resolved by consensus. A narrative approach was used to synthesize the data. Results: From an initial 3872 studies, we report the features from 37 studies post filtering according to our predefined inclusion criteria. Most of the studies targeted type 1 diabetes, type 2 diabetes, or both (21/37, 57%). Many studies (15/37, 41%) reported blood glucose as their main measurement. More than half of the studies (21/37, 57%) had the aim of estimation and prediction of glucose or glucose level monitoring. Over half of the reviewed studies looked at wrist-worn devices. Only 41% of the study devices were commercially available. We observed the use of multiple sensors with photoplethysmography sensors being most prevalent in 32% (12/37) of studies. Studies reported and compared >1 machine learning (ML) model with high levels of accuracy. Support vector machine was the most reported (13/37, 35%), followed by random forest (12/37, 32%). Conclusions: This review is the most extensive work, to date, summarizing WDs that use ML for people with diabetes, and provides research direction to those wanting to further contribute to this emerging field. Given the advancements in WD technologies replacing the need for invasive hospital setting devices, we see great advancement potential in this domain. Further work is needed to validate the ML approaches on clinical data from WDs and provide meaningful analytics that could serve as data gathering, monitoring, prediction, classification, and recommendation devices in the context of diabetes. 
%M 35943772 %R 10.2196/36010 %U https://www.jmir.org/2022/8/e36010 %U https://doi.org/10.2196/36010 %U http://www.ncbi.nlm.nih.gov/pubmed/35943772 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 8 %P e36199 %T Application of Artificial Intelligence in Shared Decision Making: Scoping Review %A Abbasgholizadeh Rahimi,Samira %A Cwintal,Michelle %A Huang,Yuhui %A Ghadiri,Pooria %A Grad,Roland %A Poenaru,Dan %A Gore,Genevieve %A Zomahoun,Hervé Tchala Vignon %A Légaré,France %A Pluye,Pierre %+ Department of Family Medicine, McGill University, 5858 Cote-des-Neiges Rd, Suite 300, Montreal, QC, H3S 1Z1, Canada, 1 (514)399 9218, samira.rahimi@mcgill.ca %K artificial intelligence %K machine learning %K shared decision making %K patient-centered care %K scoping review %D 2022 %7 9.8.2022 %9 Review %J JMIR Med Inform %G English %X Background: Artificial intelligence (AI) has shown promising results in various fields of medicine. It has the potential to facilitate shared decision making (SDM). However, there is no comprehensive mapping of how AI may be used for SDM. Objective: We aimed to identify and evaluate published studies that have tested or implemented AI to facilitate SDM. Methods: We performed a scoping review informed by the methodological framework proposed by Levac et al, modifications to the original Arksey and O'Malley framework of a scoping review, and the Joanna Briggs Institute scoping review framework. We reported our results based on the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) reporting guideline. At the identification stage, an information specialist performed a comprehensive search of 6 electronic databases from their inception to May 2021. 
The inclusion criteria were: all populations; all AI interventions that were used to facilitate SDM, and if the AI intervention was not used for the decision-making point in SDM, it was excluded; any outcome related to patients, health care providers, or health care systems; studies in any health care setting; only studies published in the English language; and all study types. Overall, 2 reviewers independently performed the study selection process and extracted data. Any disagreements were resolved by a third reviewer. A descriptive analysis was performed. Results: The search process yielded 1445 records. After removing duplicates, 894 documents were screened, and 6 peer-reviewed publications met our inclusion criteria. Overall, 2 of them were conducted in North America, 2 in Europe, 1 in Australia, and 1 in Asia. Most articles were published after 2017. Overall, 3 articles focused on primary care, and 3 articles focused on secondary care. All studies used machine learning methods. Moreover, 3 articles included health care providers in the validation stage of the AI intervention, and 1 article included both health care providers and patients in clinical validation, but none of the articles included health care providers or patients in the design and development of the AI intervention. All used AI to support SDM by providing clinical recommendations or predictions. Conclusions: Evidence of the use of AI in SDM is in its infancy. We found AI supporting SDM in similar ways across the included articles. We observed a lack of emphasis on patients’ values and preferences, as well as poor reporting of AI interventions, resulting in a lack of clarity about different aspects. Little effort was made to address the topics of explainability of AI interventions and to include end-users in the design and development of the interventions. 
Further efforts are required to strengthen and standardize the use of AI in different steps of SDM and to evaluate its impact on various decisions, populations, and settings. %M 35943793 %R 10.2196/36199 %U https://medinform.jmir.org/2022/8/e36199 %U https://doi.org/10.2196/36199 %U http://www.ncbi.nlm.nih.gov/pubmed/35943793 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 9 %N 8 %P e38428 %T Predicting Patient Wait Times by Using Highly Deidentified Data in Mental Health Care: Enhanced Machine Learning Approach %A Rastpour,Amir %A McGregor,Carolyn %+ Faculty of Business and Information Technology, Ontario Tech University, 2000 Simcoe St N, Oshawa, ON, L1G 0C5, Canada, 1 905 721 8668 ext 2830, amir.rastpour@ontariotechu.ca %K mental health care %K outpatient clinics %K wait time prediction %K machine learning %K random forest %K system’s knowledge %D 2022 %7 9.8.2022 %9 Original Paper %J JMIR Ment Health %G English %X Background: Wait times impact patient satisfaction, treatment effectiveness, and the efficiency of care that the patients receive. Wait time prediction in mental health is a complex task and is affected by the difficulty in predicting the required number of treatment sessions for outpatients, high no-show rates, and the possibility of using group treatment sessions. The task of wait time analysis becomes even more challenging if the input data has low utility, which happens when the data is highly deidentified by removing both direct and quasi identifiers. Objective: The first aim of this study was to develop machine learning models to predict the wait time from referral to the first appointment for psychiatric outpatients by using real-time data. The second aim was to enhance the performance of these predictive models by utilizing the system’s knowledge while the input data were highly deidentified. 
The third aim was to identify the factors that drove long wait times, and the fourth aim was to build these models such that they were practical and easy-to-implement (and therefore, attractive to care providers). Methods: We analyzed retrospective highly deidentified administrative data from 8 outpatient clinics at Ontario Shores Centre for Mental Health Sciences in Canada by using 6 machine learning methods to predict the first appointment wait time for new outpatients. We used the system’s knowledge to mitigate the low utility of our data. The data included 4187 patients who received care through 30,342 appointments. Results: The average wait time varied widely between different types of mental health clinics. For more than half of the clinics, the average wait time was longer than 3 months. The number of scheduled appointments and the rate of no-shows varied widely among clinics. Despite these variations, the random forest method provided the minimum root mean square error values for 4 of the 8 clinics, and the second minimum root mean square error for the other 4 clinics. Utilizing the system’s knowledge increased the utility of our highly deidentified data and improved the predictive power of the models. Conclusions: The random forest method, enhanced with the system’s knowledge, provided reliable wait time predictions for new outpatients, regardless of low utility of the highly deidentified input data and the high variation in wait times across different clinics and patient types. The priority system was identified as a factor that contributed to long wait times, and a fast-track system was suggested as a potential solution. 
%M 35943774 %R 10.2196/38428 %U https://mental.jmir.org/2022/8/e38428 %U https://doi.org/10.2196/38428 %U http://www.ncbi.nlm.nih.gov/pubmed/35943774 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 8 %P e38082 %T Predicting Mortality in Intensive Care Unit Patients With Heart Failure Using an Interpretable Machine Learning Model: Retrospective Cohort Study %A Li,Jili %A Liu,Siru %A Hu,Yundi %A Zhu,Lingfeng %A Mao,Yujia %A Liu,Jialin %+ Department of Medical Informatics, West China Hospital, Sichuan University, No 37 Guoxue Road, Chengdu, 610041, China, 86 28 85422306, dljl8@163.com %K heart failure %K mortality %K intensive care unit %K prediction %K XGBoost %K SHAP %K SHapley Additive exPlanation %D 2022 %7 9.8.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Heart failure (HF) is a common disease and a major public health problem. HF mortality prediction is critical for developing individualized prevention and treatment plans. However, due to their lack of interpretability, most HF mortality prediction models have not yet reached clinical practice. Objective: We aimed to develop an interpretable model to predict the mortality risk for patients with HF in intensive care units (ICUs) and used the SHapley Additive exPlanation (SHAP) method to explain the extreme gradient boosting (XGBoost) model and explore prognostic factors for HF. Methods: In this retrospective cohort study, we achieved model development and performance comparison on the eICU Collaborative Research Database (eICU-CRD). We extracted data during the first 24 hours of each ICU admission, and the data set was randomly divided, with 70% used for model training and 30% used for model validation. The prediction performance of the XGBoost model was compared with three other machine learning models by the area under the curve. We used the SHAP method to explain the XGBoost model. 
Results: A total of 2798 eligible patients with HF were included in the final cohort for this study. The observed in-hospital mortality of patients with HF was 9.97%. Comparatively, the XGBoost model had the highest predictive performance among four models with an area under the curve (AUC) of 0.824 (95% CI 0.7766-0.8708), whereas support vector machine had the poorest generalization ability (AUC=0.701, 95% CI 0.6433-0.7582). The decision curve showed that the net benefit of the XGBoost model surpassed those of other machine learning models at 10%~28% threshold probabilities. The SHAP method reveals the top 20 predictors of HF according to the importance ranking, and the average of the blood urea nitrogen was recognized as the most important predictor variable. Conclusions: The interpretable predictive model helps physicians more accurately predict the mortality risk in ICU patients with HF, and therefore, provides better treatment plans and optimal resource allocation for their patients. In addition, the interpretable framework can increase the transparency of the model and facilitate understanding the reliability of the predictive model for the physicians. 
%M 35943767 %R 10.2196/38082 %U https://www.jmir.org/2022/8/e38082 %U https://doi.org/10.2196/38082 %U http://www.ncbi.nlm.nih.gov/pubmed/35943767 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 9 %N 3 %P e34514 %T Consumer Perspectives on the Use of Artificial Intelligence Technology and Automation in Crisis Support Services: Mixed Methods Study %A Ma,Jennifer S %A O’Riordan,Megan %A Mazzer,Kelly %A Batterham,Philip J %A Bradford,Sally %A Kõlves,Kairi %A Titov,Nickolai %A Klein,Britt %A Rickwood,Debra J %+ Discipline of Psychology, Faculty of Health, University of Canberra, 11 Kirinari Street, Bruce, ACT, 2617, Australia, 61 (0)2 6201 2701, Debra.Rickwood@canberra.edu.au %K consumer %K community %K help-seeker %K perspective %K technology %K artificial intelligence %K crisis %K support %K acceptability %D 2022 %7 5.8.2022 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Emerging technologies, such as artificial intelligence (AI), have the potential to enhance service responsiveness and quality, improve reach to underserved groups, and help address the lack of workforce capacity in health and mental health care. However, little research has been conducted on the acceptability of AI, particularly in mental health and crisis support, and how this may inform the development of responsible and responsive innovation in the area. Objective: This study aims to explore the level of support for the use of technology and automation, such as AI, in Lifeline’s crisis support services in Australia; the likelihood of service use if technology and automation were implemented; the impact of demographic characteristics on the level of support and likelihood of service use; and reasons for not using Lifeline’s crisis support services if technology and automation were implemented in the future. 
Methods: A mixed methods study involving a computer-assisted telephone interview and a web-based survey was undertaken from 2019 to 2020 to explore expectations and anticipated outcomes of Lifeline’s crisis support services in a nationally representative community sample (n=1300) and a Lifeline help-seeker sample (n=553). Participants were aged between 18 and 93 years. Quantitative descriptive analysis, binary logistic regression models, and qualitative thematic analysis were conducted to address the research objectives. Results: One-third of the community and help-seeker participants did not support the collection of information about service users through technology and automation (ie, via AI), and approximately half of the participants reported that they would be less likely to use the service if automation was introduced. Significant demographic differences were observed between the community and help-seeker samples. Of the demographics, only older age predicted being less likely to endorse technology and automation to tailor Lifeline’s crisis support service and use such services (odds ratio 1.48-1.66, 99% CI 1.03-2.38; P<.001 to P=.005). The most common reason for reluctance, reported by both samples, was that respondents wanted to speak to a real person, assuming that human counselors would be replaced by automated robots or machine services. Conclusions: Although Lifeline plans to always have a real person providing crisis support, help-seekers automatically fear this will not be the case if new technology and automation such as AI are introduced. Consequently, incorporating innovative use of technology to improve help-seeker outcomes in such services will require careful messaging and assurance that the human connection will continue. 
%M 35930334 %R 10.2196/34514 %U https://humanfactors.jmir.org/2022/3/e34514 %U https://doi.org/10.2196/34514 %U http://www.ncbi.nlm.nih.gov/pubmed/35930334 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 8 %P e37486 %T Improving the Performance of Outcome Prediction for Inpatients With Acute Myocardial Infarction Based on Embedding Representation Learned From Electronic Medical Records: Development and Validation Study %A Huang,Yanqun %A Zheng,Zhimin %A Ma,Moxuan %A Xin,Xin %A Liu,Honglei %A Fei,Xiaolu %A Wei,Lan %A Chen,Hui %+ School of Biomedical Engineering, Capital Medical University, No. 10, Xitoutiao, You An Men, Fengtai District, Beijing, 100069, China, 86 01083911545, chenhui@ccmu.edu.cn %K representation learning %K skip-gram %K feature association strengths %K feature importance %K mortality risk prediction %K acute myocardial infarction %D 2022 %7 3.8.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: The widespread secondary use of electronic medical records (EMRs) promotes health care quality improvement. Representation learning that can automatically extract hidden information from EMR data has gained increasing attention. Objective: We aimed to propose a patient representation with more feature associations and task-specific feature importance to improve the outcome prediction performance for inpatients with acute myocardial infarction (AMI). Methods: Medical concepts, including patients’ age, gender, disease diagnoses, laboratory tests, structured radiological features, procedures, and medications, were first embedded into real-value vectors using the improved skip-gram algorithm, where concepts in the context windows were selected by feature association strengths measured by association rule confidence. 
Then, each patient was represented as the sum of the feature embeddings weighted by the task-specific feature importance, which was applied to facilitate predictive model prediction from global and local perspectives. We finally applied the proposed patient representation into mortality risk prediction for 3010 and 1671 AMI inpatients from a public data set and a private data set, respectively, and compared it with several reference representation methods in terms of the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and F1-score. Results: Compared with the reference methods, the proposed embedding-based representation showed consistently superior predictive performance on the 2 data sets, achieving mean AUROCs of 0.878 and 0.973, AUPRCs of 0.220 and 0.505, and F1-scores of 0.376 and 0.674 for the public and private data sets, respectively, while the greatest AUROCs, AUPRCs, and F1-scores among the reference methods were 0.847 and 0.939, 0.196 and 0.283, and 0.344 and 0.361 for the public and private data sets, respectively. Feature importance integrated in patient representation reflected features that were also critical in prediction tasks and clinical practice. Conclusions: The introduction of feature associations and feature importance facilitated an effective patient representation and contributed to prediction performance improvement and model interpretation. 
%M 35921141 %R 10.2196/37486 %U https://www.jmir.org/2022/8/e37486 %U https://doi.org/10.2196/37486 %U http://www.ncbi.nlm.nih.gov/pubmed/35921141 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 8 %P e34126 %T A Questionnaire-Based Ensemble Learning Model to Predict the Diagnosis of Vertigo: Model Development and Validation Study %A Yu,Fangzhou %A Wu,Peixia %A Deng,Haowen %A Wu,Jingfang %A Sun,Shan %A Yu,Huiqian %A Yang,Jianming %A Luo,Xianyang %A He,Jing %A Ma,Xiulan %A Wen,Junxiong %A Qiu,Danhong %A Nie,Guohui %A Liu,Rizhao %A Hu,Guohua %A Chen,Tao %A Zhang,Cheng %A Li,Huawei %+ Department of Otorhinolaryngology, Eye & ENT Hospital, Fudan University, Room 611, Building 9, No. 83, Fenyang Road, Xuhui District, Shanghai, 200031, China, 86 021 64377134 ext 2669, hwli@shmu.edu.cn %K vestibular disorders %K machine learning %K diagnostic model %K vertigo %K ENT %K questionnaire %D 2022 %7 3.8.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Questionnaires have been used in the past 2 decades to predict the diagnosis of vertigo and assist clinical decision-making. A questionnaire-based machine learning model is expected to improve the efficiency of diagnosis of vestibular disorders. Objective: This study aims to develop and validate a questionnaire-based machine learning model that predicts the diagnosis of vertigo. Methods: In this multicenter prospective study, patients presenting with vertigo entered a consecutive cohort at their first visit to the ENT and vertigo clinics of 7 tertiary referral centers from August 2019 to March 2021, with a follow-up period of 2 months. All participants completed a diagnostic questionnaire after eligibility screening. Patients who received only 1 final diagnosis by their treating specialists for their primary complaint were included in model development and validation. 
The data of patients enrolled before February 1, 2021, were used for modeling and cross-validation, while patients enrolled afterward entered external validation. Results: A total of 1693 patients were enrolled, with a response rate of 96.2% (1693/1760). The median age was 51 (IQR 38-61) years, with 991 (58.5%) females; 1041 (61.5%) patients received the final diagnosis during the study period. Among them, 928 (54.8%) patients were included in model development and validation, and 113 (6.7%) patients who enrolled later were used as a test set for external validation. They were classified into 5 diagnostic categories. We compared 9 candidate machine learning methods, and the recalibrated model of light gradient boosting machine achieved the best performance, with an area under the curve of 0.937 (95% CI 0.917-0.962) in cross-validation and 0.954 (95% CI 0.944-0.967) in external validation. Conclusions: The questionnaire-based light gradient boosting machine was able to predict common vestibular disorders and assist decision-making in ENT and vertigo clinics. Further studies with a larger sample size and the participation of neurologists will help assess the generalization and robustness of this machine learning method. 
%M 35921135 %R 10.2196/34126 %U https://www.jmir.org/2022/8/e34126 %U https://doi.org/10.2196/34126 %U http://www.ncbi.nlm.nih.gov/pubmed/35921135 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 8 %P e39288 %T Effectiveness of Artificial Intelligence–Assisted Decision-making to Improve Vulnerable Women’s Participation in Cervical Cancer Screening in France: Protocol for a Cluster Randomized Controlled Trial (AppDate-You) %A Selmouni,Farida %A Guy,Marine %A Muwonge,Richard %A Nassiri,Abdelhak %A Lucas,Eric %A Basu,Partha %A Sauvaget,Catherine %+ Early Detection, Prevention & Infections Branch, International Agency for Research on Cancer, 150, Cours Albert Thomas, Lyon, 69372, France, 33 0472738499, selmounif@iarc.fr %K cervical cancer %K screening %K chatbot %K decision aid %K artificial intelligence %K cluster randomized controlled trial %D 2022 %7 2.8.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: The French organized population-based cervical cancer screening (CCS) program transitioned from a cytology-based to a human papillomavirus (HPV)–based screening strategy in August 2020. HPV testing is offered every 5 years, starting at the age of 30 years. In the new program, women are invited to undergo an HPV test at a gynecologist’s, primary care physician’s, or midwife’s office, a private clinic or health center, family planning center, or hospital. HPV self-sampling (HPVss) was also made available as an additional approach. However, French studies reported that less than 20% of noncompliant women performed vaginal self-sampling when a kit was sent to their home. Women with lower income and educational levels participate less in CCS. Lack of information about the disease and the benefits of CCS were reported as one of the major barriers among noncompliant women. This barrier could be addressed by overcoming disparities in HPV- and cervical cancer–related knowledge and perceptions about CCS. 
Objective: This study aimed to assess the effectiveness of a chatbot-based decision aid to improve women’s participation in the HPVss detection-based CCS care pathway. Methods: AppDate-You is a 2-arm cluster randomized controlled trial (cRCT) nested within the French organized CCS program. Eligible women are those aged 30-65 years who have not been screened for CC for more than 4 years and live in the disadvantaged clusters in the Occitanie Region, France. In total, 32 clusters will be allocated to the intervention and control arms, 16 in each arm (approximately 4000 women). Eligible women living in randomly selected disadvantaged clusters will be identified using the Regional Cancer Screening Coordinating Centre of Occitanie (CRCDC-OC) database. Women in the experimental group will receive screening reminder letters and HPVss kits, combined with access to a chatbot-based decision aid tailored to women with lower education attainment. Women in the control group will receive the reminder letters and HPVss kits (standard of care). The CRCDC-OC database will be used to check trial progress and assess the intervention’s impact. The trial has 2 primary outcomes: (1) the proportion of screening participation within 12 months among women recalled for CCS and (2) the proportion of HPVss-positive women who are “well-managed” as stipulated in the French guidelines. Results: To date, the AppDate-You study group is preparing and developing the chatbot-based decision aid (intervention). The cRCT will be conducted once the decision aid has been completed and validated. Recruitment of women is expected to begin in January 2023. Conclusions: This study is the first to evaluate the impact of a chatbot-based decision aid to promote the CCS program and increase its performance. The study results will inform policy makers and health professionals as well as the research community. 
Trial Registration: ClinicalTrials.gov NCT05286034; https://clinicaltrials.gov/ct2/show/NCT05286034 International Registered Report Identifier (IRRID): PRR1-10.2196/39288 %M 35771872 %R 10.2196/39288 %U https://www.researchprotocols.org/2022/8/e39288 %U https://doi.org/10.2196/39288 %U http://www.ncbi.nlm.nih.gov/pubmed/35771872 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 8 %P e35396 %T Diagnosis of Atrial Fibrillation Using Machine Learning With Wearable Devices After Cardiac Surgery: Algorithm Development Study %A Hiraoka,Daisuke %A Inui,Tomohiko %A Kawakami,Eiryo %A Oya,Megumi %A Tsuji,Ayumu %A Honma,Koya %A Kawasaki,Yohei %A Ozawa,Yoshihito %A Shiko,Yuki %A Ueda,Hideki %A Kohno,Hiroki %A Matsuura,Kaoru %A Watanabe,Michiko %A Yakita,Yasunori %A Matsumiya,Goro %+ Department of Cardiovascular Surgery, University of Chiba, 1-8-1 Inohana, Chuo-ku, Chiba, 260-8677, Japan, 81 08041941178, d.the.lion.hearted@gmail.com %K wearable device %K atrial fibrillation %K photoplethysmography %K cardiology %K heart %K mHealth %K mobile health %K pulse %K development %K pilot study %K Apple Watch %K sensor %K algorithm %K detection %K diagnose %K cardiac surgery %K machine learning %D 2022 %7 1.8.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Some attempts have been made to detect atrial fibrillation (AF) with a wearable device equipped with photoelectric volumetric pulse wave technology, and it is expected to be applied under real clinical conditions. Objective: This study is the second part of a 2-phase study aimed at developing a method for immediate detection of paroxysmal AF, using a wearable device with built-in photoplethysmography (PPG). The objective of this study is to develop an algorithm to immediately diagnose AF by an Apple Watch equipped with a PPG sensor that is worn by patients undergoing cardiac surgery and to use machine learning on the pulse data output from the device. 
Methods: A total of 80 patients who underwent cardiac surgery at a single institution between June 2020 and March 2021 were monitored for postoperative AF, using a telemetry-monitored electrocardiogram (ECG) and an Apple Watch. AF was diagnosed by qualified physicians from telemetry-monitored ECGs and 12-lead ECGs; a diagnostic algorithm was developed using machine learning on the pulse rate data output from the Apple Watch. Results: One of the 80 patients was excluded from the analysis due to redness caused by wearing the Apple Watch. Of 79 patients, 27 (34.2%) developed AF, and 199 events of AF including brief AF were observed. Of them, 18 events of AF lasting longer than 1 hour were observed, and cross-correlation analysis showed that pulse rate measured by Apple Watch was strongly correlated (cross-correlation functions [CCF]: 0.6-0.8) with 8 events and very strongly correlated (CCF>0.8) with 3 events. The machine learning model achieved an area under the receiver operating characteristic curve of 0.9416 (sensitivity 0.909 and specificity 0.838 at the point closest to the top left). Conclusions: We were able to safely monitor pulse rate in patients who wore an Apple Watch after cardiac surgery. Although the pulse rate measured by the PPG sensor at times deviates from the heart rate recorded by telemetry-monitored ECGs, which may reduce the accuracy of AF diagnosis by machine learning, we have shown the potential for clinical application of using only the pulse rate collected by the PPG sensor for the early detection of AF. 
%M 35916709 %R 10.2196/35396 %U https://formative.jmir.org/2022/8/e35396 %U https://doi.org/10.2196/35396 %U http://www.ncbi.nlm.nih.gov/pubmed/35916709 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 8 %N 3 %P e35893 %T Providing Care Beyond Therapy Sessions With a Natural Language Processing–Based Recommender System That Identifies Cancer Patients Who Experience Psychosocial Challenges and Provides Self-care Support: Pilot Study %A Leung,Yvonne W %A Park,Bomi %A Heo,Rachel %A Adikari,Achini %A Chackochan,Suja %A Wong,Jiahui %A Alie,Elyse %A Gancarz,Mathew %A Kacala,Martyna %A Hirst,Graeme %A de Silva,Daswin %A French,Leon %A Bender,Jacqueline %A Mishna,Faye %A Gratzer,David %A Alahakoon,Damminda %A Esplen,Mary Jane %+ de Souza Institute, University Health Network, 222 St Patrick St Office 503, Toronto, ON, M5T 1V4, Canada, 1 844 758 6891, yvonne.leung@desouzainstitute.com %K artificial intelligence %K natural language processing %K online support groups %K supportive care in cancer %K recommender system %D 2022 %7 29.7.2022 %9 Original Paper %J JMIR Cancer %G English %X Background: The negative psychosocial impacts of cancer diagnoses and treatments are well documented. Virtual care has become an essential mode of care delivery during the COVID-19 pandemic, and online support groups (OSGs) have been shown to improve accessibility to psychosocial and supportive care. de Souza Institute offers CancerChatCanada, a therapist-led OSG service where sessions are monitored by an artificial intelligence–based co-facilitator (AICF). The AICF is equipped with a recommender system that uses natural language processing to tailor online resources to patients according to their psychosocial needs. Objective: We aimed to outline the development protocol and evaluate the AICF on its precision and recall in recommending resources to cancer OSG members. 
Methods: Human input informed the design and evaluation of the AICF on its ability to (1) appropriately identify keywords indicating a psychosocial concern and (2) recommend the most appropriate online resource to the OSG member expressing each concern. Three rounds of human evaluation and algorithm improvement were performed iteratively. Results: We evaluated 7190 outputs and achieved a precision of 0.797, a recall of 0.981, and an F1 score of 0.880 by the third round of evaluation. Resources were recommended to 48 patients, and 25 (52%) accessed at least one resource. Of those who accessed the resources, 19 (75%) found them useful. Conclusions: The preliminary findings suggest that the AICF can help provide tailored support for cancer OSG members with high precision, recall, and satisfaction. The AICF has undergone rigorous human evaluation, and the results provide much-needed evidence, while outlining potential strengths and weaknesses for future applications in supportive care. %M 35904877 %R 10.2196/35893 %U https://cancer.jmir.org/2022/3/e35893 %U https://doi.org/10.2196/35893 %U http://www.ncbi.nlm.nih.gov/pubmed/35904877 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 7 %P e37928 %T Development of an Interoperable and Easily Transferable Clinical Decision Support System Deployment Platform: System Design and Development Study %A Yoo,Junsang %A Lee,Jeonghoon %A Min,Ji Young %A Choi,Sae Won %A Kwon,Joon-myoung %A Cho,Insook %A Lim,Chiyeon %A Choi,Mi Young %A Cha,Won Chul %+ Department of Emergency Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea, 82 3410 2053, wc.cha@samsung.com %K clinical decision support system %K decision making %K decision aid %K decision support %K common data model %K model %K development %K electronic health record %K medical record %K EHR %K EMR %K Fast Healthcare Interoperability Resource %K interoperability %K machine learning %K 
clinical decision %K health technology %K algorithm %K intelligent algorithm network %K modeling %D 2022 %7 27.7.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: A clinical decision support system (CDSS) is recognized as a technology that enhances clinical efficacy and safety. However, its full potential has not been realized, mainly due to clinical data standards and noninteroperable platforms. Objective: In this paper, we introduce the common data model–based intelligent algorithm network environment (CANE) platform that supports the implementation and deployment of a CDSS. Methods: CDSS reasoning engines, usually represented as R or Python objects, are deployed into the CANE platform and converted into C# objects. When a clinician requests CANE-based decision support in the electronic health record (EHR) system, patients’ information is transformed into Health Level 7 Fast Healthcare Interoperability Resources (FHIR) format and transmitted to the CANE server inside the hospital firewall. Upon receiving the necessary data, the CANE system’s modules perform the following tasks: (1) the preprocessing module converts the FHIRs into the input data required by the specific reasoning engine, (2) the reasoning engine module operates the target algorithms, (3) the integration module communicates with the other institutions’ CANE systems to request and transmit a summary report to aid in decision support, and (4) creates a user interface by integrating the summary report and the results calculated by the reasoning engine. Results: We developed a CANE system such that any algorithm implemented in the system can be directly called through the RESTful application programming interface when it is integrated with an EHR system. Eight algorithms were developed and deployed in the CANE system. Using a knowledge-based algorithm, physicians can screen patients who are prone to sepsis and obtain treatment guides for patients with sepsis with the CANE system. 
Further, using a nonknowledge-based algorithm, the CANE system supports emergency physicians’ clinical decisions about optimum resource allocation by predicting a patient’s acuity and prognosis during triage. Conclusions: We successfully developed a common data model–based platform that adheres to medical informatics standards and could aid artificial intelligence model deployment using R or Python. %M 35896020 %R 10.2196/37928 %U https://www.jmir.org/2022/7/e37928 %U https://doi.org/10.2196/37928 %U http://www.ncbi.nlm.nih.gov/pubmed/35896020 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 5 %N 3 %P e39335 %T Advance Planning for Technology Use in Dementia Care: Development, Design, and Feasibility of a Novel Self-administered Decision-Making Tool %A Berridge,Clara %A Turner,Natalie R %A Liu,Liu %A Karras,Sierramatice W %A Chen,Amy %A Fredriksen-Goldsen,Karen %A Demiris,George %+ School of Social Work, University of Washington, 4101 15th Ave NE, Seattle, WA, 98105, United States, 1 (206) 543 5640, clarawb@uw.edu %K Alzheimer disease %K advance care planning %K dyadic intervention %K technology %K remote monitoring %K artificial intelligence %K older adult %K seniors %K human-computer interaction %K aging %K elderly population %K digital tool %K educational tool %K dementia care %K ethics %K informed consent %D 2022 %7 27.7.2022 %9 Original Paper %J JMIR Aging %G English %X Background: Monitoring technologies are used to collect a range of information, such as one’s location out of the home or movement within the home, and transmit that information to caregivers to support aging in place. Their surveilling nature, however, poses ethical dilemmas and can be experienced as intrusive to people living with Alzheimer disease (AD) and AD-related dementias. These challenges are compounded when older adults are not engaged in decision-making about how they are monitored. 
Dissemination of these technologies is outpacing our understanding of how to communicate their functions, risks, and benefits to families and older adults. To date, there are no tools to help families understand the functions of monitoring technologies or guide them in balancing their perceived need for ongoing surveillance and the older adult’s dignity and wishes. Objective: We designed, developed, and piloted a communication and education tool in the form of a web application called Let’s Talk Tech to support family decision-making about diverse technologies used in dementia home care. The knowledge base about how to design online interventions for people living with mild dementia is still in development, and dyadic interventions used in dementia care remain rare. We describe the intervention’s motivation and development process, and the feasibility of using this self-administered web application intervention in a pilot sample of people living with mild AD and their family care partners. Methods: We surveyed 29 mild AD dementia care dyads living together before and after they completed the web application intervention and interviewed each dyad about their experiences with it. We report postintervention measures of feasibility (recruitment, enrollment, and retention) and acceptability (satisfaction, quality, and usability). Descriptive statistics were calculated for survey items, and thematic analysis was used with interview transcripts to illuminate participants’ experiences and recommendations to improve the intervention. Results: The study enrolled 33 people living with AD and their care partners, and 29 (88%) dyads completed the study (all but one were spousal dyads). Participants were asked to complete 4 technology modules, and all completed them. The majority of participants rated the tool as having the right length (>90%), having the right amount of information (>84%), being very clearly worded (>74%), and presenting information in a balanced way (>90%). 
Most felt the tool was easy to use and helpful, and would likely recommend it to others. Conclusions: This study demonstrated that our intervention to educate and facilitate conversation and documentation of preferences is preliminarily feasible and acceptable to mild AD care dyads. Effectively involving older adults in these decisions and informing care partners of their preferences could enable families to avoid conflicts or risks associated with uninformed or disempowered use and to personalize use so both members of the dyad can experience benefits. %M 35896014 %R 10.2196/39335 %U https://aging.jmir.org/2022/3/e39335 %U https://doi.org/10.2196/39335 %U http://www.ncbi.nlm.nih.gov/pubmed/35896014 %0 Journal Article %@ 2817-1705 %I JMIR Publications %V 1 %N 1 %P e37508 %T Artificial Intelligence–Assisted Diagnosis of Anterior Cruciate Ligament Tears From Magnetic Resonance Images: Algorithm Development and Validation Study %A Chen,Kun-Hui %A Yang,Chih-Yu %A Wang,Hsin-Yi %A Ma,Hsiao-Li %A Lee,Oscar Kuang-Sheng %+ Institute of Clinical Medicine, National Yang Ming Chiao Tung University, No 155, Sec 2, Linong Street, Taipei, 11221, Taiwan, 886 2 28757391, oscarlee9203@gmail.com %K artificial intelligence %K convolutional neural network %K magnetic resonance imaging %K MRI %K deep learning %K anterior cruciate ligament %K sports medicine %K machine learning %K ligament %K sport %K diagnosis %K tear %K damage %K imaging %K development %K validation %K algorithm %D 2022 %7 26.7.2022 %9 Original Paper %J J AI %G English %X Background: Anterior cruciate ligament (ACL) injuries are common in sports and are critical knee injuries that require prompt diagnosis. Magnetic resonance imaging (MRI) is a strong, noninvasive tool for detecting ACL tears, which requires training to read accurately. Clinicians with different experiences in reading MR images require different information for the diagnosis of ACL tears. 
Artificial intelligence (AI) image processing could be a promising approach in the diagnosis of ACL tears. Objective: This study sought to use AI to (1) diagnose ACL tears from complete MR images, (2) identify torn-ACL images from complete MR images with a diagnosis of ACL tears, and (3) differentiate intact-ACL and torn-ACL MR images from the selected MR images. Methods: The sagittal MR images of torn ACL (n=1205) and intact ACL (n=1018) from 800 cases and the complete knee MR images of 200 cases (100 torn ACL and 100 intact ACL) from patients aged 20-40 years were retrospectively collected. An AI approach using a convolutional neural network was applied to build models for the objective. The MR images of 200 independent cases (100 torn ACL and 100 intact ACL) were used as the test set for the models. The MR images of 40 randomly selected cases from the test set were used to compare the reading accuracy of ACL tears between the trained model and clinicians with different levels of experience. Results: The first model differentiated between torn-ACL, intact-ACL, and other images from complete MR images with an accuracy of 0.9946, and the sensitivity, specificity, precision, and F1-score were 0.9344, 0.9743, 0.8659, and 0.8980, respectively. The final accuracy for ACL-tear diagnosis was 0.96. The model showed a significantly higher reading accuracy than less experienced clinicians. The second model identified torn-ACL images from complete MR images with a diagnosis of ACL tear with an accuracy of 0.9943, and the sensitivity, specificity, precision, and F1-score were 0.9154, 0.9660, 0.8167, and 0.8632, respectively. The third model differentiated torn- and intact-ACL images with an accuracy of 0.9691, and the sensitivity, specificity, precision, and F1-score were 0.9827, 0.9519, 0.9632, and 0.9728, respectively. 
Conclusions: This study demonstrates the feasibility of using an AI approach to provide information to clinicians who need different information from MRI to diagnose ACL tears. %M 38875555 %R 10.2196/37508 %U https://ai.jmir.org/2022/1/e37508 %U https://doi.org/10.2196/37508 %U http://www.ncbi.nlm.nih.gov/pubmed/38875555 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 7 %P e33717 %T Developing, Implementing, and Evaluating an Artificial Intelligence–Guided Mental Health Resource Navigation Chatbot for Health Care Workers and Their Families During and Following the COVID-19 Pandemic: Protocol for a Cross-sectional Study %A Noble,Jasmine M %A Zamani,Ali %A Gharaat,MohamadAli %A Merrick,Dylan %A Maeda,Nathanial %A Lambe Foster,Alex %A Nikolaidis,Isabella %A Goud,Rachel %A Stroulia,Eleni %A Agyapong,Vincent I O %A Greenshaw,Andrew J %A Lambert,Simon %A Gallson,Dave %A Porter,Ken %A Turner,Debbie %A Zaiane,Osmar %+ Department of Computing Science, University of Alberta, ATH 443 (Athabasca Hall), Edmonton, AB, T6G 2E8, Canada, 1 780 492 2860, zaiane@ualberta.ca %K eHealth %K chatbot %K conversational agent %K health system navigation %K electronic health care %K mobile phone %D 2022 %7 25.7.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Approximately 1 in 3 Canadians will experience an addiction or mental health challenge at some point in their lifetime. Unfortunately, there are multiple barriers to accessing mental health care, including system fragmentation, episodic care, long wait times, and insufficient support for health system navigation. In addition, stigma may further reduce an individual’s likelihood of seeking support. Digital technologies present new and exciting opportunities to bridge significant gaps in mental health care service provision, reduce barriers pertaining to stigma, and improve health outcomes for patients and mental health system integration and efficiency. 
Chatbots (ie, software systems that use artificial intelligence to carry out conversations with people) may be explored to support those in need of information or access to services and present the opportunity to address gaps in traditional, fragmented, or episodic mental health system structures on demand with personalized attention. The recent COVID-19 pandemic has further exacerbated the need for mental health support among Canadians and called attention to the inefficiencies of our system. As health care workers and their families are at an even greater risk of mental illness and psychological distress during the COVID-19 pandemic, this technology will first be piloted with the goal of supporting this vulnerable group. Objective: This pilot study seeks to evaluate the effectiveness of the Mental Health Intelligent Information Resource Assistant in supporting health care workers and their families in the Canadian provinces of Alberta and Nova Scotia with the provision of appropriate information on mental health issues, services, and programs based on personalized needs. Methods: The effectiveness of the technology will be assessed via voluntary follow-up surveys and an analysis of client interactions and engagement with the chatbot. Client satisfaction with the chatbot will also be assessed. Results: This project was initiated on April 1, 2021. Ethics approval was granted on August 12, 2021, by the University of Alberta Health Research Board (PRO00109148) and on April 21, 2022, by the Nova Scotia Health Authority Research Ethics Board (1027474). Data collection is anticipated to take place from May 2, 2022, to May 2, 2023. Publication of preliminary results will be sought in spring or summer 2022, with a more comprehensive evaluation completed by spring 2023 following the collection of a larger data set. 
Conclusions: Our findings can be incorporated into public policy and planning around mental health system navigation by Canadian mental health care providers—from large public health authorities to small community-based, not-for-profit organizations. This may serve to support the development of an additional touch point, or point of entry, for individuals to access the appropriate services or care when they need them, wherever they are. International Registered Report Identifier (IRRID): PRR1-10.2196/33717 %M 35877158 %R 10.2196/33717 %U https://www.researchprotocols.org/2022/7/e33717 %U https://doi.org/10.2196/33717 %U http://www.ncbi.nlm.nih.gov/pubmed/35877158 %0 Journal Article %@ 2371-4379 %I JMIR Publications %V 7 %N 3 %P e34699 %T Type 1 Diabetes Hypoglycemia Prediction Algorithms: Systematic Review %A Tsichlaki,Stella %A Koumakis,Lefteris %A Tsiknakis,Manolis %+ Department of Electrical & Computer Engineering, Hellenic Mediterranean University, Gianni Kornarou, Estavromenos 1, Heraklion, 71004, Greece, 30 6945231917, stsichlaki@gmail.com %K type 1 diabetes %K hypoglycemia %K predictive models %K continuous glucose monitoring %K heart rate variability %K artificial intelligence %D 2022 %7 21.7.2022 %9 Review %J JMIR Diabetes %G English %X Background: Diabetes is a chronic condition that necessitates regular monitoring and self-management of the patient’s blood glucose levels. People with type 1 diabetes (T1D) can live a productive life if they receive proper diabetes care. Nonetheless, a loose glycemic control might increase the risk of developing hypoglycemia. This incident can occur because of a variety of causes, such as taking additional doses of insulin, skipping meals, or overexercising. Mainly, the symptoms of hypoglycemia range from mild dysphoria to more severe conditions, if not detected in a timely manner. 
Objective: In this review, we aimed to report on innovative detection techniques and tactics for identifying and preventing hypoglycemic episodes, focusing on T1D. Methods: A systematic literature search following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines was performed in PubMed, Google Scholar, IEEE Xplore, and the ACM Digital Library to find articles on technologies related to hypoglycemia detection in patients with T1D. Results: The presented approaches have been used or devised to enhance blood glucose monitoring and boost its efficacy in forecasting future glucose levels, which could aid the prediction of future episodes of hypoglycemia. We detected 19 predictive models for hypoglycemia, specifically for T1D, using a wide range of algorithmic methodologies, spanning from statistics (2/19, 11%) to machine learning (10/19, 53%) and deep learning (7/19, 37%). The most frequently used algorithms were Kalman filtering and classification models (support vector machine, k-nearest neighbors, and random forests). The performance of the predictive models was found to be satisfactory overall, reaching accuracies between 70% and 99%, which indicates that such technologies are capable of facilitating the prediction of T1D hypoglycemia. Conclusions: It is evident that continuous glucose monitoring can improve glucose control in diabetes; however, predictive models for hypo- and hyperglycemia using only mainstream noninvasive sensors such as wristbands and smartwatches are foreseen to be the next step for mobile health in T1D. Prospective studies are required to demonstrate the value of such models in real-life mobile health interventions. 
%M 35862181 %R 10.2196/34699 %U https://diabetes.jmir.org/2022/3/e34699 %U https://doi.org/10.2196/34699 %U http://www.ncbi.nlm.nih.gov/pubmed/35862181 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 7 %P e35197 %T Exploring an Artificial Intelligence–Based, Gamified Phone App Prototype to Track and Improve Food Choices of Adolescent Girls in Vietnam: Acceptability, Usability, and Likeability Study %A C Braga,Bianca %A Nguyen,Phuong H %A Aberman,Noora-Lisa %A Doyle,Frank %A Folson,Gloria %A Hoang,Nga %A Huynh,Phuong %A Koch,Bastien %A McCloskey,Peter %A Tran,Lan %A Hughes,David %A Gelli,Aulo %+ Friedman School of Nutrition Science and Policy, Tufts University, 150 Harrison Ave, Boston, MA, 02111, United States, 1 6176363777, curi.bianca@tufts.edu %K adolescent %K dietary quality %K food choice %K gamification %K low- and middle-income country %K smartphone app %K mobile phone %D 2022 %7 21.7.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Adolescents’ consumption of healthy foods is suboptimal in low- and middle-income countries. Adolescents’ fondness for games and social media and the increasing access to smartphones make apps suitable for collecting dietary data and influencing their food choices. Little is known about how adolescents use phones to track and shape their food choices. Objective: This study aimed to examine the acceptability, usability, and likability of a mobile phone app prototype developed to collect dietary data using artificial intelligence–based image recognition of foods, provide feedback, and motivate users to make healthier food choices. The findings were used to improve the design of the app. Methods: A total of 4 focus group discussions (n=32 girls, aged 15-17 years) were conducted in Vietnam. Qualitative data were collected and analyzed by grouping ideas into common themes based on content analysis and grounded theory. 
Results: Adolescents accepted most of the individual- and team-based dietary goals presented in the app prototype to help them make healthier food choices. They deemed the overall app wireframes, interface, and graphic design as acceptable, likable, and usable but suggested the following modifications: tailored feedback based on users’ medical history, anthropometric characteristics, and fitness goals; new language on dietary goals; provision of information about each of the food group dietary goals; wider camera frame to fit the whole family food tray, as meals are shared in Vietnam; possibility of digitally separating food consumption on shared meals; and more appealing graphic design, including unique badge designs for each food group. Participants also liked the app’s feedback on food choices in the form of badges, notifications, and statistics. A new version of the app was designed incorporating adolescent’s feedback to improve its acceptability, usability, and likability. Conclusions: A phone app prototype designed to track food choice and help adolescent girls from low- and middle-income countries make healthier food choices was found to be acceptable, likable, and usable. Further research is needed to examine the feasibility of using this technology at scale. 
%M 35862147 %R 10.2196/35197 %U https://formative.jmir.org/2022/7/e35197 %U https://doi.org/10.2196/35197 %U http://www.ncbi.nlm.nih.gov/pubmed/35862147 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 7 %P e37576 %T Feasibility and Impact of Integrating an Artificial Intelligence–Based Diagnosis Aid for Autism Into the Extension for Community Health Outcomes Autism Primary Care Model: Protocol for a Prospective Observational Study %A Sohl,Kristin %A Kilian,Rachel %A Brewer Curran,Alicia %A Mahurin,Melissa %A Nanclares-Nogués,Valeria %A Liu-Mayo,Stuart %A Salomon,Carmela %A Shannon,Jennifer %A Taraman,Sharief %+ Cognoa, Inc, 2185 Park Blvd, Palo Alto, CA, 94306, United States, 1 6507852624, sharief@cognoa.com %K autism spectrum disorder %K diagnosis %K artificial intelligence %K primary care %K machine learning %K Software as a Medical Device %K mobile phone %D 2022 %7 19.7.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: The Extension for Community Health Outcomes (ECHO) Autism Program trains clinicians to screen, diagnose, and care for children with autism spectrum disorder (ASD) in primary care settings. This study will assess the feasibility and impact of integrating an artificial intelligence (AI)–based ASD diagnosis aid (the device) into the existing ECHO Autism Screening Tool for Autism in Toddlers and Young Children (STAT) diagnosis model. The prescription-only Software as a Medical Device, designed for use in children aged 18 to 72 months at risk for developmental delay, produces ASD diagnostic recommendations after analyzing behavioral features from 3 distinct inputs: a caregiver questionnaire, 2 short home videos analyzed by trained video analysts, and a health care provider questionnaire. The device is not a stand-alone diagnostic and should be used in conjunction with clinical judgment. 
Objective: This study aims to assess the feasibility and impact of integrating an AI-based ASD diagnosis aid into the ECHO Autism STAT diagnosis model. The time from initial ECHO Autism clinician concern to ASD diagnosis is the primary end point. Secondary end points include the time from initial caregiver concern to ASD diagnosis, time from diagnosis to treatment initiation, and clinician and caregiver experience of device use as part of the ASD diagnostic journey. Methods: Research participants for this prospective observational study will be patients suspected of having ASD (aged 18-72 months) and their caregivers and up to 15 trained ECHO Autism clinicians recruited by the ECHO Autism Communities research team from across rural and suburban areas of the United States. Clinicians will provide routine clinical care and conduct best practice ECHO Autism diagnostic evaluations in addition to prescribing the device. Outcome data will be collected via a combination of electronic questionnaires, reviews of standard clinical care records, and analysis of device outputs. The expected study duration is no more than 12 months. The study was approved by the institutional review board of the University of Missouri-Columbia (institutional review board–assigned project number 2075722). Results: Participant recruitment began in April 2022. As of June 2022, a total of 41 participants have been enrolled. Conclusions: This prospective observational study will be the first to evaluate the use of a novel AI-based ASD diagnosis aid as part of a real-world primary care diagnostic pathway. If device integration into primary care proves feasible and efficacious, prolonged delays between the first ASD concern and eventual diagnosis may be reduced. Streamlining primary care ASD diagnosis could potentially reduce the strain on specialty services and allow a greater proportion of children to commence early intervention during a critical neurodevelopmental window. 
Trial Registration: ClinicalTrials.gov NCT05223374; https://clinicaltrials.gov/ct2/show/NCT05223374 International Registered Report Identifier (IRRID): PRR1-10.2196/37576 %M 35852831 %R 10.2196/37576 %U https://www.researchprotocols.org/2022/7/e37576 %U https://doi.org/10.2196/37576 %U http://www.ncbi.nlm.nih.gov/pubmed/35852831 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 7 %P e36176 %T Machine Learning Prediction of Hypoglycemia and Hyperglycemia From Electronic Health Records: Algorithm Development and Validation %A Witte,Harald %A Nakas,Christos %A Bally,Lia %A Leichtle,Alexander Benedikt %+ University Institute of Clinical Chemistry, Inselspital - Bern University Hospital and University of Bern, Freiburgstrasse 10, Bern, 3010, Switzerland, 41 316328330, Alexander.Leichtle@insel.ch %K diabetes %K blood glucose decompensation %K multiclass prediction model %K dysglycemia %K hyperglycemia %K hypoglycemia %D 2022 %7 18.7.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Acute blood glucose (BG) decompensations (hypoglycemia and hyperglycemia) represent a frequent and significant risk for inpatients and adversely affect patient outcomes and safety. In addition, the increasing need for BG management in inpatients poses a high demand on clinical staff and health care systems. Objective: This study aimed to generate a broadly applicable multiclass classification model for predicting BG decompensation events from patients’ electronic health records to indicate where adjustments in patient monitoring and therapeutic interventions are required. This should allow for taking proactive measures before BG levels are derailed. Methods: A retrospective cohort study was conducted on patients who were hospitalized at a tertiary hospital in Bern, Switzerland. 
Using patient details and routine data from electronic health records, a multiclass prediction model for BG decompensation events (<3.9 mmol/L [hypoglycemia] or >10, >13.9, or >16.7 mmol/L [representing different degrees of hyperglycemia]) was generated based on a second-level ensemble of gradient-boosted binary trees. Results: A total of 63,579 hospital admissions of 38,250 patients were included in this study. The multiclass prediction model reached specificities of 93.7%, 98.9%, and 93.9% and sensitivities of 67.1%, 59%, and 63.6% for the main categories of interest, which were nondecompensated cases, hypoglycemia, or hyperglycemia, respectively. The median prediction horizon was 7 hours and 4 hours for hypoglycemia and hyperglycemia, respectively. Conclusions: Electronic health records have the potential to reliably predict all types of BG decompensation. Readily available patient details and routine laboratory data can support the decisions for proactive interventions and thus help to reduce the detrimental health effects of hypoglycemia and hyperglycemia. 
%M 35526139 %R 10.2196/36176 %U https://formative.jmir.org/2022/7/e36176 %U https://doi.org/10.2196/36176 %U http://www.ncbi.nlm.nih.gov/pubmed/35526139 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 7 %P e37233 %T Drug Recommendation System for Diabetes Using a Collaborative Filtering and Clustering Approach: Development and Performance Evaluation %A Granda Morales,Luis Fernando %A Valdiviezo-Diaz,Priscila %A Reátegui,Ruth %A Barba-Guaman,Luis %+ Departamento de Ciencias de la Computación y Electrónica, Universidad Técnica Particular de Loja, San Cayetano Alto, Loja, 1101608, Ecuador, 593 7 3701444 ext 2325, pmvaldiviezo@utpl.edu.ec %K clustering %K collaborative filtering %K diabetes %K recommender system %K recommend %K drug %K chronic disease %K patient information %K data mining %K machine learning %D 2022 %7 15.7.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Diabetes is a public health problem worldwide. Although diabetes is a chronic and incurable disease, measures and treatments can be taken to control it and keep the patient stable. Diabetes has been the subject of extensive research, ranging from disease prevention to the use of technologies for its diagnosis and control. Health institutions obtain information required for the diagnosis of diabetes through various tests, and appropriate treatment is provided according to the diagnosis. These institutions have databases with large volumes of information that can be analyzed and used in different applications such as pattern discovery and outcome prediction, which can help health personnel in making decisions about treatments or determining the appropriate prescriptions for diabetes management. Objective: The aim of this study was to develop a drug recommendation system for patients with diabetes based on collaborative filtering and clustering techniques as a complement to the treatments given by the treating doctor. 
Methods: The data set used contains information from patients with diabetes available in the University of California Irvine Machine Learning Repository. Data mining techniques were applied for processing and analysis of the data set. Unsupervised learning techniques were used for dimensionality reduction and patient clustering. Drug predictions were obtained with a user-based collaborative filtering approach, which enabled creating a patient profile that can be compared with the profiles of other patients with similar characteristics. Finally, recommendations were made considering the identified patient groups. The performance of the system was evaluated using metrics to assess the quality of the groups and the quality of the predictions and recommendations. Results: Principal component analysis to reduce the dimensionality of the data showed that eight components best explained the variability of the data. We identified six groups of patients using the clustering algorithm, which were evenly distributed. These groups were identified based on the available information of patients with diabetes, and then the variation between groups was examined to predict a suitable medication for a target patient. The recommender system achieved good results in the quality of predictions with a mean squared error metric of 0.51 and accuracy in the quality of recommendations of 0.61, which is acceptable. Conclusions: This work presents a recommendation system that suggests medications according to drug information and the characteristics of patients with diabetes. Some aspects related to this disease were analyzed based on the data set used from patients with diabetes. The experimental results with clustering and prediction techniques were found to be acceptable for the recommendation process. This system can provide a novel perspective for health institutions that require technologies to support health care personnel in the management of diabetes treatment and control. 
%M 35838763 %R 10.2196/37233 %U https://www.jmir.org/2022/7/e37233 %U https://doi.org/10.2196/37233 %U http://www.ncbi.nlm.nih.gov/pubmed/35838763 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 8 %N 7 %P e34583 %T Linguistic Pattern–Infused Dual-Channel Bidirectional Long Short-term Memory With Attention for Dengue Case Summary Generation From the Program for Monitoring Emerging Diseases–Mail Database: Algorithm Development Study %A Chang,Yung-Chun %A Chiu,Yu-Wen %A Chuang,Ting-Wu %+ Department of Molecular Parasitology and Tropical Diseases, School of Medicine, College of Medicine, Taipei Medical University, No 250, Wu-Hsing Street, Taipei, 110, Taiwan, 886 27361661 ext 3123, chtingwu@tmu.edu.tw %K ProMED-mail %K natural language processing %K dengue %K dual channel %K bidirectional long short-term memory %D 2022 %7 13.7.2022 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Globalization and environmental changes have intensified the emergence or re-emergence of infectious diseases worldwide, such as outbreaks of dengue fever in Southeast Asia. Collaboration on region-wide infectious disease surveillance systems is therefore critical but difficult to achieve because of the different transparency levels of health information systems in different countries. Although the Program for Monitoring Emerging Diseases (ProMED)–mail is the most comprehensive international expert–curated platform providing rich disease outbreak information on humans, animals, and plants, the unstructured text content of the reports makes analysis for further application difficult. Objective: To make monitoring the epidemic situation in Southeast Asia more efficient, this study aims to develop an automatic summary of the alert articles from ProMED-mail, a huge textual data source. 
In this paper, we propose a text summarization method that uses natural language processing technology to automatically extract important sentences from alert articles in ProMED-mail emails to generate summaries. Using our method, we can quickly capture crucial information to help make important decisions regarding epidemic surveillance. Methods: Our data, which span a period from 1994 to 2019, come from the ProMED-mail website. We analyzed the collected data to establish a unique Taiwan dengue corpus that was validated with professionals’ annotations to achieve almost perfect agreement (Cohen κ=90%). To generate a ProMED-mail summary, we developed a dual-channel bidirectional long short-term memory with an attention mechanism and infused latent syntactic features to identify key sentences from the alerting article. Results: Our method is superior to many well-known machine learning and neural network approaches in identifying important sentences, achieving a macroaverage F1 score of 93%. Moreover, it can successfully extract the relevant correct information on dengue fever from a ProMED-mail alerting article, which can help researchers or general users to quickly understand the essence of the alerting article at first glance. In addition to verifying the model, we also recruited 3 professional experts and 2 students from related fields to participate in a satisfaction survey on the generated summaries, and the results show that 84% (63/75) of the summaries received high satisfaction ratings. Conclusions: The proposed approach successfully fuses latent syntactic features into a deep neural network to analyze the syntactic, semantic, and contextual information in the text. It then exploits the derived information to identify crucial sentences in the ProMED-mail alerting article. The experimental results show that the proposed method is not only effective but also outperforms the compared methods. 
Our approach also demonstrates the potential for case summary generation from ProMED-mail alerting articles. In terms of practical application, when a new alerting article arrives, our method can quickly identify the relevant case information, which is the most critical part, to use as a reference or for further analysis. %M 35830225 %R 10.2196/34583 %U https://publichealth.jmir.org/2022/7/e34583 %U https://doi.org/10.2196/34583 %U http://www.ncbi.nlm.nih.gov/pubmed/35830225 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 7 %P e37744 %T Using Machine Learning to Efficiently Vaccinate Homebound Patients Against COVID-19: A Real-time Immunization Campaign %A Kumar,Anish %A Ren,Jennifer %A Ornstein,Katherine A %A Gliatto,Peter M %+ Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY, 10029, United States, 1 9085668399, anish.kumar@icahn.mssm.edu %K home care %K covid %K vaccination %K COVID-19 %K machine learning %K vaccine %K geographic cluster %K patient data %K clustering algorithm %K geospatial %K digital surveillance %K public health %K logistic operation %K electronic health record integration %D 2022 %7 12.7.2022 %9 Research Letter %J J Med Internet Res %G English %X %M 35679053 %R 10.2196/37744 %U https://www.jmir.org/2022/7/e37744 %U https://doi.org/10.2196/37744 %U http://www.ncbi.nlm.nih.gov/pubmed/35679053 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 7 %P e36490 %T A Review of Artificial Intelligence Applications in Hematology Management: Current Practices and Future Prospects %A El Alaoui,Yousra %A Elomri,Adel %A Qaraqe,Marwa %A Padmanabhan,Regina %A Yasin Taha,Ruba %A El Omri,Halima %A EL Omri,Abdelfatteh %A Aboumarzouk,Omar %+ College of Science and Engineering, Hamad Bin Khalifa University, Education City, Qatar Foundation, Gate 8, Ar-Rayyan, Doha, PO Box 34110, Qatar, 974 44458536, aelomri@hbku.edu.qa %K cancer %K oncology %K hematology %K machine learning %K deep learning %K artificial 
intelligence %K prediction %K malignancy %K management %D 2022 %7 12.7.2022 %9 Review %J J Med Internet Res %G English %X Background: Machine learning (ML) and deep learning (DL) methods have recently garnered a great deal of attention in the field of cancer research by making a noticeable contribution to the growth of predictive medicine and modern oncological practices. Considerable focus has been particularly directed toward hematologic malignancies because of the complexity in detecting early symptoms. Many patients with blood cancer do not get properly diagnosed until their cancer has reached an advanced stage with limited treatment prospects. Hence, the state-of-the-art revolves around the latest artificial intelligence (AI) applications in hematology management. Objective: This comprehensive review provides an in-depth analysis of the current AI practices in the field of hematology. Our objective is to explore the ML and DL applications in blood cancer research, with a special focus on the type of hematologic malignancies and the patient’s cancer stage to determine future research directions in blood cancer. Methods: We searched a set of recognized databases (Scopus, Springer, and Web of Science) using a selected number of keywords. We included studies written in English and published between 2015 and 2021. For each study, we identified the ML and DL techniques used and highlighted the performance of each model. Results: Using the aforementioned inclusion criteria, the search resulted in 567 papers, of which 144 were selected for review. Conclusions: The current literature suggests that the application of AI in the field of hematology has generated impressive results in the screening, diagnosis, and treatment stages. Nevertheless, optimizing the patient’s pathway to treatment requires a prior prediction of the malignancy based on the patient’s symptoms or blood records, which is an area that has still not been properly investigated. 
%M 35819826 %R 10.2196/36490 %U https://www.jmir.org/2022/7/e36490 %U https://doi.org/10.2196/36490 %U http://www.ncbi.nlm.nih.gov/pubmed/35819826 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 7 %P e38584 %T Deep Denoising of Raw Biomedical Knowledge Graph From COVID-19 Literature, LitCovid, and Pubtator: Framework Development and Validation %A Jiang,Chao %A Ngo,Victoria %A Chapman,Richard %A Yu,Yue %A Liu,Hongfang %A Jiang,Guoqian %A Zong,Nansu %+ Department of Artificial Intelligence and Informatics Research, Mayo Clinic, 200 First St SW, Rochester, MN, 55905, United States, 1 507 284 2511, Zong.Nansu@mayo.edu %K adversarial generative network %K knowledge graph %K deep denoising %K machine learning %K COVID-19 %K biomedical %K neural network %K network model %K training data %D 2022 %7 6.7.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Multiple types of biomedical associations of knowledge graphs, including COVID-19–related ones, are constructed based on co-occurring biomedical entities retrieved from recent literature. However, the applications derived from these raw graphs (eg, association predictions among genes, drugs, and diseases) have a high probability of false-positive predictions as co-occurrences in the literature do not always mean there is a true biomedical association between two entities. Objective: Data quality plays an important role in training deep neural network models; however, most of the current work in this area has been focused on improving a model’s performance with the assumption that the preprocessed data are clean. Here, we studied how to remove noise from raw knowledge graphs with limited labeled information. Methods: The proposed framework used generative-based deep neural networks to generate a graph that can distinguish the unknown associations in the raw training graph. 
Two generative adversarial network models, NetGAN and Cross-Entropy Low-rank Logits (CELL), were adopted for the edge classification (ie, link prediction), leveraging unlabeled link information based on a real knowledge graph built from LitCovid and Pubtator. Results: The performance of link prediction, especially in the extreme case of training data versus test data at a ratio of 1:9, demonstrated that the proposed method still achieved favorable results (area under the receiver operating characteristic curve >0.8 for the synthetic data set and 0.7 for the real data set), despite the limited amount of testing data available. Conclusions: Our preliminary findings showed the proposed framework achieved promising results for removing noise during data preprocessing of the biomedical knowledge graph, potentially improving the performance of downstream applications by providing cleaner data. %M 35658098 %R 10.2196/38584 %U https://www.jmir.org/2022/7/e38584 %U https://doi.org/10.2196/38584 %U http://www.ncbi.nlm.nih.gov/pubmed/35658098 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 6 %P e36151 %T Identifying Medication-Related Intents From a Bidirectional Text Messaging Platform for Hypertension Management Using an Unsupervised Learning Approach: Retrospective Observational Pilot Study %A Davoudi,Anahita %A Lee,Natalie S %A Luong,ThaiBinh %A Delaney,Timothy %A Asch,Elizabeth %A Chaiyachati,Krisda %A Mowery,Danielle %+ Department of Biostatistics, Epidemiology, and Informatics, Institute for Biomedical Informatics, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104-6021, United States, 1 2157466677, dlmowery@pennmedicine.upenn.edu %K chatbots %K secure messaging systems %K unsupervised learning %K latent Dirichlet allocation %K natural language processing %D 2022 %7 29.6.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Free-text communication between patients and providers plays an increasing role in chronic 
disease management, through platforms varying from traditional health care portals to novel mobile messaging apps. These text data are rich resources for clinical purposes, but their sheer volume renders them difficult to manage. Even automated approaches, such as natural language processing, require labor-intensive manual classification for developing training data sets. Automated approaches to organizing free-text data are necessary to facilitate the use of free-text communication for clinical care. Objective: The aim of this study was to apply unsupervised learning approaches to (1) understand the types of topics discussed and (2) learn medication-related intents from messages sent between patients and providers through a bidirectional text messaging system for managing participant blood pressure (BP). Methods: This study was a secondary analysis of deidentified messages from a remote, mobile, text-based employee hypertension management program at an academic institution. We trained a latent Dirichlet allocation (LDA) model for each message type (ie, inbound patient messages and outbound provider messages) and identified the distribution of major topics and significant topics (probability >.20) across message types. Next, we annotated all medication-related messages with a single medication intent. Then, we trained a second medication-specific LDA (medLDA) model to assess how well the unsupervised method could identify more fine-grained medication intents. We encoded each medication message with n-grams (n=1-3 words) using spaCy, clinical named entities using Stanza, and medication categories using MedEx; we then applied chi-square feature selection to learn the most informative features associated with each medication intent. Results: In total, 253 participants and 5 providers engaged in the program, generating 12,131 total messages: 46.90% (n=5689) patient messages and 53.10% (n=6442) provider messages. 
Most patient messages corresponded to BP reporting, BP encouragement, and appointment scheduling; most provider messages corresponded to BP reporting, medication adherence, and confirmatory statements. Most patient and provider messages contained 1 topic and few contained more than 3 topics identified using LDA. In total, 534 medication messages were annotated with a single medication intent. Of these, 282 (52.8%) were patient medication messages: most referred to the medication request intent (n=134, 47.5%). Most of the 252 (47.2%) provider medication messages referred to the medication question intent (n=173, 68.7%). Although the medLDA model could identify a majority intent within each topic, it could not distinguish medication intents with low prevalence within patient or provider messages. Richer feature engineering identified informative lexical-semantic patterns associated with each medication intent class. Conclusions: LDA can be an effective method for generating subgroups of messages with similar term usage and facilitating the review of topics to inform annotations. However, few training cases and shared vocabulary between intents precludes the use of LDA for fully automated, deep, medication intent classification. 
International Registered Report Identifier (IRRID): RR2-10.1101/2021.12.23.21268061 %M 35767327 %R 10.2196/36151 %U https://www.jmir.org/2022/6/e36151 %U https://doi.org/10.2196/36151 %U http://www.ncbi.nlm.nih.gov/pubmed/35767327 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 5 %N 2 %P e38896 %T The Effect of Cognitive Function Health Care Using Artificial Intelligence Robots for Older Adults: Systematic Review and Meta-analysis %A Lee,Hocheol %A Chung,Min Ah %A Kim,Hyeji %A Nam,Eun Woo %+ Department of Health Administration, College of Software and Digital Healthcare Convergence, Yonsei University, Unit 412, Chang-jo gwan, 1 Yonseidae-gil, Wonju, 26493, Republic of Korea, 82 33 760 2413, ewnam@yonsei.ac.kr %K older adult population %K older adults %K cognition %K cognitive function %K artificial intelligence %K socially assistive robots %K AI SAR %K social prescription %K dementia %K social support %K aging %K caregiver %K caregiving %K meta-analysis %K review %K Cochrane collaboration %K assistive robot %K assistive technology %D 2022 %7 28.6.2022 %9 Review %J JMIR Aging %G English %X Background: With rapidly aging populations in most parts of the world, it is only natural that the need for caregivers for older adults is going to increase in the near future. Therefore, most technologically proficient countries are in the process of using artificial intelligence (AI) to build socially assistive robots (SAR) to play the role of caregivers in enhancing interaction and social participation among older adults. Objective: This study aimed to examine the effect of intervention through AI SAR on the cognitive function of older adults through a systematic literature review. Methods: We conducted a meta-analysis of the various existing studies on the effect of AI SAR on the cognitive function of older adults to standardize the results and clarify the effect of each method and indicator. 
The Cochrane Collaboration methodology and the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) systematic literature review flow were used on original, peer-reviewed studies published from January 2010 to March 2022. The search words were derived by combining keywords including Population, Intervention, and Outcome, according to the Population, Intervention, Comparison, Outcome, Time, Setting, and Study Design principle, for the question “What is the effect of AI SAR on the cognitive function of older adults in comparison with a control group?” (Population: adults aged ≥65 years; Intervention: AI SAR; Comparison: comparison group; Outcome: cognitive function; and Study Design: prospective study). For any study, if one condition among subjects, intervention, comparison, or study design was different from those indicated, the study was excluded from the literature review. Results: In total, 9 studies were selected (6 randomized controlled trials and 3 quasi-experimental design studies) for the meta-analysis. Publication bias was examined using the contour-enhanced funnel plot method to confirm the reliability and validity of the 9 studies. The meta-analysis revealed that the average effect size of AI SAR was Hedges g=0.43 (95% CI –0.04 to 0.90), indicating that AI SAR have a positive effect on the Mini-Mental State Examination scale, which reflects cognitive function. Conclusions: The 9 studies that were analyzed used SAR in the form of animals, robots, and humans. Among them, AI SAR in anthropomorphic form were able to improve cognitive function more effectively. The development and expansion of AI SAR programs to various functions including health notification, play therapy, counseling services, conversation, and dementia prevention programs are expected to improve the quality of care for older adults and prevent the overload of caregivers. 
AI SAR can be considered a representative, digital, and social prescription program and a nonpharmacological intervention program that communicates with older adults 24 hours a day. Despite its effectiveness, ethical issues, the digital literacy needs of older adults, social awareness and reliability, and technological advancement pose challenges in implementing AI SAR. Future research should include bigger sample sizes, pre-post studies, as well as studies using an older adult control group. %M 35672268 %R 10.2196/38896 %U https://aging.jmir.org/2022/2/e38896 %U https://doi.org/10.2196/38896 %U http://www.ncbi.nlm.nih.gov/pubmed/35672268 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 6 %P e37028 %T The AI Will See You Now: Feasibility and Acceptability of a Conversational AI Medical Interviewing System %A Hong,Grace %A Smith,Margaret %A Lin,Steven %+ Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Stanford University School of Medicine, 585 Broadway, Suite 800, Redwood City, CA, 94063, United States, 1 847 800 1377, hongrace@stanford.edu %K artificial intelligence %K feasibility studies %K patient acceptance of health care %K diagnostic errors %K patient-generated health data %K clinical %K medical history %K healthcare %K health care %D 2022 %7 27.6.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Primary care physicians (PCPs) are often limited in their ability to collect detailed medical histories from patients, which can lead to errors or delays in diagnosis. Recent advances in artificial intelligence (AI) show promise in augmenting current human-driven methods of collecting personal and family histories; however, such tools are largely unproven. Objective: The main aim of this pilot study was to evaluate the feasibility and acceptability of a conversational AI medical interviewing system among patients. 
Methods: The study was conducted among adult patients empaneled at a family medicine clinic within a large academic medical center in Northern California. Participants were asked to test an AI medical interviewing system, which uses a conversational avatar and chatbot to capture medical histories and identify patients with risk factors. After completing an interview with the AI system, participants completed a web-based survey inquiring about the performance of the system, the ease of using the system, and attitudes toward the system. Responses on a 7-point Likert scale were collected and evaluated using descriptive statistics. Results: A total of 20 patients with a mean age of 50 years completed an interview with the AI system, including 12 females (60%) and 8 males (40%); 11 were White (55%), 8 were Asian (40%), and 1 was Black (5%), and 19 had at least a bachelor’s degree (95%). Most participants agreed that using the system to collect histories could help their PCPs have a better understanding of their health (16/20, 80%) and help them stay healthy through identification of their health risks (14/20, 70%). Those who reported that the system was clear and understandable, and that they were able to learn it quickly, tended to be younger; those who reported that the tool could motivate them to share more comprehensive histories with their PCPs tended to be older. Conclusions: In this feasibility and acceptability pilot of a conversational AI medical interviewing system, the majority of patients believed that it could help clinicians better understand their health and identify health risks; however, patients were split on the effort required to use the system, and whether AI should be used for medical interviewing. Our findings suggest areas for further research, such as understanding the user interface factors that influence ease of use and adoption, and the reasons behind patients’ attitudes toward AI-assisted history-taking. 
%M 35759326 %R 10.2196/37028 %U https://formative.jmir.org/2022/6/e37028 %U https://doi.org/10.2196/37028 %U http://www.ncbi.nlm.nih.gov/pubmed/35759326 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 6 %P e37209 %T Triage Errors in Primary and Pre–Primary Care %A Nguyen,Hai %A Meczner,Andras %A Burslam-Dawe,Krista %A Hayhoe,Benedict %+ Your.MD Ltd, 5th Floor Lincoln House, 296-302 High Holborn, London, WC1V 7JH, United Kingdom, 44 7847349099, hai@livehealthily.com %K triage errors %K pre-primary care %K digital symptom checker %K primary care %K viewpoint %K triage %K symptom checker %K emergency care %D 2022 %7 24.6.2022 %9 Viewpoint %J J Med Internet Res %G English %X Triage errors are a major concern in health care due to resulting harmful delays in treatments or inappropriate allocation of resources. With the increasing popularity of digital symptom checkers in pre–primary care settings, and amid claims that artificial intelligence outperforms doctors, the accuracy of triage by digital symptom checkers is ever more scrutinized. This paper examines the context and challenges of triage in primary care, pre–primary care, and emergency care, as well as reviews existing evidence on the prevalence of triage errors in all three settings. Implications for development, research, and practice are highlighted, and recommendations are made on how digital symptom checkers should be best positioned. 
%M 35749166 %R 10.2196/37209 %U https://www.jmir.org/2022/6/e37209 %U https://doi.org/10.2196/37209 %U http://www.ncbi.nlm.nih.gov/pubmed/35749166 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 6 %P e37004 %T Exploring Longitudinal Cough, Breath, and Voice Data for COVID-19 Progression Prediction via Sequential Deep Learning: Model Development and Validation %A Dang,Ting %A Han,Jing %A Xia,Tong %A Spathis,Dimitris %A Bondareva,Erika %A Siegele-Brown,Chloë %A Chauhan,Jagmohan %A Grammenos,Andreas %A Hasthanasombat,Apinan %A Floto,R Andres %A Cicuta,Pietro %A Mascolo,Cecilia %+ Department of Computer Science and Technology, University of Cambridge, 15 JJ Thomson Ave, Cambridge, CB3 0FD, United Kingdom, 44 7895587796, td464@cam.ac.uk %K COVID-19 %K audio %K COVID-19 progression %K deep learning %K mobile health %K longitudinal study %D 2022 %7 21.6.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Recent work has shown the potential of using audio data (eg, cough, breathing, and voice) in the screening for COVID-19. However, these approaches only focus on one-off detection and detect the infection, given the current audio sample, but do not monitor disease progression in COVID-19. Limited exploration has been put forward to continuously monitor COVID-19 progression, especially recovery, through longitudinal audio data. Tracking disease progression characteristics and patterns of recovery could bring insights and lead to more timely treatment or treatment adjustment, as well as better resource management in health care systems. Objective: The primary objective of this study is to explore the potential of longitudinal audio samples over time for COVID-19 progression prediction and, especially, recovery trend prediction using sequential deep learning techniques. 
Methods: Crowdsourced respiratory audio data, including breathing, cough, and voice samples, from 212 individuals over 5-385 days were analyzed, alongside their self-reported COVID-19 test results. We developed and validated a deep learning–enabled tracking tool using gated recurrent units (GRUs) to detect COVID-19 progression by exploring the audio dynamics of the individuals’ historical audio biomarkers. The investigation comprised 2 parts: (1) COVID-19 detection in terms of positive and negative (healthy) tests using sequential audio signals, which was primarily assessed in terms of the area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity, with 95% CIs, and (2) longitudinal disease progression prediction over time in terms of probability of positive tests, which was evaluated using the correlation between the predicted probability trajectory and self-reported labels. Results: We first explored the benefits of capturing longitudinal dynamics of audio biomarkers for COVID-19 detection. The strong performance, yielding an AUROC of 0.79, a sensitivity of 0.75, and a specificity of 0.71 supported the effectiveness of the approach compared to methods that do not leverage longitudinal dynamics. We further examined the predicted disease progression trajectory, which displayed high consistency with longitudinal test results with a correlation of 0.75 in the test cohort and 0.86 in a subset of the test cohort with 12 (57.1%) of 21 COVID-19–positive participants who reported disease recovery. Our findings suggest that monitoring COVID-19 evolution via longitudinal audio data has potential in the tracking of individuals’ disease progression and recovery. Conclusions: An audio-based COVID-19 progression monitoring system was developed using deep learning techniques, with strong performance showing high consistency between the predicted trajectory and the test results over time, especially for recovery trend predictions. 
This has good potential in the postpeak and postpandemic era, as it can help guide medical treatment and optimize hospital resource allocation. The changes in longitudinal audio samples, referred to as audio dynamics, are associated with COVID-19 progression; thus, modeling the audio dynamics can potentially capture the underlying disease progression process and further aid COVID-19 progression prediction. This framework provides a flexible, affordable, and timely tool for COVID-19 tracking, and more importantly, it also provides a proof of concept of how telemonitoring could be applicable to respiratory disease monitoring in general. %M 35653606 %R 10.2196/37004 %U https://www.jmir.org/2022/6/e37004 %U https://doi.org/10.2196/37004 %U http://www.ncbi.nlm.nih.gov/pubmed/35653606 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 9 %N 2 %P e35421 %T Toward an Ecologically Valid Conceptual Framework for the Use of Artificial Intelligence in Clinical Settings: Need for Systems Thinking, Accountability, Decision-making, Trust, and Patient Safety Considerations in Safeguarding the Technology and Clinicians %A Choudhury,Avishek %+ Industrial and Management Systems Engineering, Benjamin M Statler College of Engineering and Mineral Resources, West Virginia University, 1306 Evansdale Drive, PO Box 6107, Morgantown, WV, 26506-6107, United States, 1 5156080777, avishek.choudhury@mail.wvu.edu %K health care %K artificial intelligence %K ecological validity %K trust in AI %K clinical workload %K patient safety %K AI accountability %K reliability %D 2022 %7 21.6.2022 %9 Viewpoint %J JMIR Hum Factors %G English %X The health care management and the medical practitioner literature lack a descriptive conceptual framework for understanding the dynamic and complex interactions between clinicians and artificial intelligence (AI) systems. 
As most of the existing literature has been investigating AI’s performance and effectiveness from a statistical (analytical) standpoint, there is a lack of studies ensuring AI’s ecological validity. In this study, we derived a framework that focuses explicitly on the interaction between AI and clinicians. The proposed framework builds upon well-established human factors models such as the technology acceptance model and expectancy theory. The framework can be used to perform quantitative and qualitative analyses (mixed methods) to capture how clinician-AI interactions may vary based on human factors such as expectancy, workload, trust, cognitive variables related to absorptive capacity and bounded rationality, and concerns for patient safety. If leveraged, the proposed framework can help to identify factors influencing clinicians’ intention to use AI and, consequently, improve AI acceptance and address the lack of AI accountability while safeguarding the patients, clinicians, and AI technology. Overall, this paper discusses the concepts, propositions, and assumptions of the multidisciplinary decision-making literature, constituting a sociocognitive approach that extends the theories of distributed cognition and, thus, will account for the ecological validity of AI. 
%M 35727615 %R 10.2196/35421 %U https://humanfactors.jmir.org/2022/2/e35421 %U https://doi.org/10.2196/35421 %U http://www.ncbi.nlm.nih.gov/pubmed/35727615 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 6 %P e33368 %T The Drivers of Acceptance of Artificial Intelligence–Powered Care Pathways Among Medical Professionals: Web-Based Survey Study %A Cornelissen,Lisa %A Egher,Claudia %A van Beek,Vincent %A Williamson,Latoya %A Hommes,Daniel %+ Faculty of Science, Athena Institute, Vrije Universiteit Amsterdam, Boelelaan 1105, Amsterdam, 1081HV, Netherlands, 31 655320046, lisa.cornelissen@dearhealth.com %K technology acceptance %K artificial intelligence %K health care providers %K machine learning %K technology adoption %K health innovation %K user adoption %D 2022 %7 21.6.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Artificial intelligence (AI) has proven beneficial in several health care areas. Nevertheless, the uptake of AI in health care delivery remains poor. Despite the fact that the acceptance of AI-based technologies among medical professionals is a key barrier to their implementation, knowledge about what informs such attitudes is scarce. Objective: The aim of this study was to identify and examine factors that influence the acceptability of AI-based technologies among medical professionals. Methods: A survey was developed based on the Unified Theory of Acceptance and Use of Technology model, which was extended by adding the predictor variables perceived trust, anxiety, and innovativeness, and the moderator profession. The web-based survey was completed by 67 medical professionals in the Netherlands. The data were analyzed by performing a multiple linear regression analysis followed by a moderating analysis using the Hayes PROCESS macro (SPSS; version 26.0, IBM Corp). 
Results: Multiple linear regression showed that the model explained 75.4% of the variance in the acceptance of AI-powered care pathways (adjusted R2=0.754; F9,0=22.548; P<.001). The variables medical performance expectancy (β=.465; P<.001), effort expectancy (β=–.215; P=.005), perceived trust (β=.221; P=.007), nonmedical performance expectancy (β=.172; P=.08), facilitating conditions (β=–.160; P=.005), and professional identity (β=.156; P=.06) were identified as significant predictors of acceptance. Social influence of patients (β=.042; P=.63), anxiety (β=.021; P=.84), and innovativeness (β=.078; P=.30) were not identified as significant predictors. A moderating effect by gender was found between the relationship of facilitating conditions and acceptance (β=–.406; P=.09). Conclusions: Medical performance expectancy was the most significant predictor of AI-powered care pathway acceptance among medical professionals. Nonmedical performance expectancy, effort expectancy, perceived trust, and professional identity were also found to significantly influence the acceptance of AI-powered care pathways. These factors should be addressed for successful implementation of AI-powered care pathways in health care delivery. The study was limited to medical professionals in the Netherlands, where uptake of AI technologies is still in an early stage. Follow-up multinational studies should further explore the predictors of acceptance of AI-powered care pathways over time, in different geographies, and with bigger samples. 
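The moderation effect reported in the abstract above (a predictor's slope on acceptance differing by a grouping variable) can be illustrated with a toy computation. The data, the subgroup-slope shortcut, and all values below are invented for illustration; the study itself used a multiple regression with the Hayes PROCESS macro in SPSS, not this procedure:

```python
# Illustration of a moderation check: does the slope of facilitating
# conditions -> acceptance differ between two hypothetical groups?

def slope(x, y):
    """Ordinary least squares slope of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

# Toy 5-point Likert responses: (facilitating conditions, acceptance)
group_a = ([1, 2, 3, 4, 5], [2.0, 2.5, 3.0, 3.5, 4.0])   # slope 0.5
group_b = ([1, 2, 3, 4, 5], [4.0, 3.8, 3.6, 3.4, 3.2])   # slope -0.2

b_a = slope(*group_a)
b_b = slope(*group_b)
# A nonzero difference in subgroup slopes is the signature of moderation;
# PROCESS formalizes this as an interaction term with a significance test.
print(round(b_a - b_b, 2))  # 0.7
```

In the regression framework, the same quantity appears as the coefficient of the predictor-by-moderator interaction term.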
%M 35727614 %R 10.2196/33368 %U https://formative.jmir.org/2022/6/e33368 %U https://doi.org/10.2196/33368 %U http://www.ncbi.nlm.nih.gov/pubmed/35727614 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 6 %P e33637 %T Applying the Health Belief Model to Characterize Racial/Ethnic Differences in Digital Conversations Related to Depression Pre- and Mid-COVID-19: Descriptive Analysis %A Castilla-Puentes,Ruby %A Pesa,Jacqueline %A Brethenoux,Caroline %A Furey,Patrick %A Gil Valletta,Liliana %A Falcone,Tatiana %+ Center for Public Health Practice, Drexel University, 530 S 2nd st Suite 743, Philadelphia, PA, 19147, United States, 1 6108642528, rcastil4@its.jnj.com %K depression %K COVID-19 %K treatment %K race/ethnicity %K digital conversations %K health belief model %K artificial intelligence %K natural language processing %D 2022 %7 20.6.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: The prevalence of depression in the United States is >3 times higher mid-COVID-19 versus prepandemic. Racial/ethnic differences in mindsets around depression and the potential impact of the COVID-19 pandemic are not well characterized. Objective: This study aims to describe attitudes, mindsets, key drivers, and barriers related to depression pre- and mid-COVID-19 by race/ethnicity using digital conversations about depression mapped to health belief model (HBM) concepts. Methods: Advanced search, data extraction, and artificial intelligence–powered tools were used to harvest, mine, and structure open-source digital conversations of US adults who engaged in conversations about depression pre- (February 1, 2019-February 29, 2020) and mid-COVID-19 pandemic (March 1, 2020-November 1, 2020) across the internet. Natural language processing, text analytics, and social data mining were used to categorize conversations that included a self-identifier into racial/ethnic groups. 
Conversations were mapped to HBM concepts (ie, perceived susceptibility, perceived severity, perceived benefits, perceived barriers, cues to action, and self-efficacy). Results are descriptive in nature. Results: Of 2.9 and 1.3 million relevant digital conversations pre- and mid-COVID-19, race/ethnicity was determined among 1.8 million (62.2%) and 979,000 (75.3%) conversations, respectively. Pre-COVID-19, 1.3 million (72.1%) conversations about depression were analyzed among non-Hispanic Whites (NHW), 227,200 (12.6%) among Black Americans (BA), 189,200 (10.5%) among Hispanics, and 86,800 (4.8%) among Asian Americans (AS). Mid-COVID-19, a total of 736,100 (75.2%) conversations about depression were analyzed among NHW, 131,800 (13.5%) among BA, 78,300 (8.0%) among Hispanics, and 32,800 (3.3%) among AS. Conversations among all racial/ethnic groups had a negative tone, which increased pre- to mid-COVID-19; finding support from others was seen as a benefit among most groups. Hispanics had the highest rate of any racial/ethnic group of conversations showing an avoiding mindset toward their depression. Conversations related to external barriers to seeking treatment (eg, stigma, lack of support, and lack of resources) were generally more prevalent among Hispanics, BA, and AS than among NHW. Being able to benefit others and building a support system were key drivers to seeking help or treatment for all racial/ethnic groups. Conclusions: There were considerable racial/ethnic differences in drivers and barriers to seeking help and treatment for depression pre- and mid-COVID-19. As expected, COVID-19 has made conversations about depression more negative and with frequent discussions of barriers to seeking care. Applying concepts of the HBM to data on digital conversation about depression allowed organization of the most frequent themes by race/ethnicity. Individuals of all groups came online to discuss their depression. 
These data highlight opportunities for culturally competent and targeted approaches to addressing areas amenable to change that might impact the ability of people to ask for or receive mental health help, such as the constructs that comprise the HBM. %M 35275834 %R 10.2196/33637 %U https://formative.jmir.org/2022/6/e33637 %U https://doi.org/10.2196/33637 %U http://www.ncbi.nlm.nih.gov/pubmed/35275834 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 6 %P e34305 %T Vaccine Adverse Event Mining of Twitter Conversations: 2-Phase Classification Study %A Khademi Habibabadi,Sedigheh %A Delir Haghighi,Pari %A Burstein,Frada %A Buttery,Jim %+ Centre for Health Analytics, Melbourne Children’s Campus, 50 Flemington Rd, Melbourne, 3052, Australia, 61 0383416200, sedigh.khademi@gmail.com %K immunization %K vaccines %K natural language processing %K vaccine adverse effects %K vaccine safety %K social media %K Twitter %K machine learning %D 2022 %7 16.6.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Traditional monitoring for adverse events following immunization (AEFI) relies on various established reporting systems, where there is inevitable lag between an AEFI occurring and its potential reporting and subsequent processing of reports. AEFI safety signal detection strives to detect AEFI as early as possible, ideally close to real time. Monitoring social media data holds promise as a resource for this. Objective: The primary aim of this study is to investigate the utility of monitoring social media for gaining early insights into vaccine safety issues, by extracting vaccine adverse event mentions (VAEMs) from Twitter, using natural language processing techniques. The secondary aims are to document the natural language processing techniques used and identify the most effective of them for identifying tweets that contain VAEM, with a view to define an approach that might be applicable to other similar social media surveillance tasks. 
Methods: A VAEM-Mine method was developed that combines topic modeling with classification techniques to extract maximal VAEM posts from a vaccine-related Twitter stream, with a high degree of confidence. The approach does not require a targeted search for specific vaccine reaction–indicative words, but instead identifies VAEM posts according to their language structure. Results: The VAEM-Mine method isolated 8992 VAEMs from 811,010 vaccine-related Twitter posts and achieved an F1 score of 0.91 in the classification phase. Conclusions: Social media can assist with the detection of vaccine safety signals as a valuable complementary source for monitoring mentions of vaccine adverse events. A social media–based VAEM data stream can be assessed for changes to detect possible emerging vaccine safety signals, helping to address the well-recognized limitations of passive reporting systems, including lack of timeliness and underreporting. %M 35708760 %R 10.2196/34305 %U https://medinform.jmir.org/2022/6/e34305 %U https://doi.org/10.2196/34305 %U http://www.ncbi.nlm.nih.gov/pubmed/35708760 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 6 %P e36958 %T Predicting Risk of Hypoglycemia in Patients With Type 2 Diabetes by Electronic Health Record–Based Machine Learning: Development and Validation %A Yang,Hao %A Li,Jiaxi %A Liu,Siru %A Yang,Xiaoling %A Liu,Jialin %+ Information Center, West China Hospital, Sichuan University, No 37 Guoxue Road, Chengdu, 610041, China, 86 28 85422306, dljl8@163.com %K diabetes %K type 2 diabetes %K hypoglycemia %K learning %K machine learning model %K EHR %K electronic health record %K XGBoost %K natural language processing %D 2022 %7 16.6.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Hypoglycemia is a common adverse event in the treatment of diabetes. To efficiently cope with hypoglycemia, effective hypoglycemia prediction models need to be developed. 
Objective: The aim of this study was to develop and validate machine learning models to predict the risk of hypoglycemia in adult patients with type 2 diabetes. Methods: We used the electronic health records of all adult patients with type 2 diabetes admitted to West China Hospital between November 2019 and December 2021. The prediction model was developed based on XGBoost and natural language processing. F1 score, area under the receiver operating characteristic curve (AUC), and decision curve analysis (DCA) were used as the main criteria to evaluate model performance. Results: We included 29,843 patients with type 2 diabetes, of whom 2804 patients (9.4%) developed hypoglycemia. In this study, the embedding machine learning model (XGBoost3) showed the best performance among all the models. The AUC and the accuracy of XGBoost3 were 0.82 and 0.93, respectively. XGBoost3 was also superior to the other models in DCA. Conclusions: The Paragraph Vector–Distributed Memory model can effectively extract features and improve the performance of the XGBoost model, which can then effectively predict hypoglycemia in patients with type 2 diabetes. 
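The evaluation criteria named in the abstract above (area under the ROC curve and F1 score) can be written out as minimal reference implementations. The labels and scores below are invented toy data, not the study's results or code:

```python
# Minimal reference implementations of two common evaluation metrics:
# AUROC via the rank-based (Mann-Whitney) formulation, and F1 score.

def auroc(labels, scores):
    """Probability that a random positive scores above a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1(labels, preds):
    """Harmonic mean of precision and recall for binary predictions."""
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn)

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(auroc(labels, scores))                       # 8/9 = 0.888...
print(f1(labels, [s >= 0.5 for s in scores]))      # 2/3 = 0.666...
```

AUROC is threshold-free (it ranks scores), whereas F1 requires first binarizing the scores at a chosen cutoff, here 0.5.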
%M 35708754 %R 10.2196/36958 %U https://medinform.jmir.org/2022/6/e36958 %U https://doi.org/10.2196/36958 %U http://www.ncbi.nlm.nih.gov/pubmed/35708754 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 6 %P e34405 %T Development and Validation of Population Clusters for Integrating Health and Social Care: Protocol for a Mixed Methods Study in Multiple Long-Term Conditions (Cluster-Artificial Intelligence for Multiple Long-Term Conditions) %A Dambha-Miller,Hajira %A Simpson,Glenn %A Akyea,Ralph K %A Hounkpatin,Hilda %A Morrison,Leanne %A Gibson,Jon %A Stokes,Jonathan %A Islam,Nazrul %A Chapman,Adriane %A Stuart,Beth %A Zaccardi,Francesco %A Zlatev,Zlatko %A Jones,Karen %A Roderick,Paul %A Boniface,Michael %A Santer,Miriam %A Farmer,Andrew %+ Primary Care Research Centre, Aldermoor Close, Aldermoor, Southampton, SO14 1ST, United Kingdom, 44 7746148820, H.Dambha-Miller@soton.ac.uk %K artificial intelligence %K social care %K multimorbidity %K big data %K protocol %K mixed method %K long-term health %D 2022 %7 16.6.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Multiple long-term health conditions (multimorbidity) (MLTC-M) are increasingly prevalent and associated with high rates of morbidity, mortality, and health care expenditure. Strategies to address this have primarily focused on the biological aspects of disease, but MLTC-M also result from and are associated with additional psychosocial, economic, and environmental barriers. A shift toward more personalized, holistic, and integrated care could be effective. This could be made more efficient by identifying groups of populations based on their health and social needs. In turn, these will contribute to evidence-based solutions supporting delivery of interventions tailored to address the needs pertinent to each cluster. Evidence is needed on how to generate clusters based on health and social needs and quantify the impact of clusters on long-term health and costs. 
Objective: We intend to develop and validate population clusters that consider determinants of health and social care needs for people with MLTC-M using data-driven machine learning (ML) methods compared to expert-driven approaches within primary care national databases, followed by evaluation of cluster trajectories and their association with health outcomes and costs. Methods: The mixed methods program of work with parallel work streams includes the following: (1) qualitative semistructured interview studies exploring patient, caregiver, and professional views on clinical and socioeconomic factors influencing experiences of living with or seeking care in MLTC-M; (2) modified Delphi with relevant stakeholders to generate variables on health and social (wider) determinants and to examine the feasibility of including these variables within existing primary care databases; and (3) cohort study with expert-driven segmentation, alongside data-driven algorithms. Outputs will be compared, clusters characterized, and trajectories over time examined to quantify associations with mortality, additional long-term conditions, worsening frailty, disease severity, and 10-year health and social care costs. Results: The study will commence in October 2021 and is expected to be completed by October 2023. Conclusions: By studying MLTC-M clusters, we will assess how more personalized care can be developed, how accurate costs can be provided, and how to better understand the personal and medical profiles and environment of individuals within each cluster. Integrated care that considers “whole persons” and their environment is essential in addressing the complex, diverse, and individual needs of people living with MLTC-M. 
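The data-driven clustering arm of the protocol above can be sketched with a toy k-means pass. The variables (condition count, deprivation index) and the data are hypothetical stand-ins for the health and social determinants the study will derive; the actual protocol compares such algorithmic clusters against expert-driven segmentation:

```python
# A minimal k-means sketch for grouping people by need-related variables.
import random

def kmeans(points, k, iters=50, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)        # initialize from the data
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean).
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[i].append(p)
        # Recompute centers as the mean of each group (keep old if empty).
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

# Each person: (number of long-term conditions, deprivation index 0-1)
people = [(1, 0.1), (2, 0.2), (1, 0.15), (7, 0.9), (8, 0.8), (6, 0.85)]
centers, groups = kmeans(people, k=2)
print(sorted(len(g) for g in groups))  # [3, 3]
```

Real pipelines would standardize the variables first so that no single determinant dominates the distance metric.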
International Registered Report Identifier (IRRID): PRR1-10.2196/34405 %M 35708751 %R 10.2196/34405 %U https://www.researchprotocols.org/2022/6/e34405 %U https://doi.org/10.2196/34405 %U http://www.ncbi.nlm.nih.gov/pubmed/35708751 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 6 %P e37689 %T Identifying the Risk of Sepsis in Patients With Cancer Using Digital Health Care Records: Machine Learning–Based Approach %A Yang,Donghun %A Kim,Jimin %A Yoo,Junsang %A Cha,Won Chul %A Paik,Hyojung %+ Center for Supercomputing Applications, Division of National Supercomputing, Korea Institute of Science and Technology Information, 245 Daehak-ro, Yuseong-Gu, Daejeon, 34141, Republic of Korea, 82 428690791, hyojungpaik@gmail.com %K sepsis %K cancer %K EHR %K machine learning %K deep learning %K mortality rate %K learning model %K electronic health record %K network based analysis %K sepsis risk %K risk model %K prediction model %D 2022 %7 15.6.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Sepsis is diagnosed in millions of people every year, resulting in a high mortality rate. Although patients with sepsis present multimorbid conditions, including cancer, sepsis predictions have mainly focused on patients with severe injuries. Objective: In this paper, we present a machine learning–based approach to identify the risk of sepsis in patients with cancer using electronic health records (EHRs). Methods: We utilized deidentified anonymized EHRs of 8580 patients with cancer from the Samsung Medical Center in Korea in a longitudinal manner between 2014 and 2019. To build a prediction model based on physical status that would differ between sepsis and nonsepsis patients, we analyzed 2462 laboratory test results and 2266 medication prescriptions using graph network and statistical analyses. The medication relationships and lab test results from each analysis were used as additional learning features to train our predictive model. 
Results: Patients with sepsis showed differential medication trajectories and physical status. For example, in the network-based analysis, narcotic analgesics were prescribed more often in the sepsis group, along with other drugs. Likewise, 35 types of lab tests, including albumin, globulin, and prothrombin time, showed significantly different distributions between sepsis and nonsepsis patients (P<.001). Our model outperformed the model trained using only common EHRs, improving accuracy, area under the receiver operating characteristic curve (AUROC), and F1 score by 11.9%, 11.3%, and 13.6%, respectively. For the random forest–based model, the accuracy, AUROC, and F1 score were 0.692, 0.753, and 0.602, respectively. Conclusions: We showed that lab tests and medication relationships can be used as efficient features for predicting sepsis in patients with cancer. Consequently, identifying the risk of sepsis in patients with cancer using EHRs and machine learning is feasible. %M 35704364 %R 10.2196/37689 %U https://medinform.jmir.org/2022/6/e37689 %U https://doi.org/10.2196/37689 %U http://www.ncbi.nlm.nih.gov/pubmed/35704364 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 6 %P e34678 %T Perspective of Information Technology Decision Makers on Factors Influencing Adoption and Implementation of Artificial Intelligence Technologies in 40 German Hospitals: Descriptive Analysis %A Weinert,Lina %A Müller,Julia %A Svensson,Laura %A Heinze,Oliver %+ Institute of Medical Informatics, Heidelberg University Hospital, Im Neuenheimer Feld 130.3, Heidelberg, 69120, Germany, 49 622156 ext 34367, lina.weinert@med.uni-heidelberg.de %K artificial intelligence %K AI readiness %K implementation %K decision-making %K descriptive analysis %K quantitative study %D 2022 %7 15.6.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: New artificial intelligence (AI) tools are being developed at a high speed. 
However, strategies and practical experiences surrounding the adoption and implementation of AI in health care are lacking. This is likely because of the high implementation complexity of AI, legacy IT infrastructure, and unclear business cases, thus complicating AI adoption. Research has recently started to identify the factors influencing AI readiness of organizations. Objective: This study aimed to investigate the factors influencing AI readiness as well as possible barriers to AI adoption and implementation in German hospitals. We also assessed the status quo regarding the dissemination of AI tools in hospitals. We focused on IT decision makers, a seldom studied but highly relevant group. Methods: We created a web-based survey based on recent AI readiness and implementation literature. Participants were identified through a publicly accessible database and contacted via email or invitational leaflets sent by mail, in some cases accompanied by a telephonic prenotification. The survey responses were analyzed using descriptive statistics. Results: We contacted 609 possible participants, and our database recorded 40 completed surveys. Most participants agreed or rather agreed with the statement that AI would be relevant in the future, both in Germany (37/40, 93%) and in their own hospital (36/40, 90%). Participants were asked whether their hospitals used or planned to use AI technologies. Of the 40 participants, 26 (65%) answered “yes.” Most AI technologies were used or planned for patient care, followed by biomedical research, administration, and logistics and central purchasing. The most important barriers to AI were lack of resources (staff, knowledge, and financial). Relevant possible opportunities for using AI were increase in efficiency owing to time-saving effects, competitive advantages, and increase in quality of care. Most AI tools in use or in planning have been developed with external partners. 
Conclusions: Few tools have been implemented in routine care, and many hospitals do not use or plan to use AI in the future. This can likely be explained by missing or unclear business cases or the need for a modern IT infrastructure to integrate AI tools in a usable manner. These shortcomings complicate decision-making and resource attribution. As most AI technologies already in use were developed in cooperation with external partners, these relationships should be fostered. IT decision makers should assess their hospitals’ readiness for AI individually with a focus on resources. Further research should continue to monitor the dissemination of AI tools and readiness factors to determine whether improvements can be made over time. This monitoring is especially important with regard to government-supported investments in AI technologies that could alleviate financial burdens. Qualitative studies with hospital IT decision makers should be conducted to further explore the reasons for slow AI adoption. 
%M 35704378 %R 10.2196/34678 %U https://medinform.jmir.org/2022/6/e34678 %U https://doi.org/10.2196/34678 %U http://www.ncbi.nlm.nih.gov/pubmed/35704378 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 6 %P e37532 %T Emerging Trends and Research Foci in Artificial Intelligence for Retinal Diseases: Bibliometric and Visualization Study %A Zhao,Junqiang %A Lu,Yi %A Qian,Yong %A Luo,Yuxin %A Yang,Weihua %+ The Laboratory of Artificial Intelligence and Bigdata in Ophthalmology, Affiliated Eye Hospital of Nanjing Medical University, No.138 Hanzhong Road, Gulou District, Nanjing, Jiangsu, 210029, China, 86 13867252557, benben0606@139.com %K artificial intelligence %K retinal disease %K data visualization %K bibliometric %K CiteSpace %K VOSviewer %K retinal %K eye %K visual impairment %D 2022 %7 14.6.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Patients with retinal diseases may exhibit serious complications that cause severe visual impairment owing to a lack of awareness of retinal diseases and limited medical resources. Understanding how artificial intelligence (AI) is used to make predictions and perform relevant analyses is a very active area of research on retinal diseases. In this study, the relevant Science Citation Index (SCI) literature on the AI of retinal diseases published from 2012 to 2021 was integrated and analyzed. Objective: The aim of this study was to gain insights into the overall application of AI technology to the research of retinal diseases from set time and space dimensions. Methods: Citation data downloaded from the Web of Science Core Collection database for AI in retinal disease publications from January 1, 2012, to December 31, 2021, were considered for this analysis. Information retrieval was analyzed using the online analysis platforms of literature metrology: Bibliometrix, CiteSpace V, and VOSviewer. 
Results: A total of 197 institutions from 86 countries contributed to relevant publications; China had the largest number and researchers from University College London had the highest H-index. The reference clusters of SCI papers were clustered into 12 categories. “Deep learning” was the cluster with the widest range of cocited references. The burst keywords represented the research frontiers in 2018-2021, which were “eye disease” and “enhancement.” Conclusions: This study provides a systematic analysis method on the literature regarding AI in retinal diseases. Bibliometric analysis enabled obtaining results that were objective and comprehensive. In the future, high-quality retinal image–forming AI technology with strong stability and clinical applicability will continue to be encouraged. %M 35700021 %R 10.2196/37532 %U https://www.jmir.org/2022/6/e37532 %U https://doi.org/10.2196/37532 %U http://www.ncbi.nlm.nih.gov/pubmed/35700021 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 6 %P e36501 %T Acceptance, Barriers, and Facilitators to Implementing Artificial Intelligence–Based Decision Support Systems in Emergency Departments: Quantitative and Qualitative Evaluation %A Fujimori,Ryo %A Liu,Keibun %A Soeno,Shoko %A Naraba,Hiromu %A Ogura,Kentaro %A Hara,Konan %A Sonoo,Tomohiro %A Ogura,Takayuki %A Nakamura,Kensuke %A Goto,Tadahiro %+ Faculty of Medicine, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, 1138655, Japan, 81 08095261328, fujimori0203@gmail.com %K clinical decision support system %K preimplementation %K qualitative %K mixed methods %K artificial intelligence %K emergency medicine %K CDSS %K computerized decision %K computerized decision support system %K AI %K AI-based %K CFIR %K quantitative analysis %D 2022 %7 13.6.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Despite the increasing availability of clinical decision support systems (CDSSs) and rising expectation for CDSSs based on artificial intelligence (AI), 
little is known about the acceptance of AI-based CDSS by physicians and its barriers and facilitators in emergency care settings. Objective: We aimed to evaluate the acceptance, barriers, and facilitators to implementing AI-based CDSSs in the emergency care setting through the opinions of physicians on our newly developed, real-time AI-based CDSS, which alerts ED physicians by predicting aortic dissection based on numeric and text information from medical charts, by using the Unified Theory of Acceptance and Use of Technology (UTAUT; for quantitative evaluation) and the Consolidated Framework for Implementation Research (CFIR; for qualitative evaluation) frameworks. Methods: This mixed methods study was performed from March to April 2021. Transitional year residents (n=6), emergency medicine residents (n=5), and emergency physicians (n=3) from two community, tertiary care hospitals in Japan were included. We first developed a real-time CDSS for predicting aortic dissection based on numeric and text information from medical charts (eg, chief complaints, medical history, vital signs) with natural language processing. This system was deployed on the internet, and the participants used the system with clinical vignettes of model cases. Participants were then involved in a mixed methods evaluation consisting of a UTAUT-based questionnaire with a 5-point Likert scale (quantitative) and a CFIR-based semistructured interview (qualitative). Cronbach α was calculated as a reliability estimate for UTAUT subconstructs. Interviews were sampled, transcribed, and analyzed using the MaxQDA software. The framework analysis approach was used during the study to determine the relevance of the CFIR constructs. Results: All 14 participants completed the questionnaires and interviews. Quantitative analysis revealed generally positive responses for user acceptance with all scores above the neutral score of 3.0. 
In addition, the mixed methods analysis identified two significant barriers (System Performance, Compatibility) and two major facilitators (Evidence Strength, Design Quality) for implementation of AI-based CDSSs in emergency care settings. Conclusions: Our mixed methods evaluation based on theoretically grounded frameworks revealed the acceptance, barriers, and facilitators of implementation of AI-based CDSS. Although the concern of system failure and overtrusting of the system could be barriers to implementation, the locality of the system and designing an intuitive user interface could likely facilitate the use of optimal AI-based CDSS. Alleviating and resolving these factors should be key to achieving good user acceptance of AI-based CDSS. %M 35699995 %R 10.2196/36501 %U https://formative.jmir.org/2022/6/e36501 %U https://doi.org/10.2196/36501 %U http://www.ncbi.nlm.nih.gov/pubmed/35699995 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 6 %P e30210 %T Machine Learning–Based Text Analysis to Predict Severely Injured Patients in Emergency Medical Dispatch: Model Development and Validation %A Chin,Kuan-Chen %A Cheng,Yu-Chia %A Sun,Jen-Tang %A Ou,Chih-Yen %A Hu,Chun-Hua %A Tsai,Ming-Chi %A Ma,Matthew Huei-Ming %A Chiang,Wen-Chu %A Chen,Albert Y %+ Department of Civil Engineering, National Taiwan University, No 1, Section 4, Roosevelt Rd, Taipei City, 106, Taiwan, 886 2 3366 4255, AlbertChen@ntu.edu.tw %K emergency medical service %K emergency medical dispatch %K dispatcher %K trauma %K machine learning %K term frequency–inverse document frequency %K Bernoulli naïve Bayes %D 2022 %7 10.6.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Early recognition of severely injured patients in prehospital settings is of paramount importance for timely treatment and transportation of patients to further treatment facilities. The dispatching accuracy has seldom been addressed in previous studies. 
Objective: In this study, we aimed to build a machine learning–based model through text mining of emergency calls for the automated identification of severely injured patients after a road accident. Methods: Audio recordings of road accidents in Taipei City, Taiwan, in 2018 were obtained and randomly sampled. Data on call transfers or non-Mandarin speeches were excluded. To predict cases of severe trauma identified on-site by emergency medical technicians, all included cases were evaluated by both humans (6 dispatchers) and a machine learning model, that is, a prehospital-activated major trauma (PAMT) model. The PAMT model was developed using term frequency–inverse document frequency, rule-based classification, and a Bernoulli naïve Bayes classifier. Repeated random subsampling cross-validation was applied to evaluate the robustness of the model. The prediction performance of dispatchers and the PAMT model, in severe cases, was compared. Performance was indicated by sensitivity, specificity, positive predictive value, negative predictive value, and accuracy. Results: Although the mean sensitivity and negative predictive value obtained by the PAMT model were higher than those of dispatchers, they obtained higher mean specificity, positive predictive value, and accuracy. The mean accuracy of the PAMT model, from certainty level 0 (lowest certainty) to level 6 (highest certainty), was higher except for levels 5 and 6. The overall performances of the dispatchers and the PAMT model were similar; however, the PAMT model had higher accuracy in cases where the dispatchers were less certain of their judgments. Conclusions: A machine learning–based model, called the PAMT model, was developed to predict severe road accident trauma. The results of our study suggest that the accuracy of the PAMT model is not superior to that of the participating dispatchers; however, it may assist dispatchers when they lack confidence while making a judgment. 
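The Bernoulli naïve Bayes step of the PAMT model described above can be sketched on toy data. The vocabulary and dispatch phrases are invented, and the TF-IDF weighting and rule-based classification used in the actual model are omitted; this is only the core classifier idea, not the study's code:

```python
# Bernoulli naive Bayes on binary word-presence features, with
# add-one (Laplace) smoothing. Each document is a set of words.
import math

def train(docs, labels, vocab):
    model = {}
    for c in set(labels):
        idx = [i for i, l in enumerate(labels) if l == c]
        n = len(idx)
        # P(word present | class), smoothed so no probability is 0 or 1
        probs = {w: (sum(w in docs[i] for i in idx) + 1) / (n + 2) for w in vocab}
        model[c] = (n / len(docs), probs)
    return model

def predict(model, doc, vocab):
    def logpost(c):
        prior, probs = model[c]
        lp = math.log(prior)
        for w in vocab:  # Bernoulli NB scores absent words too
            p = probs[w]
            lp += math.log(p if w in doc else 1 - p)
        return lp
    return max(model, key=logpost)

vocab = {"unconscious", "bleeding", "minor", "scratch"}
docs = [{"unconscious", "bleeding"}, {"bleeding"}, {"minor", "scratch"}, {"minor"}]
labels = ["severe", "severe", "not", "not"]
model = train(docs, labels, vocab)
print(predict(model, {"unconscious"}, vocab))  # severe
```

Unlike multinomial naive Bayes, the Bernoulli variant explicitly penalizes the absence of class-typical words, which suits short, telegraphic texts such as call summaries.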
%M 35687393 %R 10.2196/30210 %U https://www.jmir.org/2022/6/e30210 %U https://doi.org/10.2196/30210 %U http://www.ncbi.nlm.nih.gov/pubmed/35687393 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 10 %N 6 %P e35053 %T Emerging Artificial Intelligence–Empowered mHealth: Scoping Review %A Bhatt,Paras %A Liu,Jia %A Gong,Yanmin %A Wang,Jing %A Guo,Yuanxiong %+ Department of Electrical & Computer Engineering, The University of Texas at San Antonio, 1 UTSA Circle, San Antonio, TX, 78249, United States, 1 210 458 8028, yuanxiong.guo@utsa.edu %K mobile health units %K telemedicine %K machine learning %K artificial intelligence %K review literature as topic %D 2022 %7 9.6.2022 %9 Review %J JMIR Mhealth Uhealth %G English %X Background: Artificial intelligence (AI) has revolutionized health care delivery in recent years. There is an increase in research for advanced AI techniques, such as deep learning, to build predictive models for the early detection of diseases. Such predictive models leverage mobile health (mHealth) data from wearable sensors and smartphones to discover novel ways for detecting and managing chronic diseases and mental health conditions. Objective: Currently, little is known about the use of AI-powered mHealth (AIM) in health care settings. Therefore, this scoping review aims to map current research on the emerging use of AIM for managing diseases and promoting health. Our objective is to synthesize research in AIM models that have increasingly been used for health care delivery in the last 2 years. Methods: Using Arksey and O’Malley’s 5-point framework for conducting scoping reviews, we reviewed AIM literature from the past 2 years in the fields of biomedical technology, AI, and information systems. 
We searched 3 databases, PubsOnline at INFORMS, e-journal archive at MIS Quarterly, and Association for Computing Machinery (ACM) Digital Library using keywords such as “mobile healthcare,” “wearable medical sensors,” “smartphones,” and “AI.” We included AIM articles and excluded technical articles focused only on AI models. We also used the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) technique for identifying articles that represent a comprehensive view of current research in the AIM domain. Results: We screened 108 articles focusing on developing AIM models for ensuring better health care delivery, detecting diseases early, and diagnosing chronic health conditions, and 37 articles were eligible for inclusion, with 31 of the 37 articles being published last year (84%). Of the included articles, 9 studied AI models to detect serious mental health issues, such as depression and suicidal tendencies, and chronic health conditions, such as sleep apnea and diabetes. Several articles discussed the application of AIM models for remote patient monitoring and disease management. The considered primary health concerns belonged to 3 categories: mental health, physical health, and health promotion and wellness. Moreover, 14 of the 37 articles used AIM applications to research physical health, representing 38% of the total studies. Finally, 28 out of the 37 (76%) studies used proprietary data sets rather than public data sets. We found a lack of research in addressing chronic mental health issues and a lack of publicly available data sets for AIM research. Conclusions: The application of AIM models for disease detection and management is a growing research domain. These models provide accurate predictions for enabling preventive care on a broader scale in the health care domain. 
Given the ever-increasing need for remote disease management during the pandemic, recent AI techniques, such as federated learning and explainable AI, can act as a catalyst for increasing the adoption of AIM and enabling secure data sharing across the health care industry. %M 35679107 %R 10.2196/35053 %U https://mhealth.jmir.org/2022/6/e35053 %U https://doi.org/10.2196/35053 %U http://www.ncbi.nlm.nih.gov/pubmed/35679107 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 6 %P e33211 %T Ageism and Artificial Intelligence: Protocol for a Scoping Review %A Chu,Charlene H %A Leslie,Kathleen %A Shi,Jiamin %A Nyrup,Rune %A Bianchi,Andria %A Khan,Shehroz S %A Rahimi,Samira Abbasgholizadeh %A Lyn,Alexandra %A Grenier,Amanda %+ Lawrence S Bloomberg Faculty of Nursing, University of Toronto, 155 College St, Unit 130, Toronto, ON, M5T 1P8, Canada, 1 416 946 0217, charlene.chu@utoronto.ca %K artificial intelligence %K ageism %K age-related biases %K gerontology %K algorithms %K search strategy %K health database %K human rights %K ethics %D 2022 %7 9.6.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Artificial intelligence (AI) has emerged as a major driver of technological development in the 21st century, yet little attention has been paid to algorithmic biases toward older adults. Objective: This paper documents the search strategy and process for a scoping review exploring how age-related bias is encoded or amplified in AI systems as well as the corresponding legal and ethical implications. Methods: The scoping review follows a 6-stage methodology framework developed by Arksey and O’Malley. The search strategy has been established in 6 databases. We will investigate the legal implications of ageism in AI by searching grey literature databases, targeted websites, and popular search engines and using an iterative search strategy. 
Studies meet the inclusion criteria if they are in English, peer-reviewed, available electronically in full text, and meet one of the following two additional criteria: (1) include “bias” related to AI in any application (eg, facial recognition) and (2) discuss bias related to the concept of old age or ageism. At least two reviewers will independently conduct the title, abstract, and full-text screening. Search results will be reported using the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) reporting guideline. We will chart data on a structured form and conduct a thematic analysis to highlight the societal, legal, and ethical implications reported in the literature. Results: The database searches resulted in 7595 records when the searches were piloted in November 2021. The scoping review will be completed by December 2022. Conclusions: The findings will provide interdisciplinary insights into the extent of age-related bias in AI systems. The results will contribute foundational knowledge that can encourage multisectoral cooperation to ensure that AI is developed and deployed in a manner consistent with ethical values and human rights legislation as it relates to an older and aging population. We will publish the review findings in peer-reviewed journals and disseminate the key results with stakeholders via workshops and webinars. 
Trial Registration: OSF Registries AMG5P; https://osf.io/amg5p International Registered Report Identifier (IRRID): DERR1-10.2196/33211 %M 35679118 %R 10.2196/33211 %U https://www.researchprotocols.org/2022/6/e33211 %U https://doi.org/10.2196/33211 %U http://www.ncbi.nlm.nih.gov/pubmed/35679118 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 6 %P e30630 %T Evaluation of Dietary Management Using Artificial Intelligence and Human Interventions: Nonrandomized Controlled Trial %A Okaniwa,Fusae %A Yoshida,Hiroshi %+ Department of Theoretical Social Security Research, National Institute of Population and Social Security Research, 2-2-3 Uchisaiwaicho, Chiyoda-ku, Tokyo, 100-0011, Japan, 81 3 3595 2984, okaniwa-fusae@ipss.go.jp %K health promotion %K dietary management %K intervention %K artificial intelligence %K body fat percentage %K body mass index %K behavioral economics %K nonprofessional %K Japan %D 2022 %7 8.6.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: There has been an increase in personal health records with the increased use of wearable devices and smartphone apps to improve health. Traditional health promotion programs by human professionals have limitations in terms of cost and reach. Due to labor shortages and to save costs, there has been a growing emphasis in the medical field on building health guidance systems using artificial intelligence (AI). AI will replace advanced human tasks to some extent in the future. However, it is difficult to sustain behavioral change through technology alone at present. Objective: This study investigates whether AI alone can effectively encourage healthy behaviors or whether human interventions are needed to achieve and sustain health-related behavioral change. We examined the effectiveness of AI and human interventions to encourage dietary management behaviors. In addition, we elucidated the conditions for maximizing the effect of AI on health improvement. 
We hypothesized that the combination of AI and human interventions would maximize their effectiveness. Methods: We conducted a 3-month experiment by recruiting participants who were users of a smartphone diet management app. We recruited 102 participants and divided them into 3 groups. Treatment group I received text messages using the standard features of the app (AI-based text message intervention). Treatment group II received video messages from a companion, in addition to the text messages (combined text message and human video message intervention by AI). The control group used the app to keep a dietary record, but no feedback was provided (no intervention). We examined the participants’ continuity and the effects on physical indicators. Results: Combined AI and video messaging (treatment group II) led to a lower dropout rate from the program compared to the control group, and the Cox proportional-hazards model estimate showed a hazard ratio (HR) of 0.078, which was statistically significant at the 5% level. Further, human intervention with AI and video messaging significantly reduced the body fat percentage (BFP) of participants after 3 months compared to the control group, and the rate of reduction was greater in the group with more individualized intervention. The AI-based text messages affected the BMI but had no significant effect on the BFP. Conclusions: This experiment shows that it is challenging to sustain participants’ healthy behavior with AI intervention alone. The results also suggest that even if the health information conveyed is the same, information conveyed by humans together with AI is more effective in improving health than information sent by AI alone. The support received from the companion in the form of video messages may have promoted voluntary health behaviors. It is noteworthy that companions were competent, even though they were nonexperts. This means that person-to-person communication is crucial for health interventions. 
%M 35675107 %R 10.2196/30630 %U https://formative.jmir.org/2022/6/e30630 %U https://doi.org/10.2196/30630 %U http://www.ncbi.nlm.nih.gov/pubmed/35675107 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 6 %P e34295 %T Machine Learning–Based Prediction Models for Different Clinical Risks in Different Hospitals: Evaluation of Live Performance %A Sun,Hong %A Depraetere,Kristof %A Meesseman,Laurent %A Cabanillas Silva,Patricia %A Szymanowsky,Ralph %A Fliegenschmidt,Janis %A Hulde,Nikolai %A von Dossow,Vera %A Vanbiervliet,Martijn %A De Baerdemaeker,Jos %A Roccaro-Waldmeyer,Diana M %A Stieg,Jörg %A Domínguez Hidalgo,Manuel %A Dahlweid,Fried-Michael %+ Dedalus Healthcare, Roderveldlaan 2, Antwerp, 2600, Belgium, 32 3444 8108, hong.sun@dedalus.com %K machine learning %K clinical risk prediction %K prediction %K model %K model evaluation %K scalability %K risk %K live clinical workflow %K delirium %K sepsis %K acute kidney injury %K kidney %K EHR %K electronic health record %K workflow %K algorithm %D 2022 %7 7.6.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Machine learning algorithms are currently used in a wide array of clinical domains to produce models that can predict clinical risk events. Most models are developed and evaluated with retrospective data, very few are evaluated in a clinical workflow, and even fewer report performances in different hospitals. In this study, we provide detailed evaluations of clinical risk prediction models in live clinical workflows for three different use cases in three different hospitals. Objective: The main objective of this study was to evaluate clinical risk prediction models in live clinical workflows and compare their performance in these settings with their performance when using retrospective data. We also aimed at generalizing the results by applying our investigation to three different use cases in three different hospitals. 
Methods: We trained clinical risk prediction models for three use cases (ie, delirium, sepsis, and acute kidney injury) in three different hospitals with retrospective data. We used machine learning and, specifically, deep learning to train models that were based on the Transformer model. The models were trained using a calibration tool that is common for all hospitals and use cases. The models had a common design but were calibrated using each hospital’s specific data. The models were deployed in these three hospitals and used in daily clinical practice. The predictions made by these models were logged and correlated with the diagnosis at discharge. We compared their performance with evaluations on retrospective data and conducted cross-hospital evaluations. Results: The performance of the prediction models with data from live clinical workflows was similar to the performance with retrospective data. The average value of the area under the receiver operating characteristic curve (AUROC) decreased slightly by 0.6 percentage points (from 94.8% to 94.2% at discharge). The cross-hospital evaluations exhibited severely reduced performance: the average AUROC decreased by 8 percentage points (from 94.2% to 86.3% at discharge), which indicates the importance of model calibration with data from the deployment hospital. Conclusions: Calibrating the prediction model with data from different deployment hospitals led to good performance in live settings. The performance degradation in the cross-hospital evaluation identified limitations in developing a generic model for different hospitals. Designing a generic process for model development to generate specialized prediction models for each hospital guarantees model performance in different hospitals. 
%M 35502887 %R 10.2196/34295 %U https://www.jmir.org/2022/6/e34295 %U https://doi.org/10.2196/34295 %U http://www.ncbi.nlm.nih.gov/pubmed/35502887 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 8 %N 2 %P e35587 %T Needs, Challenges, and Applications of Artificial Intelligence in Medical Education Curriculum %A Grunhut,Joel %A Marques,Oge %A Wyatt,Adam T M %+ Schmidt College of Medicine, Florida Atlantic University, 777 Glades Road BC-71, Boca Raton, FL, 33431, United States, 1 561 297 4828, jgrunhut2019@health.fau.edu %K artificial intelligence %K AI %K medical education %K medical student %D 2022 %7 7.6.2022 %9 Viewpoint %J JMIR Med Educ %G English %X Artificial intelligence (AI) is on course to become a mainstay in the patient’s room, physician’s office, and the surgical suite. Current advancements in health care technology might put future physicians in an insufficiently equipped position to deal with the advancements and challenges brought about by AI and machine learning solutions. Physicians will be tasked regularly with clinical decision-making with the assistance of AI-driven predictions. Present-day physicians are not trained to incorporate the suggestions of such predictions on a regular basis nor are they knowledgeable in an ethical approach to incorporating AI in their practice and evolving standards of care. Medical schools do not currently incorporate AI in their curriculum due to several factors, including the lack of faculty expertise, the lack of evidence to support the growing desire by students to learn about AI, or the lack of Liaison Committee on Medical Education’s guidance on AI in medical education. Medical schools should incorporate AI in the curriculum as a longitudinal thread in current subjects. Current students should understand the breadth of AI tools, the framework of engineering and designing AI solutions to clinical issues, and the role of data in the development of AI innovations. 
Study cases in the curriculum should include an AI recommendation that may present critical decision-making challenges. Finally, the ethical implications of AI in medicine must be at the forefront of any comprehensive medical education. %M 35671077 %R 10.2196/35587 %U https://mededu.jmir.org/2022/2/e35587 %U https://doi.org/10.2196/35587 %U http://www.ncbi.nlm.nih.gov/pubmed/35671077 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 6 %P e34298 %T Investigating the Potential for Clinical Decision Support in Sub-Saharan Africa With AFYA (Artificial Intelligence-Based Assessment of Health Symptoms in Tanzania): Protocol for a Prospective, Observational Pilot Study %A Schmude,Marcel %A Salim,Nahya %A Azadzoy,Hila %A Bane,Mustafa %A Millen,Elizabeth %A O’Donnell,Lisa %A Bode,Philipp %A Türk,Ewelina %A Vaidya,Ria %A Gilbert,Stephen %+ Ada Health GmbH, Karl-Liebknecht-Str. 1, Berlin, 10178, Germany, 49 030 40367390, science@ada.com %K differential diagnosis %K artificial intelligence %K clinical decision support systems %K decision support %K diagnostic decision support systems %K diagnosis %K Africa %K low income %K middle income %K user centred design %K user centered design %K symptom assessment %K chatbot %K health app %K prototype %D 2022 %7 7.6.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Low- and middle-income countries face difficulties in providing adequate health care. One of the reasons is a shortage of qualified health workers. Diagnostic decision support systems are designed to aid clinicians in their work and have the potential to mitigate pressure on health care systems. Objective: The Artificial Intelligence–Based Assessment of Health Symptoms in Tanzania (AFYA) study will evaluate the potential of an English-language artificial intelligence–based prototype diagnostic decision support system for mid-level health care practitioners in a low- or middle-income setting. 
Methods: This is an observational, prospective clinical study conducted in a busy Tanzanian district hospital. In addition to usual care visits, study participants will consult a mid-level health care practitioner, who will use a prototype diagnostic decision support system, and a study physician. The accuracy and comprehensiveness of the differential diagnosis provided by the diagnostic decision support system will be evaluated against a gold-standard differential diagnosis provided by an expert panel. Results: Patient recruitment started in October 2021. Participants were recruited directly in the waiting room of the outpatient clinic at the hospital. Data collection will conclude in May 2022. Data analysis is planned to be finished by the end of June 2022. The results will be published in a peer-reviewed journal. Conclusions: Most diagnostic decision support systems have been developed and evaluated in high-income countries, but there is great potential for these systems to improve the delivery of health care in low- and middle-income countries. The findings of this real-patient study will provide insights based on the performance and usability of a prototype diagnostic decision support system in low- or middle-income countries. 
Trial Registration: ClinicalTrials.gov NCT04958577; http://clinicaltrials.gov/ct2/show/NCT04958577 International Registered Report Identifier (IRRID): DERR1-10.2196/34298 %M 35671073 %R 10.2196/34298 %U https://www.researchprotocols.org/2022/6/e34298 %U https://doi.org/10.2196/34298 %U http://www.ncbi.nlm.nih.gov/pubmed/35671073 %0 Journal Article %@ 2561-3278 %I JMIR Publications %V 7 %N 1 %P e33771 %T The Classification of Abnormal Hand Movement to Aid in Autism Detection: Machine Learning Study %A Lakkapragada,Anish %A Kline,Aaron %A Mutlu,Onur Cezmi %A Paskov,Kelley %A Chrisman,Brianna %A Stockham,Nathaniel %A Washington,Peter %A Wall,Dennis Paul %+ Information and Computer Sciences, University of Hawai‘i at Mānoa, 2500 Campus Rd, Honolulu, HI, 96822, United States, 1 5126800926, peter.y.washington@hawaii.edu %K deep learning %K machine learning %K activity recognition %K applied machine learning %K landmark detection %K autism %K diagnosis %K health informatics %K detection %K feasibility %K video %K model %K neural network %D 2022 %7 6.6.2022 %9 Original Paper %J JMIR Biomed Eng %G English %X Background: A formal autism diagnosis can be an inefficient and lengthy process. Families may wait several months or longer before receiving a diagnosis for their child despite evidence that earlier intervention leads to better treatment outcomes. Digital technologies that detect the presence of behaviors related to autism can scale access to pediatric diagnoses. A strong indicator of the presence of autism is self-stimulatory behaviors such as hand flapping. Objective: This study aims to demonstrate the feasibility of deep learning technologies for the detection of hand flapping from unstructured home videos as a first step toward validation of whether statistical models coupled with digital technologies can be leveraged to aid in the automatic behavioral analysis of autism. 
To support the widespread sharing of such home videos, we explored privacy-preserving modifications to the input space via conversion of each video to hand landmark coordinates and measured the performance of corresponding time series classifiers. Methods: We used the Self-Stimulatory Behavior Dataset (SSBD) that contains 75 videos of hand flapping, head banging, and spinning exhibited by children. From this data set, we extracted 100 hand flapping videos and 100 control videos, each between 2 and 5 seconds in duration. We evaluated five separate feature representations: four privacy-preserved subsets of hand landmarks detected by MediaPipe and one feature representation obtained from the output of the penultimate layer of a MobileNetV2 model fine-tuned on the SSBD. We fed these feature vectors into a long short-term memory network that predicted the presence of hand flapping in each video clip. Results: The highest-performing model used MobileNetV2 to extract features and achieved a test F1 score of 84 (SD 3.7; precision 89.6, SD 4.3; and recall 80.4, SD 6) using 5-fold cross-validation for 100 random seeds on the SSBD data (500 total distinct folds). Of the models we trained on privacy-preserved data, the model trained with all hand landmarks reached an F1 score of 66.6 (SD 3.35). Another such model trained with a select 6 landmarks reached an F1 score of 68.3 (SD 3.6). A privacy-preserved model trained using a single landmark at the base of the hands and a model trained with the average of the locations of all the hand landmarks reached an F1 score of 64.9 (SD 6.5) and 64.2 (SD 6.8), respectively. Conclusions: We created five lightweight neural networks that can detect hand flapping from unstructured videos. Training a long short-term memory network with convolutional feature vectors outperformed training with feature vectors of hand coordinates and used almost 900,000 fewer model parameters. 
This study provides the first step toward developing precise deep learning methods for activity detection of autism-related behaviors. %M 27666281 %R 10.2196/33771 %U https://biomedeng.jmir.org/2022/1/e33771 %U https://doi.org/10.2196/33771 %U http://www.ncbi.nlm.nih.gov/pubmed/27666281 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 6 %P e35831 %T Machine Learning in Health Promotion and Behavioral Change: Scoping Review %A Goh,Yong Shian %A Ow Yong,Jenna Qing Yun %A Chee,Bernice Qian Hui %A Kuek,Jonathan Han Loong %A Ho,Cyrus Su Hui %+ Alice Lee Centre for Nursing Studies, National University of Singapore, MD11, Clinical Research Centre, Level 2, 10 Medical Drive, Singapore, 117597, Singapore, 65 93896825, nurgys@nus.edu.sg %K machine learning %K health promotion %K health behavioral changes %K artificial intelligence %D 2022 %7 2.6.2022 %9 Review %J J Med Internet Res %G English %X Background: Despite health behavioral change interventions targeting modifiable lifestyle factors underlying chronic diseases, dropouts and nonadherence of individuals have remained high. The rapid development of machine learning (ML) in recent years, alongside its ability to provide readily available personalized experience for users, holds much potential for success in health promotion and behavioral change interventions. Objective: The aim of this paper is to provide an overview of the existing research on ML applications and harness their potential in health promotion and behavioral change interventions. Methods: A scoping review was conducted based on the 5-stage framework by Arksey and O’Malley and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Scoping Reviews) guidelines. A total of 9 databases (the Cochrane Library, CINAHL, Embase, Ovid, ProQuest, PsycInfo, PubMed, Scopus, and Web of Science) were searched from inception to February 2021, without limits on the dates and types of publications. 
Studies were included in the review if they had incorporated ML in any health promotion or behavioral change interventions, had studied at least one group of participants, and had been published in English. Publication-related information (author, year, aim, and findings), area of health promotion, user data analyzed, type of ML used, challenges encountered, and future research were extracted from each study. Results: A total of 29 articles were included in this review. Three themes were generated, which are as follows: (1) enablers, the adoption of information technology for optimizing systemic operation; (2) challenges, the various hurdles and limitations presented in the articles; and (3) future directions, prospective strategies in health promotion through ML. Conclusions: The challenges pertained not only to the time- and resource-consuming nature of ML-based applications, but also to the burden on users for data input and the degree of personalization. Future work may consider designs that correspondingly mitigate these challenges in areas that receive limited attention, such as smoking and mental health. 
%M 35653177 %R 10.2196/35831 %U https://www.jmir.org/2022/6/e35831 %U https://doi.org/10.2196/35831 %U http://www.ncbi.nlm.nih.gov/pubmed/35653177 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 5 %P e36388 %T Evaluation and Mitigation of Racial Bias in Clinical Machine Learning Models: Scoping Review %A Huang,Jonathan %A Galal,Galal %A Etemadi,Mozziyar %A Vaidyanathan,Mahesh %+ Department of Anesthesiology, Northwestern University Feinberg School of Medicine, 420 E Superior St, Chicago, IL, 60611, United States, 1 (312) 503 8194, galal.galal@nm.org %K artificial intelligence %K machine learning %K race %K bias %K racial bias %K scoping review %K algorithm %K algorithmic fairness %K clinical machine learning %K medical machine learning %K fairness %K assessment %K model %K diagnosis %K outcome prediction %K score prediction %K prediction %K mitigation %D 2022 %7 31.5.2022 %9 Review %J JMIR Med Inform %G English %X Background: Racial bias is a key concern regarding the development, validation, and implementation of machine learning (ML) models in clinical settings. Despite the potential of bias to propagate health disparities, racial bias in clinical ML has yet to be thoroughly examined and best practices for bias mitigation remain unclear. Objective: Our objective was to perform a scoping review to characterize the methods by which the racial bias of ML has been assessed and describe strategies that may be used to enhance algorithmic fairness in clinical ML. Methods: A scoping review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Extension for Scoping Reviews. A literature search using PubMed, Scopus, and Embase databases, as well as Google Scholar, identified 635 records, of which 12 studies were included. 
Results: Applications of ML were varied and involved diagnosis, outcome prediction, and clinical score prediction performed on data sets including images, diagnostic studies, clinical text, and clinical variables. Of the 12 studies, 1 (8%) described a model in routine clinical use, 2 (17%) examined prospectively validated clinical models, and the remaining 9 (75%) described internally validated models. In addition, 8 (67%) studies concluded that racial bias was present, 2 (17%) concluded that it was not, and 2 (17%) assessed the implementation of bias mitigation strategies without comparison to a baseline model. Fairness metrics used to assess algorithmic racial bias were inconsistent. The most commonly observed metrics were equal opportunity difference (5/12, 42%), accuracy (4/12, 33%), and disparate impact (2/12, 17%). All 8 (67%) studies that implemented methods for mitigation of racial bias successfully increased fairness, as measured by the authors’ chosen metrics. Preprocessing methods of bias mitigation were most commonly used across all studies that implemented them. Conclusions: The broad scope of medical ML applications and potential patient harms demand an increased emphasis on evaluation and mitigation of racial bias in clinical ML. However, the adoption of algorithmic fairness principles in medicine remains inconsistent and is limited by poor data availability and ML model reporting. We recommend that researchers and journal editors emphasize standardized reporting and data availability in medical ML studies to improve transparency and facilitate evaluation for racial bias. 
%M 35639450 %R 10.2196/36388 %U https://medinform.jmir.org/2022/5/e36388 %U https://doi.org/10.2196/36388 %U http://www.ncbi.nlm.nih.gov/pubmed/35639450 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 8 %N 2 %P e30537 %T Harnessing Natural Language Processing to Support Decisions Around Workplace-Based Assessment: Machine Learning Study of Competency-Based Medical Education %A Yilmaz,Yusuf %A Jurado Nunez,Alma %A Ariaeinejad,Ali %A Lee,Mark %A Sherbino,Jonathan %A Chan,Teresa M %+ Division of Emergency Medicine, Department of Medicine, Faculty of Health Sciences, McMaster University, McMaster Clinics, Room 255, 237 Barton St E, Hamilton, ON, L8L 2X2, Canada, 1 905 525 9140, teresa.chan@medportal.ca %K natural language processing %K machine learning algorithms %K competency-based medical education %K assessment %K medical education %K medical residents %K machine learning %K work performance %K prediction models %D 2022 %7 27.5.2022 %9 Original Paper %J JMIR Med Educ %G English %X Background: Residents receive a numeric performance rating (eg, 1-7 scoring scale) along with a narrative (ie, qualitative) feedback based on their performance in each workplace-based assessment (WBA). Aggregated qualitative data from WBA can be overwhelming to process and fairly adjudicate as part of a global decision about learner competence. Current approaches with qualitative data require a human rater to maintain attention and appropriately weigh various data inputs within the constraints of working memory before rendering a global judgment of performance. Objective: This study explores natural language processing (NLP) and machine learning (ML) applications for identifying trainees at risk using a large WBA narrative comment data set associated with numerical ratings. 
Methods: NLP was performed retrospectively on a complete data set of narrative comments (ie, text-based feedback to residents based on their performance on a task) derived from WBAs completed by faculty members from multiple hospitals associated with a single, large, residency program at McMaster University, Canada. Narrative comments were vectorized to quantitative ratings using the bag-of-n-grams technique with 3 input types: unigrams, bigrams, and trigrams. Supervised ML models using linear regression were trained with the quantitative ratings, performed binary classification, and output a prediction of whether a resident fell into the category of at risk or not at risk. Sensitivity, specificity, and accuracy metrics are reported. Results: The database comprised 7199 unique direct observation assessments, containing both narrative comments and a rating between 3 and 7 in imbalanced distribution (scores 3-5: 726 ratings; and scores 6-7: 4871 ratings). A total of 141 unique raters from 5 different hospitals and 45 unique residents participated over the course of 5 academic years. When comparing the 3 different input types for diagnosing if a trainee would be rated low (ie, 1-5) or high (ie, 6 or 7), our accuracy for trigrams was 87%, bigrams 86%, and unigrams 82%. We also found that all 3 input types had better prediction accuracy when using a bimodal cut (eg, lower or higher) compared with predicting performance along the full 7-point rating scale (50%-52%). Conclusions: The ML models can accurately identify underperforming residents via narrative comments provided for WBAs. The words generated in WBAs can be a worthy data set to augment human decisions for educators tasked with processing large volumes of narrative assessments. 
%M 35622398 %R 10.2196/30537 %U https://mededu.jmir.org/2022/2/e30537 %U https://doi.org/10.2196/30537 %U http://www.ncbi.nlm.nih.gov/pubmed/35622398 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 5 %P e35738 %T Life Course Digital Twins–Intelligent Monitoring for Early and Continuous Intervention and Prevention (LifeTIME): Proposal for a Retrospective Cohort Study %A Milne-Ives,Madison %A Fraser,Lorna K %A Khan,Asiya %A Walker,David %A van Velthoven,Michelle Helena %A May,Jon %A Wolfe,Ingrid %A Harding,Tracey %A Meinert,Edward %+ Centre for Health Technology, University of Plymouth, Room 2, 6 Kirkby Place, Plymouth, PL4 6DN, United Kingdom, 44 1752600600, edward.meinert@plymouth.ac.uk %K artificial intelligence %K machine learning %K multimorbidity %K mental health %K health care %K AI %K outcome %K NCDS %K national child development study %D 2022 %7 26.5.2022 %9 Proposal %J JMIR Res Protoc %G English %X Background: Multimorbidity, which is associated with significant negative outcomes for individuals and health care systems, is increasing in the United Kingdom. However, there is a lack of knowledge about the risk factors (including health, behavior, and environment) for multimorbidity over time. An interdisciplinary approach is essential, as data science, artificial intelligence, and engineering concepts (digital twins) can identify key risk factors throughout the life course, potentially enabling personalized simulation of life-course risk for the development of multimorbidity. Predicting the risk of developing clusters of health conditions before they occur would add clinical value by enabling targeted early preventive interventions, advancing personalized care to improve outcomes, and reducing the burden on health care systems. 
Objective: This study aims to identify key risk factors that predict multimorbidity throughout the life course by developing an intelligent agent using digital twins so that early interventions can be delivered to improve health outcomes. The objectives of this study are to identify key predictors of lifetime risk of multimorbidity, create a series of simulated computational digital twins that predict risk levels for specific clusters of factors, and test the feasibility of the system. Methods: This study will use machine learning to develop digital twins by identifying key risk factors throughout the life course that predict the risk of later multimorbidity. The first stage of the development will be the training of a base predictive model. Data from the National Child Development Study, the North West London Integrated Care Record, the Clinical Practice Research Datalink, and Cerner’s Real World Data will be split into subsets for training and validation, which will be done following the k-fold cross-validation procedure and assessed with the Prediction Model Risk of Bias Assessment Tool (PROBAST). In addition, 2 data sets—the Early-Life Data Cross-linkage in Research study and the Children and Young People’s Health Partnership randomized controlled trial—will be used to develop a series of digital twin personas that simulate clusters of factors to predict different risk levels of developing multimorbidity. Results: The expected results are a validated model, a series of digital twin personas, and a proof-of-concept assessment. Conclusions: Digital twins could provide an individualized early warning system that predicts the risk of future health conditions and recommends the most effective intervention to minimize that risk. These insights could significantly improve an individual’s quality of life and healthy life expectancy and reduce population-level health burdens. 
International Registered Report Identifier (IRRID): PRR1-10.2196/35738 %M 35617022 %R 10.2196/35738 %U https://www.researchprotocols.org/2022/5/e35738 %U https://doi.org/10.2196/35738 %U http://www.ncbi.nlm.nih.gov/pubmed/35617022 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 8 %N 5 %P e30426 %T Identifying Cases of Shoulder Injury Related to Vaccine Administration (SIRVA) in the United States: Development and Validation of a Natural Language Processing Method %A Zheng,Chengyi %A Duffy,Jonathan %A Liu,In-Lu Amy %A Sy,Lina S %A Navarro,Ronald A %A Kim,Sunhea S %A Ryan,Denison S %A Chen,Wansu %A Qian,Lei %A Mercado,Cheryl %A Jacobsen,Steven J %+ Department of Research and Evaluation, Kaiser Permanente Southern California, 100 S Los Robles Ave, 2nd floor, Pasadena, CA, 91101, United States, 1 626 986 8665, chengyi.x.zheng@kp.org %K health %K informatics %K shoulder injury related to vaccine administration %K SIRVA %K natural language processing %K NLP %K causal relation %K temporal relation %K pharmacovigilance %K electronic health records %K EHR %K vaccine safety %K artificial intelligence %K big data %K population health %K real-world data %K vaccines %D 2022 %7 24.5.2022 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Shoulder injury related to vaccine administration (SIRVA) accounts for more than half of all claims received by the National Vaccine Injury Compensation Program. However, due to the difficulty of finding SIRVA cases in large health care databases, population-based studies are scarce. Objective: The goal of the research was to develop a natural language processing (NLP) method to identify SIRVA cases from clinical notes. Methods: We conducted the study among members of a large integrated health care organization who were vaccinated between April 1, 2016, and December 31, 2017, and had subsequent diagnosis codes indicative of shoulder injury. 
Based on a training data set with a chart review reference standard of 164 cases, we developed an NLP algorithm to extract shoulder disorder information, including prior vaccination, anatomic location, temporality and causality. The algorithm identified 3 groups of positive SIRVA cases (definite, probable, and possible) based on the strength of evidence. We compared NLP results to a chart review reference standard of 100 vaccinated cases. We then applied the final automated NLP algorithm to a broader cohort of vaccinated persons with a shoulder injury diagnosis code and performed manual chart confirmation on a random sample of NLP-identified definite cases and all NLP-identified probable and possible cases. Results: In the validation sample, the NLP algorithm had 100% accuracy for identifying 4 SIRVA cases and 96 cases without SIRVA. In the broader cohort of 53,585 vaccinations, the NLP algorithm identified 291 definite, 124 probable, and 52 possible SIRVA cases. The chart-confirmation rates for these groups were 95.5% (278/291), 67.7% (84/124), and 17.3% (9/52), respectively. Conclusions: The algorithm performed with high sensitivity and reasonable specificity in identifying positive SIRVA cases. The NLP algorithm can potentially be used in future population-based studies to identify this rare adverse event, avoiding labor-intensive chart review validation. 
%M 35608886 %R 10.2196/30426 %U https://publichealth.jmir.org/2022/5/e30426 %U https://doi.org/10.2196/30426 %U http://www.ncbi.nlm.nih.gov/pubmed/35608886 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 5 %P e35367 %T Exploring Physician Perspectives on Using Real-world Care Data for the Development of Artificial Intelligence–Based Technologies in Health Care: Qualitative Study %A Kamradt,Martina %A Poß-Doering,Regina %A Szecsenyi,Joachim %+ Department of General Practice and Health Services Research, University Hospital Heidelberg, Im Neuenheimer Feld 130.3, Heidelberg, 69120, Germany, 49 6221 56 8206, martina.kamradt@med.uni-heidelberg.de %K artificial intelligence–based solutions %K data donation %K qualitative research %K Germany %K artificial intelligence %K requirement analysis %K physician perspective %K real-world data %K big data %K data pool %K interview %K qualitative %D 2022 %7 18.5.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Development of artificial intelligence (AI)–based technologies in health care is proceeding rapidly. The sharing and release of real-world data are key practical issues surrounding the implementation of AI solutions into existing clinical practice. However, data derived from daily patient care are necessary for initial training, and continued data supply is needed for the ongoing training, validation, and improvement of AI-based solutions. Data may need to be shared across multiple institutions and settings for the widespread implementation and high-quality use of these solutions. To date, solutions have not been widely implemented in Germany to meet the challenge of providing a sufficient data volume for the development of AI-based technologies for research and third-party entities. 
The Protected Artificial Intelligence Innovation Environment for Patient-Oriented Digital Health Solutions (pAItient) project aims to meet this challenge by creating a large data pool that feeds on the donation of data derived from daily patient care. Prior to building this data pool, physician perspectives regarding data donation for AI-based solutions should be studied. Objective: This study explores physician perspectives on providing and using real-world care data for the development of AI-based solutions in health care in Germany. Methods: As a part of the requirements analysis preceding the pAItient project, this qualitative study explored physician perspectives and expectations regarding the use of data derived from daily patient care in AI-based solutions. Semistructured, guide-based, and problem-centered interviews were audiorecorded, deidentified, transcribed verbatim, and analyzed inductively in a thematically structured approach. Results: Interviews (N=8) with a mean duration of 24 (SD 7.8) minutes were conducted with 6 general practitioners and 2 hospital-based physicians. The mean participant age was 54 (SD 14.1; range 30-74) years, with an average experience as a physician of 25 (SD 13.9; range 1-45) years. Self-rated affinity toward modern information technology varied from very high to low (5-point Likert scale: mean 3.75, SD 1.1). All participants reported they would support the development of AI-based solutions in research contexts by donating deidentified data derived from daily patient care if subsequent data use was made transparent to them and their patients and the benefits for patient care were clear. Contributing to care optimization and efficiency were cited as motivation for potential data donation. Concerns regarding workflow integration (time and effort), appropriate deidentification, and the involvement of third-party entities with economic interests were discussed. 
The donation of data in reference to psychosomatic treatment needs was viewed critically. Conclusions: The interviewed physicians reported they would agree to use real-world care data to support the development of AI-based solutions with a clear benefit for daily patient care. Joint ventures with third-party entities were viewed critically and should focus on care optimization and patient benefits rather than financial interests. %M 35583921 %R 10.2196/35367 %U https://formative.jmir.org/2022/5/e35367 %U https://doi.org/10.2196/35367 %U http://www.ncbi.nlm.nih.gov/pubmed/35583921 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 5 %P e27694 %T The Accuracy of Artificial Intelligence in the Endoscopic Diagnosis of Early Gastric Cancer: Pooled Analysis Study %A Chen,Pei-Chin %A Lu,Yun-Ru %A Kang,Yi-No %A Chang,Chun-Chao %+ Division of Gastroenterology and Hepatology, Department of Internal Medicine, Taipei Medical University Hospital, No 252, Wuxing St, Taipei, 110, Taiwan, 886 227372181, chunchao@tmu.edu.tw %K artificial intelligence %K early gastric cancer %K endoscopy %D 2022 %7 16.5.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) for gastric cancer diagnosis has been discussed in recent years. The role of AI in early gastric cancer is more important than in advanced gastric cancer since early gastric cancer is not easily identified in clinical practice. However, to our knowledge, past syntheses appear to have limited focus on the populations with early gastric cancer. Objective: The purpose of this study is to evaluate the diagnostic accuracy of AI in the diagnosis of early gastric cancer from endoscopic images. Methods: We conducted a systematic review from database inception to June 2020 of all studies assessing the performance of AI in the endoscopic diagnosis of early gastric cancer. Studies not concerning early gastric cancer were excluded. 
The outcome of interest was the diagnostic accuracy (comprising sensitivity, specificity, and accuracy) of AI systems. Study quality was assessed on the basis of the revised Quality Assessment of Diagnostic Accuracy Studies. Meta-analysis was primarily based on a bivariate mixed-effects model. A summary receiver operating characteristic curve and a hierarchical summary receiver operating characteristic curve were constructed, and the area under the curve was computed. Results: We analyzed 12 retrospective case-control studies (n=11,685) in which AI identified early gastric cancer from endoscopic images. The pooled sensitivity and specificity of AI for early gastric cancer diagnosis were 0.86 (95% CI 0.75-0.92) and 0.90 (95% CI 0.84-0.93), respectively. The area under the curve was 0.94. Sensitivity analysis of studies using support vector machines and narrow-band imaging demonstrated more consistent results. Conclusions: To our knowledge, this was the first synthesis of studies on the use of AI with endoscopic images to diagnose early gastric cancer. AI may support the diagnosis of early gastric cancer. However, the optimal combination of imaging techniques and algorithms remains unclear. Competing models of AI for the diagnosis of early gastric cancer are worthy of future investigation. 
Trial Registration: PROSPERO CRD42020193223; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=193223 %M 35576561 %R 10.2196/27694 %U https://www.jmir.org/2022/5/e27694 %U https://doi.org/10.2196/27694 %U http://www.ncbi.nlm.nih.gov/pubmed/35576561 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 5 %P e26801 %T Electronic Medical Record–Based Machine Learning Approach to Predict the Risk of 30-Day Adverse Cardiac Events After Invasive Coronary Treatment: Machine Learning Model Development and Validation %A Kwon,Osung %A Na,Wonjun %A Kang,Heejun %A Jun,Tae Joon %A Kweon,Jihoon %A Park,Gyung-Min %A Cho,YongHyun %A Hur,Cinyoung %A Chae,Jungwoo %A Kang,Do-Yoon %A Lee,Pil Hyung %A Ahn,Jung-Min %A Park,Duk-Woo %A Kang,Soo-Jin %A Lee,Seung-Whan %A Lee,Cheol Whan %A Park,Seong-Wook %A Park,Seung-Jung %A Yang,Dong Hyun %A Kim,Young-Hak %+ Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-ro 43-gil, Songpa-gu, Seoul, 05505, Republic of Korea, 82 2 3010 3995, mdyhkim@amc.seoul.kr %K big data %K electronic medical record %K machine learning %K mortality %K adverse cardiac event %K coronary artery disease %K prediction %D 2022 %7 11.5.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Although there is a growing interest in prediction models based on electronic medical records (EMRs) to identify patients at risk of adverse cardiac events following invasive coronary treatment, robust models fully utilizing EMR data are limited. Objective: We aimed to develop and validate machine learning (ML) models by using diverse fields of EMR to predict the risk of 30-day adverse cardiac events after percutaneous intervention or bypass surgery. 
Methods: EMR data of 5,184,565 records of 16,793 patients at a quaternary hospital between 2006 and 2016 were categorized into static basic (eg, demographics), dynamic time-series (eg, laboratory values), and cardiac-specific data (eg, coronary angiography). The data were randomly split into training, tuning, and testing sets in a ratio of 3:1:1. Each model was evaluated with 5-fold cross-validation and with an external EMR-based cohort at a tertiary hospital. Logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and feedforward neural network (FNN) algorithms were applied. The primary outcome was 30-day mortality following invasive treatment. Results: GBM showed the best performance with area under the receiver operating characteristic curve (AUROC) of 0.99; RF had a similar AUROC of 0.98. AUROCs of FNN and LR were 0.96 and 0.93, respectively. GBM had the highest area under the precision-recall curve (AUPRC) of 0.80, and the AUPRCs of RF, LR, and FNN were 0.73, 0.68, and 0.63, respectively. All models showed low Brier scores of <0.1 as well as highly fitted calibration plots, indicating a good fit of the ML-based models. On external validation, the GBM model demonstrated maximal performance with an AUROC of 0.90, while FNN had an AUROC of 0.85. The AUROCs of LR and RF were slightly lower at 0.80 and 0.79, respectively. The AUPRCs of GBM, LR, and FNN were similar at 0.47, 0.43, and 0.41, respectively, while that of RF was lower at 0.33. Among the categories in the GBM model, time-series dynamic data demonstrated a high AUROC of >0.95, contributing majorly to the excellent results. Conclusions: Exploiting the diverse fields of the EMR data set, the ML-based 30-day adverse cardiac event prediction models demonstrated outstanding results, and the applied framework could be generalized for various health care prediction models. 
%M 35544292 %R 10.2196/26801 %U https://medinform.jmir.org/2022/5/e26801 %U https://doi.org/10.2196/26801 %U http://www.ncbi.nlm.nih.gov/pubmed/35544292 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 5 %P e37092 %T Using Artificial Intelligence to Revolutionise the Patient Care Pathway in Hip and Knee Arthroplasty (ARCHERY): Protocol for the Development of a Clinical Prediction Model %A Farrow,Luke %A Ashcroft,George Patrick %A Zhong,Mingjun %A Anderson,Lesley %+ Institute of Applied Health Sciences, University of Aberdeen, Foresterhill, Aberdeen, AB25 2ZD, United Kingdom, 44 01224552908, luke.farrow@abdn.ac.uk %K orthopedics %K prediction modelling %K machine learning %K artificial intelligence %K imaging %K hip %K knee %K arthroplasty %K health care %K patient care %K arthritis %D 2022 %7 11.5.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Hip and knee osteoarthritis is substantially prevalent worldwide, with large numbers of older adults undergoing joint replacement (arthroplasty) every year. A backlog of elective surgery due to the COVID-19 pandemic, and an aging population, has led to substantial issues with access to timely arthroplasty surgery. A potential method to improve the efficiency of arthroplasty services is by increasing the percentage of patients who are listed for surgery from primary care referrals. The use of artificial intelligence (AI) techniques, specifically machine learning, provides a potential unexplored solution to correctly and rapidly select suitable patients for arthroplasty surgery. 
Objective: This study has 2 objectives: (1) develop a cohort of patients with referrals by general practitioners regarding assessment of suitability for hip or knee replacement from National Health Service (NHS) Grampian data via the Grampian Data Safe Haven and (2) determine the demographic, clinical, and imaging characteristics that influence the selection of patients to undergo hip or knee arthroplasty, and develop a tested and validated patient-specific predictive model to guide arthroplasty referral pathways. Methods: The AI to Revolutionise the Patient Care Pathway in Hip and Knee Arthroplasty (ARCHERY) project will be delivered through 2 linked work packages conducted within the Grampian Data Safe Haven and Safe Haven Artificial Intelligence Platform. The data set will include a cohort of individuals aged ≥16 years with referrals for the consideration of elective primary hip or knee replacement from January 2015 to January 2022. Linked pseudo-anonymized NHS Grampian health care data will be acquired including patient demographics, medication records, laboratory data, theatre records, text from clinical letters, and radiological images and reports. Following the creation of the data set, machine learning techniques will be used to develop pattern classification and probabilistic prediction models based on radiological images. Supplemental demographic and clinical data will be used to improve the predictive capabilities of the models. The sample size is predicted to be approximately 2000 patients—a sufficient size for satisfactory assessment of the primary outcome. Cross-validation will be used for development, testing, and internal validation. Evaluation will be performed through standard techniques, such as the C statistic (area under curve) metric, calibration characteristics (Brier score), and a confusion matrix. 
Results: The study was funded by the Chief Scientist Office Scotland as part of a Clinical Research Fellowship that runs from August 2021 to August 2024. Approval from the North Node Privacy Advisory Committee was confirmed on October 13, 2021. Data collection started in May 2022, with the results expected to be published in the first quarter of 2024. ISRCTN registration has been completed. Conclusions: This project provides a first step toward delivering an automated solution for arthroplasty selection using routinely collected health care data. Following appropriate external validation and clinical testing, this project could substantially improve the proportion of referred patients that are selected to undergo surgery, with a subsequent reduction in waiting time for arthroplasty appointments. Trial Registration: ISRCTN Registry ISRCTN18398037; https://www.isrctn.com/ISRCTN18398037 International Registered Report Identifier (IRRID): PRR1-10.2196/37092 %M 35544289 %R 10.2196/37092 %U https://www.researchprotocols.org/2022/5/e37092 %U https://doi.org/10.2196/37092 %U http://www.ncbi.nlm.nih.gov/pubmed/35544289 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 5 %P e35991 %T Accuracy of an Artificial Intelligence–Based Model for Estimating Leftover Liquid Food in Hospitals: Validation Study %A Tagi,Masato %A Tajiri,Mari %A Hamada,Yasuhiro %A Wakata,Yoshifumi %A Shan,Xiao %A Ozaki,Kazumi %A Kubota,Masanori %A Amano,Sosuke %A Sakaue,Hiroshi %A Suzuki,Yoshiko %A Hirose,Jun %+ Department of Medical Informatics, Institute of Biomedical Sciences, Tokushima University Graduate School, 3-18-15 Kuramoto-cho, Tokushima, 7708503, Japan, 81 88 633 9178, tagi@tokushima-u.ac.jp %K artificial intelligence %K convolutional neural network %K neural network %K machine learning %K malnourished %K malnourishment %K model %K hospital %K patient %K nutrition %K food consumption %K dietary intake %K diet %K food intake %K liquid food %K nutrition management %D 2022 %7 10.5.2022 
%9 Original Paper %J JMIR Form Res %G English %X Background: An accurate evaluation of the nutritional status of malnourished hospitalized patients at a higher risk of complications, such as frailty or disability, is crucial. Visual methods of estimating food intake are popular for evaluating the nutritional status in clinical environments. However, from the perspective of accurate measurement, such methods are unreliable. Objective: The accuracy of estimating leftover liquid food in hospitals using an artificial intelligence (AI)–based model was compared to that of visual estimation. Methods: The accuracy of the AI-based model (AI estimation) was compared to that of the visual estimation method for thin rice gruel as a staple food and fermented milk and peach juice as side dishes. A total of 576 images of liquid food (432 images of thin rice gruel, 72 of fermented milk, and 72 of peach juice) were used. The mean absolute error, root mean squared error, and coefficient of determination (R2) were used as metrics for determining the accuracy of the evaluation process. The Welch t test and the confusion matrix were used to examine the difference in mean absolute error between the AI and visual estimations. Results: The mean absolute errors obtained through the AI estimation approach were 0.63 for fermented milk, 0.25 for peach juice, and 0.85 for the total. These were significantly smaller than those obtained using the visual estimation approach, which were 1.40 (P<.001) for fermented milk, 0.90 (P<.001) for peach juice, and 1.03 (P=.009) for the total. By contrast, the mean absolute error for thin rice gruel obtained using the AI estimation method (0.99) did not differ significantly from that obtained using visual estimation (0.99). The confusion matrix for thin rice gruel showed variation in the distribution of errors, indicating that the errors in the AI estimation were biased toward cases with many leftovers. 
The root mean squared error for all liquid foods tended to be smaller for the AI estimation than for the visual estimation. Additionally, the coefficient of determination (R2) for fermented milk and peach juice tended to be larger for the AI estimation than for the visual estimation, and the R2 value for the total was comparable between the AI and visual estimations. Conclusions: The AI estimation approach achieved a smaller mean absolute error and root mean squared error and a larger coefficient of determination (R2) than the visual estimation approach for the side dishes. Additionally, the AI estimation approach achieved a smaller mean absolute error and root mean squared error compared to the visual estimation method, and the coefficient of determination (R2) was similar to that of the visual estimation method for the total. AI estimation measures liquid food intake in hospitals more precisely than visual estimation, but its accuracy in estimating staple food leftovers requires improvement. 
%M 35536638 %R 10.2196/35991 %U https://formative.jmir.org/2022/5/e35991 %U https://doi.org/10.2196/35991 %U http://www.ncbi.nlm.nih.gov/pubmed/35536638 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 9 %N 2 %P e35219 %T Determinants of Laypersons’ Trust in Medical Decision Aids: Randomized Controlled Trial %A Kopka,Marvin %A Schmieding,Malte L %A Rieger,Tobias %A Roesler,Eileen %A Balzer,Felix %A Feufel,Markus A %+ Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Charitépl 1, Berlin, 10117, Germany, 49 30 450 581 052, marvin.kopka@charite.de %K symptom checkers %K disposition advice %K anthropomorphism %K artificial intelligence %K urgency assessment %K patient-centered care %K human-computer interaction %K consumer health %K information technology %K IT %K mobile phone %D 2022 %7 3.5.2022 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Symptom checker apps are patient-facing decision support systems aimed at providing advice to laypersons on whether, where, and how to seek health care (disposition advice). Such advice can improve laypersons’ self-assessment and ultimately improve medical outcomes. Past research has mainly focused on the accuracy of symptom checker apps’ suggestions. To support decision-making, such apps need to provide not only accurate but also trustworthy advice. To date, only few studies have addressed the question of the extent to which laypersons trust symptom checker app advice or the factors that moderate their trust. Studies on general decision support systems have shown that framing automated systems (anthropomorphic or emphasizing expertise), for example, by using icons symbolizing artificial intelligence (AI), affects users’ trust. Objective: This study aims to identify the factors influencing laypersons’ trust in the advice provided by symptom checker apps. 
Primarily, we investigated whether designs using anthropomorphic framing or framing the app as an AI increase users’ trust compared with no such framing. Methods: Through a web-based survey, we recruited 494 US residents with no professional medical training. The participants had to first appraise the urgency of a fictitious patient description (case vignette). Subsequently, a decision aid (mock symptom checker app) provided disposition advice contradicting the participants’ appraisal, after which they reappraised the vignette. Participants were randomized into 3 groups: 2 experimental groups using visual framing (anthropomorphic, 160/494, 32.4%, vs AI, 161/494, 32.6%) and a neutral group without such framing (173/494, 35%). Results: Most participants (384/494, 77.7%) followed the decision aid’s advice, regardless of its urgency level. Neither anthropomorphic framing (odds ratio 1.120, 95% CI 0.664-1.897) nor framing as AI (odds ratio 0.942, 95% CI 0.565-1.570) increased behavioral or subjective trust (P=.99) compared with the no-frame condition. Even participants who were extremely certain in their own decisions (ie, 100% certain) commonly changed them in favor of the symptom checker’s advice (19/34, 56%). Propensity to trust and eHealth literacy were associated with increased subjective trust in the symptom checker (propensity to trust b=0.25; eHealth literacy b=0.2), whereas sociodemographic variables showed no such link with either subjective or behavioral trust. Conclusions: Contrary to our expectation, neither the anthropomorphic framing nor the emphasis on AI increased trust in symptom checker advice compared with a neutral control condition. However, independent of the interface, most participants trusted the mock app’s advice, even when they were very certain of their own assessment. Thus, the question arises as to whether laypersons use such symptom checkers as substitutes rather than as aids in their own decision-making. 
With trust in symptom checkers already high at baseline, the benefit of symptom checkers depends on interface designs that enable users to adequately calibrate their trust levels during usage. Trial Registration: Deutsches Register Klinischer Studien DRKS00028561; https://tinyurl.com/rv4utcfb (retrospectively registered). %M 35503248 %R 10.2196/35219 %U https://humanfactors.jmir.org/2022/2/e35219 %U https://doi.org/10.2196/35219 %U http://www.ncbi.nlm.nih.gov/pubmed/35503248 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 5 %N 2 %P e32169 %T Benefits of, Barriers to, and Needs for an Artificial Intelligence–Powered Medication Information Voice Chatbot for Older Adults: Interview Study With Geriatrics Experts %A Gudala,Meghana %A Ross,Mary Ellen Trail %A Mogalla,Sunitha %A Lyons,Mandi %A Ramaswamy,Padmavathy %A Roberts,Kirk %+ School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St #600, Houston, TX, 77030, United States, 1 713 500 3653, Kirk.Roberts@uth.tmc.edu %K medication information %K chatbot %K older adults %K technology capabilities %K mobile phone %D 2022 %7 28.4.2022 %9 Original Paper %J JMIR Aging %G English %X Background: One of the most complicated medical needs of older adults is managing their complex medication regimens. However, the use of technology to aid older adults in this endeavor is impeded by the fact that their technological capabilities are lower than those of much of the rest of the population. What is needed to help manage medications is a technology that seamlessly integrates within their comfort levels, such as artificial intelligence agents. Objective: This study aimed to assess the benefits, barriers, and information needs that can be provided by an artificial intelligence–powered medication information voice chatbot for older adults. Methods: A total of 8 semistructured interviews were conducted with geriatrics experts. All interviews were audio-recorded and transcribed. 
Each interview was coded by 2 investigators (2 among ML, PR, METR, and KR) using a semiopen coding method for qualitative analysis, and reconciliation was performed by a third investigator. All codes were organized into the benefit/nonbenefit, barrier/nonbarrier, and need categories. Iterative recoding and member checking were performed until convergence was reached for all interviews. Results: The greatest benefits of a medication information voice-based chatbot would be helping to overcome the vision and dexterity hurdles experienced by most older adults, as it uses voice-based technology. It also helps to increase older adults’ medication knowledge and adherence and supports their overall health. The main barriers were technology familiarity and cost, especially in lower socioeconomic older adults, as well as security and privacy concerns. It was noted however that technology familiarity was not an insurmountable barrier for older adults aged 65 to 75 years, who mostly owned smartphones, whereas older adults aged >75 years may have never been major users of technology in the first place. The most important needs were to be usable, to help patients with reminders, and to provide information on medication side effects and use instructions. Conclusions: Our needs analysis results derived from expert interviews clarify that a voice-based chatbot could be beneficial in improving adherence and overall health if it is built to serve the many medication information needs of older adults, such as reminders and instructions. However, the chatbot must be usable and affordable for its widespread use. 
%M 35482367 %R 10.2196/32169 %U https://aging.jmir.org/2022/2/e32169 %U https://doi.org/10.2196/32169 %U http://www.ncbi.nlm.nih.gov/pubmed/35482367 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 4 %P e30503 %T 6G and Artificial Intelligence Technologies for Dementia Care: Literature Review and Practical Analysis %A Su,Zhaohui %A Bentley,Barry L %A McDonnell,Dean %A Ahmad,Junaid %A He,Jiguang %A Shi,Feng %A Takeuchi,Kazuaki %A Cheshmehzangi,Ali %A da Veiga,Claudimar Pereira %+ Fundação Dom Cabral, 632 Prefeito Lothário Meissner Ave, Av Princesa Diana, 760 Alphaville, Lagoa dos Ingleses, Nova Lima, 34018-006, Brazil, 55 41 3360 4366, claudimar.veiga@gmail.com %K COVID-19 %K 6G %K digital health %K artificial intelligence %K dementia %K first-perspective health solutions %D 2022 %7 27.4.2022 %9 Review %J J Med Internet Res %G English %X Background: The dementia epidemic is progressing fast. As the world’s older population keeps skyrocketing, the traditional incompetent, time-consuming, and laborious interventions are becoming increasingly insufficient to address dementia patients’ health care needs. This is particularly true amid COVID-19. Instead, efficient, cost-effective, and technology-based strategies, such as sixth-generation communication solutions (6G) and artificial intelligence (AI)-empowered health solutions, might be the key to successfully managing the dementia epidemic until a cure becomes available. However, while 6G and AI technologies hold great promise, no research has examined how 6G and AI applications can effectively and efficiently address dementia patients’ health care needs and improve their quality of life. Objective: This study aims to investigate ways in which 6G and AI technologies could elevate dementia care to address this study gap. Methods: A literature review was conducted in databases such as PubMed, Scopus, and PsycINFO. The search focused on three themes: dementia, 6G, and AI technologies. 
The initial search was conducted on April 25, 2021, complemented by relevant articles identified via a follow-up search on November 11, 2021, and Google Scholar alerts. Results: The findings of the study were analyzed in terms of the interplay between people with dementia’s unique health challenges and the promising capabilities of health technologies, with in-depth and comprehensive analyses of advanced technology-based solutions that could address key dementia care needs, ranging from impairments in memory (eg, Egocentric Live 4D Perception), speech (eg, Project Relate), motor (eg, Avatar Robot Café), cognitive (eg, Affectiva), to social interactions (eg, social robots). Conclusions: To live is to grow old. Yet dementia is neither a proper way to live nor a natural aging process. By identifying advanced health solutions powered by 6G and AI opportunities, our study sheds light on the imperative of leveraging the potential of advanced technologies to elevate dementia patients’ will to live, enrich their daily activities, and help them engage in societies across shapes and forms. 
%M 35475733 %R 10.2196/30503 %U https://www.jmir.org/2022/4/e30503 %U https://doi.org/10.2196/30503 %U http://www.ncbi.nlm.nih.gov/pubmed/35475733 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 9 %N 2 %P e35671 %T Understanding People With Chronic Pain Who Use a Cognitive Behavioral Therapy–Based Artificial Intelligence Mental Health App (Wysa): Mixed Methods Retrospective Observational Study %A Meheli,Saha %A Sinha,Chaitali %A Kadaba,Madhura %+ Wysa Inc, 131 Dartmouth St, Boston, MA, United States, 1 916 753 7824, chaitali@wysa.io %K chronic pain %K digital mental health %K mobile health %K mHealth %K pain management %K artificial intelligence %K cognitive behavioral therapy %K conversational agent %K software agent %K pain conditions %K depression %K anxiety %D 2022 %7 27.4.2022 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Digital health interventions can bridge barriers in access to treatment among individuals with chronic pain. Objective: This study aimed to evaluate the perceived needs, engagement, and effectiveness of the mental health app Wysa with regard to mental health outcomes among real-world users who reported chronic pain and engaged with the app for support. Methods: Real-world data from users (N=2194) who reported chronic pain and associated health conditions in their conversations with the mental health app were examined using a mixed methods retrospective observational study. An inductive thematic analysis was used to analyze the conversational data of users with chronic pain to assess perceived needs, along with comparative macro-analyses of conversational flows to capture engagement within the app. 
Additionally, the scores from a subset of users who completed a set of pre-post assessment questionnaires, namely Patient Health Questionnaire-9 (PHQ-9) (n=69) and Generalized Anxiety Disorder Assessment-7 (GAD-7) (n=57), were examined to evaluate the effectiveness of Wysa in providing support for mental health concerns among those managing chronic pain. Results: The themes emerging from the conversations of users with chronic pain included health concerns, socioeconomic concerns, and pain management concerns. Findings from the quantitative analysis indicated that users with chronic pain showed significantly greater app engagement (P<.001) than users without chronic pain, with a large effect size (Vargha and Delaney A=0.76-0.80). Furthermore, users with pre-post assessments during the study period were found to have significant improvements in group means for both PHQ-9 and GAD-7 symptom scores, with a medium effect size (Cohen d=0.60-0.61). Conclusions: The findings indicate that users look for tools that can help them address their concerns related to mental health, pain management, and sleep issues. The study findings also indicate the breadth of the needs of users with chronic pain and the lack of support structures, and suggest that Wysa can provide effective support to bridge the gap. 
%M 35314422 %R 10.2196/35671 %U https://humanfactors.jmir.org/2022/2/e35671 %U https://doi.org/10.2196/35671 %U http://www.ncbi.nlm.nih.gov/pubmed/35314422 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 8 %N 4 %P e32405 %T Toward Using Twitter for PrEP-Related Interventions: An Automated Natural Language Processing Pipeline for Identifying Gay or Bisexual Men in the United States %A Klein,Ari Z %A Meanley,Steven %A O'Connor,Karen %A Bauermeister,José A %A Gonzalez-Hernandez,Graciela %+ Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Blockley Hall, 4th Floor, 423 Guardian Drive, Philadelphia, PA, 19104, United States, 1 215 746 1101, ariklein@pennmedicine.upenn.edu %K natural language processing %K social media %K data mining %K PrEP %K pre-exposure prophylaxis %K HIV %K AIDS %D 2022 %7 25.4.2022 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Pre-exposure prophylaxis (PrEP) is highly effective at preventing the acquisition of HIV. There is a substantial gap, however, between the number of people in the United States who have indications for PrEP and the number of them who are prescribed PrEP. Although Twitter content has been analyzed as a source of PrEP-related data (eg, barriers), methods have not been developed to enable the use of Twitter as a platform for implementing PrEP-related interventions. Objective: Men who have sex with men (MSM) are the population most affected by HIV in the United States. Therefore, the objectives of this study were to (1) develop an automated natural language processing (NLP) pipeline for identifying men in the United States who have reported on Twitter that they are gay, bisexual, or MSM and (2) assess the extent to which they demographically represent MSM in the United States with new HIV diagnoses. 
Methods: Between September 2020 and January 2021, we used the Twitter Streaming Application Programming Interface (API) to collect more than 3 million tweets containing keywords that men may include in posts reporting that they are gay, bisexual, or MSM. We deployed handwritten, high-precision regular expressions—designed to filter out noise and identify actual self-reports—on the tweets and their user profile metadata. We identified 10,043 unique users geolocated in the United States and drew upon a validated NLP tool to automatically identify their ages. Results: By manually distinguishing true- and false-positive self-reports in the tweets or profiles of 1000 (10%) of the 10,043 users identified by our automated pipeline, we established that our pipeline has a precision of 0.85. Among the 8756 users for which a US state–level geolocation was detected, 5096 (58.2%) were in the 10 states with the highest numbers of new HIV diagnoses. Among the 6240 users for which a county-level geolocation was detected, 4252 (68.1%) were in counties or states considered priority jurisdictions by the Ending the HIV Epidemic initiative. Furthermore, the age distribution of the users reflected that of MSM in the United States with new HIV diagnoses. Conclusions: Our automated NLP pipeline can be used to identify MSM in the United States who may be at risk of acquiring HIV, laying the groundwork for using Twitter on a large scale to directly target PrEP-related interventions at this population. 
%M 35468092 %R 10.2196/32405 %U https://publichealth.jmir.org/2022/4/e32405 %U https://doi.org/10.2196/32405 %U http://www.ncbi.nlm.nih.gov/pubmed/35468092 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 5 %N 2 %P e32473 %T Voice-Enabled Intelligent Virtual Agents for People With Amnesia: Systematic Review %A Boumans,Roel %A van de Sande,Yana %A Thill,Serge %A Bosse,Tibor %+ Behavioural Science Institute, Radboud University, Thomas van Aquinostraat 4, Nijmegen, 6525 GD, Netherlands, 31 0622372708, roel.boumans@ru.nl %K intelligent virtual agent %K amnesia %K dementia %K Alzheimer %K systematic review %K mobile phone %D 2022 %7 25.4.2022 %9 Review %J JMIR Aging %G English %X Background: Older adults often have increasing memory problems (amnesia), and approximately 50 million people worldwide have dementia. This syndrome gradually affects a patient over a period of 10-20 years. Intelligent virtual agents may support people with amnesia. Objective: This study aims to identify state-of-the-art experimental studies with virtual agents on a screen capable of verbal dialogues with a target group of older adults with amnesia. Methods: We conducted a systematic search of PubMed, SCOPUS, Microsoft Academic, Google Scholar, Web of Science, and CrossRef on virtual agent and amnesia on papers that describe such experiments. Search criteria were (Virtual Agent OR Virtual Assistant OR Virtual Human OR Conversational Agent OR Virtual Coach OR Chatbot) AND (Amnesia OR Dementia OR Alzheimer OR Mild Cognitive Impairment). Risk of bias was evaluated using the QualSyst tool (University of Alberta), which scores 14 study quality items. Eligible studies are reported in a table including country, study design type, target sample size, controls, study aims, experiment population, intervention details, results, and an image of the agent. Results: A total of 8 studies was included in this meta-analysis. The average number of participants in the studies was 20 (SD 12). 
The verbal interactions were generally short. The usability was generally reported to be positive. The human utterance was seen in 7 (88%) out of 8 studies based on short words or phrases that were predefined in the agent’s speech recognition algorithm. The average study quality score was 0.69 (SD 0.08) on a scale of 0 to 1. Conclusions: The number of experimental studies on talking virtual agents that support people with memory problems is still small. The details on the verbal interaction are limited, which makes it difficult to assess the quality of the interaction and the possible effects of confounding parameters. In addition, the derivation of the aggregated data was difficult. Further research with extended and prolonged dialogues is required. %M 35468084 %R 10.2196/32473 %U https://aging.jmir.org/2022/2/e32473 %U https://doi.org/10.2196/32473 %U http://www.ncbi.nlm.nih.gov/pubmed/35468084 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 10 %N 4 %P e36977 %T Fully Automated Wound Tissue Segmentation Using Deep Learning on Mobile Devices: Cohort Study %A Ramachandram,Dhanesh %A Ramirez-GarciaLuna,Jose Luis %A Fraser,Robert D J %A Martínez-Jiménez,Mario Aurelio %A Arriaga-Caballero,Jesus E %A Allport,Justin %+ Swift Medical Inc, Suite 500, 1 Richmond St W, Toronto, ON, M5H 3W4, Canada, 1 888 755 2565, dhanesh@swiftmedical.io %K wound %K tissue segmentation %K automated tissue identification %K deep learning %K mobile imaging %K mobile phone %D 2022 %7 22.4.2022 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: Composition of tissue types within a wound is a useful indicator of its healing progression. Tissue composition is clinically used in wound healing tools (eg, Bates-Jensen Wound Assessment Tool) to assess risk and recommend treatment. However, wound tissue identification and the estimation of their relative composition is highly subjective. 
Consequently, incorrect assessments could be reported, leading to downstream impacts including inappropriate dressing selection, failure to identify wounds at risk of not healing, or failure to make appropriate referrals to specialists. Objective: This study aimed to measure inter- and intrarater variability in manual tissue segmentation and quantification among a cohort of wound care clinicians and determine if an objective assessment of tissue types (ie, size and amount) can be achieved using deep neural networks. Methods: A data set of 58 anonymized wound images of various types of chronic wounds from Swift Medical’s Wound Database was used to conduct the inter- and intrarater agreement study. The data set was split into 3 subsets with 50% overlap between subsets to measure intrarater agreement. In this study, 4 different tissue types (epithelial, granulation, slough, and eschar) within the wound bed were independently labeled by the 5 wound clinicians at 1-week intervals using a browser-based image annotation tool. In addition, 2 deep convolutional neural network architectures were developed for wound segmentation and tissue segmentation and were used in sequence in the workflow. These models were trained using 465,187 and 17,000 image-label pairs, respectively. This is the largest and most diverse reported data set used for training deep learning models for wound and wound tissue segmentation. The resulting models offer robust performance in diverse imaging conditions, are unbiased toward skin tones, and could execute in near real time on mobile devices. Results: A poor to moderate interrater agreement in identifying tissue types in chronic wound images was reported. A very poor Krippendorff α value of .014 for interrater variability when identifying epithelization was observed, whereas granulation was most consistently identified by the clinicians. 
The intrarater intraclass correlation (3,1), however, indicates that raters were relatively consistent when labeling the same image multiple times over a period. Our deep learning models achieved a mean intersection over union of 0.8644 and 0.7192 for wound and tissue segmentation, respectively. A cohort of wound clinicians, by consensus, rated 91% (53/58) of the tissue segmentation results to be between fair and good in terms of tissue identification and segmentation quality. Conclusions: The interrater agreement study validates that clinicians exhibit considerable variability when identifying and visually estimating wound tissue proportion. The proposed deep learning technique provides objective tissue identification and measurements to assist clinicians in documenting the wound more accurately and could have a significant impact on wound care when deployed at scale. %M 35451982 %R 10.2196/36977 %U https://mhealth.jmir.org/2022/4/e36977 %U https://doi.org/10.2196/36977 %U http://www.ncbi.nlm.nih.gov/pubmed/35451982 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 9 %N 4 %P e35928 %T Natural Language Processing Methods and Bipolar Disorder: Scoping Review %A Harvey,Daisy %A Lobban,Fiona %A Rayson,Paul %A Warner,Aaron %A Jones,Steven %+ Spectrum Centre for Mental Health Research, Division of Health Research, School of Health and Medicine, Lancaster University, Health Innovation One, Sir John Fisher Drive, Lancaster, LA1 4YG, United Kingdom, 44 152465201, d.harvey4@lancaster.ac.uk %K bipolar disorder %K mental health %K mental illness %K natural language processing %K computational linguistics %D 2022 %7 22.4.2022 %9 Review %J JMIR Ment Health %G English %X Background: Health researchers are increasingly using natural language processing (NLP) to study various mental health conditions using both social media and electronic health records (EHRs). 
There is currently no published synthesis that relates specifically to the use of NLP methods for bipolar disorder, and this scoping review was conducted to synthesize valuable insights that have been presented in the literature. Objective: This scoping review explored how NLP methods have been used in research to better understand bipolar disorder and identify opportunities for further use of these methods. Methods: A systematic, computerized search of index and free-text terms related to bipolar disorder and NLP was conducted using 5 databases and 1 anthology: MEDLINE, PsycINFO, Academic Search Ultimate, Scopus, Web of Science Core Collection, and the ACL Anthology. Results: Of 507 identified studies, a total of 35 (6.9%) studies met the inclusion criteria. A narrative synthesis was used to describe the data, and the studies were grouped into four objectives: prediction and classification (n=25), characterization of the language of bipolar disorder (n=13), use of EHRs to measure health outcomes (n=3), and use of EHRs for phenotyping (n=2). Ethical considerations were reported in 60% (21/35) of the studies. Conclusions: The current literature demonstrates how language analysis can be used to assist in and improve the provision of care for people living with bipolar disorder. Individuals with bipolar disorder and the medical community could benefit from research that uses NLP to investigate risk-taking, web-based services, social and occupational functioning, and the representation of gender in bipolar disorder populations on the web. Future research that implements NLP methods to study bipolar disorder should be governed by ethical principles, and any decisions regarding the collection and sharing of data sets should ultimately be made on a case-by-case basis, considering the risk to the data participants and whether their privacy can be ensured. 
%M 35451984 %R 10.2196/35928 %U https://mental.jmir.org/2022/4/e35928 %U https://doi.org/10.2196/35928 %U http://www.ncbi.nlm.nih.gov/pubmed/35451984 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 4 %P e28114 %T Understanding the Research Landscape of Deep Learning in Biomedical Science: Scientometric Analysis %A Nam,Seojin %A Kim,Donghun %A Jung,Woojin %A Zhu,Yongjun %+ Department of Library and Information Science, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea, 82 2 2123 2409, zhu@yonsei.ac.kr %K deep learning %K scientometric analysis %K research publications %K research landscape %K research collaboration %K knowledge diffusion %D 2022 %7 22.4.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Advances in biomedical research using deep learning techniques have generated a large volume of related literature. However, there is a lack of scientometric studies that provide a bird’s-eye view of them. This absence has led to a partial and fragmented understanding of the field and its progress. Objective: This study aimed to gain a quantitative and qualitative understanding of the scientific domain by analyzing diverse bibliographic entities that represent the research landscape from multiple perspectives and levels of granularity. Methods: We searched and retrieved 978 deep learning studies in biomedicine from the PubMed database. A scientometric analysis was performed by analyzing the metadata, content of influential works, and cited references. Results: In the process, we identified the current leading fields, major research topics and techniques, knowledge diffusion, and research collaboration. There was a predominant focus on applying deep learning, especially convolutional neural networks, to radiology and medical imaging, whereas a few studies focused on protein or genome analysis. 
Radiology and medical imaging also appeared to be the most significant knowledge sources and an important field in knowledge diffusion, followed by computer science and electrical engineering. A coauthorship analysis revealed various collaborations among engineering-oriented and biomedicine-oriented clusters of disciplines. Conclusions: This study investigated the landscape of deep learning research in biomedicine and confirmed its interdisciplinary nature. Although it has been successful, we believe that there is a need for diverse applications in certain areas to further boost the contributions of deep learning in addressing biomedical research problems. We expect the results of this study to help researchers and communities better align their present and future work. %M 35451980 %R 10.2196/28114 %U https://www.jmir.org/2022/4/e28114 %U https://doi.org/10.2196/28114 %U http://www.ncbi.nlm.nih.gov/pubmed/35451980 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 4 %P e35465 %T Contributions of Artificial Intelligence Reported in Obstetrics and Gynecology Journals: Systematic Review %A Dhombres,Ferdinand %A Bonnard,Jules %A Bailly,Kévin %A Maurice,Paul %A Papageorghiou,Aris T %A Jouannic,Jean-Marie %+ Fetal Medicine Department, Armand Trousseau University Hospital, Sorbonne University, 26 Avenue du Dr Arnold Netter, Paris, 75012, France, 33 4473 5117, ferdinand.dhombres@inserm.fr %K artificial intelligence %K systematic review %K knowledge bases %K machine learning %K obstetrics %K gynaecology %K perinatology %K medical informatics %D 2022 %7 20.4.2022 %9 Review %J J Med Internet Res %G English %X Background: The applications of artificial intelligence (AI) processes have grown significantly in all medical disciplines during the last decades. Two main types of AI have been applied in medicine: symbolic AI (eg, knowledge base and ontologies) and nonsymbolic AI (eg, machine learning and artificial neural networks). 
Consequently, AI has also been applied across most obstetrics and gynecology (OB/GYN) domains, including general obstetrics, gynecology surgery, fetal ultrasound, and assisted reproductive medicine, among others. Objective: The aim of this study was to provide a systematic review to establish the actual contributions of AI reported in OB/GYN discipline journals. Methods: The PubMed database was searched for citations indexed with “artificial intelligence” and at least one of the following medical subject heading (MeSH) terms between January 1, 2000, and April 30, 2020: “obstetrics”; “gynecology”; “reproductive techniques, assisted”; or “pregnancy.” All publications in OB/GYN core disciplines journals were considered. The selection of journals was based on disciplines defined in Web of Science. The publications were excluded if no AI process was used in the study. Review, editorial, and commentary articles were also excluded. The study analysis comprised (1) classification of publications into OB/GYN domains, (2) description of AI methods, (3) description of AI algorithms, (4) description of data sets, (5) description of AI contributions, and (6) description of the validation of the AI process. Results: The PubMed search retrieved 579 citations and 66 publications met the selection criteria. All OB/GYN subdomains were covered: obstetrics (41%, 27/66), gynecology (3%, 2/66), assisted reproductive medicine (33%, 22/66), early pregnancy (2%, 1/66), and fetal medicine (21%, 14/66). Both machine learning methods (39/66) and knowledge base methods (25/66) were represented. Machine learning used imaging, numerical, and clinical data sets. Knowledge base methods used mostly omics data sets. The actual contributions of AI were method/algorithm development (53%, 35/66), hypothesis generation (42%, 28/66), or software development (3%, 2/66). Validation was performed on one data set (86%, 57/66) and no external validation was reported. 
We observed a general rising trend in publications related to AI in OB/GYN over the last two decades. Most of these publications (82%, 54/66) remain out of the scope of the usual OB/GYN journals. Conclusions: In OB/GYN discipline journals, mostly preliminary work (eg, proof-of-concept algorithm or method) in AI applied to this discipline is reported and clinical validation remains an unmet prerequisite. Improvement driven by new AI research guidelines is expected. However, these guidelines are covering only a part of AI approaches (nonsymbolic) reported in this review; hence, updates need to be considered. %M 35297766 %R 10.2196/35465 %U https://www.jmir.org/2022/4/e35465 %U https://doi.org/10.2196/35465 %U http://www.ncbi.nlm.nih.gov/pubmed/35297766 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 4 %P e33799 %T Research and Application of Artificial Intelligence Based on Electronic Health Records of Patients With Cancer: Systematic Review %A Yang,Xinyu %A Mu,Dongmei %A Peng,Hao %A Li,Hua %A Wang,Ying %A Wang,Ping %A Wang,Yue %A Han,Siqi %+ Division of Clinical Research, The First Hospital of Jilin University, No.1, Xinmin Street, Changchun, 130021, China, 86 0431 81875404, moudm@jlu.edu.cn %K electronic health records %K artificial intelligence %K neoplasms %K machine learning %D 2022 %7 20.4.2022 %9 Review %J JMIR Med Inform %G English %X Background: With the accumulation of electronic health records and the development of artificial intelligence, patients with cancer urgently need new evidence of more personalized clinical and demographic characteristics and more sophisticated treatment and prevention strategies. However, no research has systematically analyzed the application and significance of artificial intelligence based on electronic health records in cancer care. 
Objective: The aim of this study was to conduct a review to introduce the current state and limitations of artificial intelligence based on electronic health records of patients with cancer and to summarize the performance of artificial intelligence in mining electronic health records and its impact on cancer care. Methods: Three databases were systematically searched to retrieve potentially relevant papers published from January 2009 to October 2020. Four principal reviewers assessed the quality of the papers and reviewed them for eligibility based on the inclusion criteria in the extracted data. The summary measures used in this analysis were the number and frequency of occurrence of the themes. Results: Of the 1034 papers considered, 148 papers met the inclusion criteria. Cancer care, especially cancers of female organs and digestive organs, could benefit from artificial intelligence based on electronic health records through cancer emergencies and prognostic estimates, cancer diagnosis and prediction, tumor stage detection, cancer case detection, and treatment pattern recognition. The models can always achieve an area under the curve of 0.7. Ensemble methods and deep learning are on the rise. In addition, electronic medical records in the existing studies are mainly in English and from private institutional databases. Conclusions: Artificial intelligence based on electronic health records performed well and could be useful for cancer care. Improving the performance of artificial intelligence can help patients receive more scientific-based and accurate treatments. There is a need for the development of new methods and electronic health record data sharing and for increased passion and support from cancer specialists. 
%M 35442195 %R 10.2196/33799 %U https://medinform.jmir.org/2022/4/e33799 %U https://doi.org/10.2196/33799 %U http://www.ncbi.nlm.nih.gov/pubmed/35442195 %0 Journal Article %@ 2561-6722 %I JMIR Publications %V 5 %N 2 %P e35406 %T Classifying Autism From Crowdsourced Semistructured Speech Recordings: Machine Learning Model Comparison Study %A Chi,Nathan A %A Washington,Peter %A Kline,Aaron %A Husic,Arman %A Hou,Cathy %A He,Chloe %A Dunlap,Kaitlyn %A Wall,Dennis P %+ Division of Systems Medicine, Department of Pediatrics, Stanford University, 3145 Porter Drive, Palo Alto, CA, 94304, United States, 1 650 666 7676, dpwall@stanford.edu %K autism %K mHealth %K machine learning %K artificial intelligence %K speech %K audio %K child %K digital data %K mobile app %K diagnosis %D 2022 %7 14.4.2022 %9 Original Paper %J JMIR Pediatr Parent %G English %X Background: Autism spectrum disorder (ASD) is a neurodevelopmental disorder that results in altered behavior, social development, and communication patterns. In recent years, autism prevalence has tripled, with 1 in 44 children now affected. Given that traditional diagnosis is a lengthy, labor-intensive process that requires the work of trained physicians, significant attention has been given to developing systems that automatically detect autism. We work toward this goal by analyzing audio data, as prosody abnormalities are a signal of autism, with affected children displaying speech idiosyncrasies such as echolalia, monotonous intonation, atypical pitch, and irregular linguistic stress patterns. Objective: We aimed to test the ability for machine learning approaches to aid in detection of autism in self-recorded speech audio captured from children with ASD and neurotypical (NT) children in their home environments. 
Methods: We considered three methods to detect autism in child speech: (1) random forests trained on extracted audio features (including Mel-frequency cepstral coefficients); (2) convolutional neural networks trained on spectrograms; and (3) fine-tuned wav2vec 2.0—a state-of-the-art transformer-based speech recognition model. We trained our classifiers on our novel data set of cellphone-recorded child speech audio curated from the Guess What? mobile game, an app designed to crowdsource videos of children with ASD and NT children in a natural home environment. Results: The random forest classifier achieved 70% accuracy, the fine-tuned wav2vec 2.0 model achieved 77% accuracy, and the convolutional neural network achieved 79% accuracy when classifying children’s audio as either ASD or NT. We used 5-fold cross-validation to evaluate model performance. Conclusions: Our models were able to predict autism status when trained on a varied selection of home audio clips with inconsistent recording qualities, which may be more representative of real-world conditions. The results demonstrate that machine learning methods offer promise in detecting autism automatically from speech without specialized equipment. 
%M 35436234 %R 10.2196/35406 %U https://pediatrics.jmir.org/2022/2/e35406 %U https://doi.org/10.2196/35406 %U http://www.ncbi.nlm.nih.gov/pubmed/35436234 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 4 %P e34470 %T Coordinating Health Care With Artificial Intelligence–Supported Technology for Patients With Atrial Fibrillation: Protocol for a Randomized Controlled Trial %A Laranjo,Liliana %A Shaw,Tim %A Trivedi,Ritu %A Thomas,Stuart %A Charlston,Emma %A Klimis,Harry %A Thiagalingam,Aravinda %A Kumar,Saurabh %A Tan,Timothy C %A Nguyen,Tu N %A Marschner,Simone %A Chow,Clara %+ Westmead Applied Research Centre, University of Sydney, Level 6, Block K, Entrance 10, Westmead Hospital, Hawkesbury Road, Westmead, Sydney, 2145, Australia, 61 413461852, liliana.laranjo@sydney.edu.au %K atrial fibrillation %K interactive voice response %K artificial intelligence %K conversational agent %K mobile phone %D 2022 %7 13.4.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Atrial fibrillation (AF) is an increasingly common chronic health condition for which integrated care that is multidisciplinary and patient-centric is recommended yet challenging to implement. Objective: The aim of Coordinating Health Care With Artificial Intelligence–Supported Technology in AF is to evaluate the feasibility and potential efficacy of a digital intervention (AF-Support) comprising preprogrammed automated telephone calls (artificial intelligence conversational technology), SMS text messages, and emails, as well as an educational website, to support patients with AF in self-managing their condition and coordinate primary and secondary care follow-up. Methods: Coordinating Health Care With Artificial Intelligence–Supported Technology in AF is a 6-month randomized controlled trial of adult patients with AF (n=385), who will be allocated in a ratio of 4:1 to AF-Support or usual care, with postintervention semistructured interviews. 
The primary outcome is AF-related quality of life, and the secondary outcomes include cardiovascular risk factors, outcomes, and health care use. The 4:1 allocation design enables a detailed examination of the feasibility, uptake, and process of the implementation of AF-Support. Participants with new or ongoing AF will be recruited from hospitals and specialist-led clinics in Sydney, New South Wales, Australia. AF-Support has been co-designed with clinicians, researchers, information technologists, and patients. Automated telephone calls will occur 7 times, with the first call triggered to commence 24 to 48 hours after enrollment. Calls follow a standard flow but are customized to vary depending on patients’ responses. Calls assess AF symptoms, and participants’ responses will trigger different system responses based on prespecified protocols, including the identification of red flags requiring escalation. Randomization will be performed electronically, and allocation concealment will be ensured. Because of the nature of this trial, only outcome assessors and data analysts will be blinded. For the primary outcome, groups will be compared using an analysis of covariance adjusted for corresponding baseline values. Randomized trial data analysis will be performed according to the intention-to-treat principle, and qualitative data will be thematically analyzed. Results: Ethics approval was granted by the Western Sydney Local Health District Human Ethics Research Committee, and recruitment started in December 2020. As of December 2021, a total of 103 patients had been recruited. Conclusions: This study will address the gap in knowledge with respect to the role of postdischarge digital care models for supporting patients with AF. 
Trial Registration: Australian New Zealand Clinical Trials Registry ACTRN12621000174886; https://www.australianclinicaltrials.gov.au/anzctr/trial/ACTRN12621000174886 International Registered Report Identifier (IRRID): DERR1-10.2196/34470 %M 35416784 %R 10.2196/34470 %U https://www.researchprotocols.org/2022/4/e34470 %U https://doi.org/10.2196/34470 %U http://www.ncbi.nlm.nih.gov/pubmed/35416784 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 8 %N 2 %P e34973 %T Readiness to Embrace Artificial Intelligence Among Medical Doctors and Students: Questionnaire-Based Study %A Boillat,Thomas %A Nawaz,Faisal A %A Rivas,Homero %+ Design Lab, College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Healthcare City 14, Dubai, United Arab Emirates, 971 43838759, Thomas.boillat@mbru.ac.ae %K artificial intelligence in medicine %K health care %K questionnaire %K medical doctors %K medical students %D 2022 %7 12.4.2022 %9 Original Paper %J JMIR Med Educ %G English %X Background: Similar to understanding how blood pressure is measured by a sphygmomanometer, physicians will soon have to understand how an artificial intelligence–based application has come to the conclusion that a patient has hypertension, diabetes, or cancer. Although there are an increasing number of use cases where artificial intelligence is or can be applied to improve medical outcomes, the extent to which medical doctors and students are ready to work and leverage this paradigm is unclear. Objective: This research aims to capture medical students’ and doctors’ level of familiarity toward artificial intelligence in medicine as well as their challenges, barriers, and potential risks linked to the democratization of this new paradigm. Methods: A web-based questionnaire comprising five dimensions—demographics, concepts and definitions, training and education, implementation, and risks—was systematically designed from a literature search. 
It was completed by 207 participants in total, of whom 105 (50.7%) were medical doctors and 102 (49.3%) were medical students, trained on all continents, with most of them in Europe, the Middle East, Asia, and North America. Results: The results revealed no significant difference in familiarity with artificial intelligence between medical doctors and students (P=.91), except that medical students perceived artificial intelligence in medicine to lead to higher risks for patients and the field of medicine in general (P<.001). We also identified a rather low level of familiarity with artificial intelligence (medical students=2.11/5; medical doctors=2.06/5) as well as low attendance at education or training. Only 2.9% (3/105) of medical doctors attended a course on artificial intelligence within the previous year, compared with 9.8% (10/102) of medical students. The complexity of the field of medicine was considered one of the biggest challenges (medical doctors=3.5/5; medical students=3.8/5), whereas the reduction of physicians’ skills was the most important risk (medical doctors=3.3; medical students=3.6; P=.03). Conclusions: The question is not whether artificial intelligence will be used in medicine, but when it will become a standard practice for optimizing health care. The low level of familiarity with artificial intelligence identified in this study calls for the implementation of specific education and training in medical schools and hospitals to ensure that medical professionals can leverage this new paradigm and improve health outcomes. 
%M 35412463 %R 10.2196/34973 %U https://mededu.jmir.org/2022/2/e34973 %U https://doi.org/10.2196/34973 %U http://www.ncbi.nlm.nih.gov/pubmed/35412463 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 4 %P e33680 %T The Impact and Applications of Social Media Platforms for Public Health Responses Before and During the COVID-19 Pandemic: Systematic Literature Review %A Gunasekeran,Dinesh Visva %A Chew,Alton %A Chandrasekar,Eeshwar K %A Rajendram,Priyanka %A Kandarpa,Vasundhara %A Rajendram,Mallika %A Chia,Audrey %A Smith,Helen %A Leong,Choon Kit %+ National University of Singapore, 10 Medical Drive, Singapore, 117597, Singapore, 65 67723737, dineshvg@hotmail.sg %K digital health %K social media %K big data %K population health %K blockchain %K COVID-19 %K review %K benefit %K challenge %K public health %D 2022 %7 11.4.2022 %9 Review %J J Med Internet Res %G English %X Background: Social media platforms have numerous potential benefits and drawbacks for public health, which have been described in the literature. The COVID-19 pandemic has exposed our limited knowledge regarding the potential health impact of these platforms, which have been detrimental to public health responses in many regions. Objective: This review aims to highlight a brief history of social media in health care and report its potential negative and positive public health impacts, which have been characterized in the literature. Methods: We searched electronic bibliographic databases, including PubMed (including Medline) and Institute of Electrical and Electronics Engineers Xplore, from December 10, 2015, to December 10, 2020. We screened titles and abstracts and selected relevant reports for review of full text and reference lists. These were analyzed thematically and consolidated into applications of social media platforms for public health. 
Results: The positive and negative impacts of social media platforms on public health are catalogued on the basis of recent research in this report. These findings are discussed in the context of improving future public health responses and incorporating other emerging digital technology domains such as artificial intelligence. However, there is a need for more research with pragmatic methodology that evaluates the impact of specific digital interventions to inform future health policy. Conclusions: Recent research has highlighted the potential negative impact of social media platforms on population health, as well as potentially useful applications for public health communication, monitoring, and predictions. More research is needed to objectively investigate measures to mitigate its negative impact while harnessing effective applications for the benefit of public health. %M 35129456 %R 10.2196/33680 %U https://www.jmir.org/2022/4/e33680 %U https://doi.org/10.2196/33680 %U http://www.ncbi.nlm.nih.gov/pubmed/35129456 %0 Journal Article %@ 2561-6722 %I JMIR Publications %V 5 %N 2 %P e26760 %T Improved Digital Therapy for Developmental Pediatrics Using Domain-Specific Artificial Intelligence: Machine Learning Study %A Washington,Peter %A Kalantarian,Haik %A Kent,John %A Husic,Arman %A Kline,Aaron %A Leblanc,Emilie %A Hou,Cathy %A Mutlu,Onur Cezmi %A Dunlap,Kaitlyn %A Penev,Yordan %A Varma,Maya %A Stockham,Nate Tyler %A Chrisman,Brianna %A Paskov,Kelley %A Sun,Min Woo %A Jung,Jae-Yoon %A Voss,Catalin %A Haber,Nick %A Wall,Dennis Paul %+ Departments of Pediatrics (Systems Medicine) and Biomedical Data Science, Stanford University, Stanford, CA, United States, 1 5126800926, peterwashington@stanford.edu %K computer vision %K emotion recognition %K affective computing %K autism spectrum disorder %K pediatrics %K mobile health %K digital therapy %K convolutional neural network %K machine learning %K artificial intelligence %D 2022 %7 8.4.2022 %9 Original Paper %J 
JMIR Pediatr Parent %G English %X Background: Automated emotion classification could aid those who struggle to recognize emotions, including children with developmental behavioral conditions such as autism. However, most computer vision emotion recognition models are trained on adult emotion and therefore underperform when applied to child faces. Objective: We designed a strategy to gamify the collection and labeling of child emotion–enriched images to boost the performance of automatic child emotion recognition models to a level closer to what will be needed for digital health care approaches. Methods: We leveraged our prototype therapeutic smartphone game, GuessWhat, which was designed in large part for children with developmental and behavioral conditions, to gamify the secure collection of video data of children expressing a variety of emotions prompted by the game. Independently, we created a secure web interface to gamify the human labeling effort, called HollywoodSquares, tailored for use by any qualified labeler. We gathered and labeled 2155 videos, 39,968 emotion frames, and 106,001 labels on all images. With this drastically expanded pediatric emotion–centric database (>30 times larger than existing public pediatric emotion data sets), we trained a convolutional neural network (CNN) computer vision classifier of happy, sad, surprised, fearful, angry, disgust, and neutral expressions evoked by children. Results: The classifier achieved a 66.9% balanced accuracy and 67.4% F1-score on the entirety of the Child Affective Facial Expression (CAFE) as well as a 79.1% balanced accuracy and 78% F1-score on CAFE Subset A, a subset containing at least 60% human agreement on emotions labels. This performance is at least 10% higher than all previously developed classifiers evaluated against CAFE, the best of which reached a 56% balanced accuracy even when combining “anger” and “disgust” into a single class. 
Conclusions: This work validates that mobile games designed for pediatric therapies can generate high volumes of domain-relevant data sets to train state-of-the-art classifiers to perform tasks helpful to precision health efforts. %M 35394438 %R 10.2196/26760 %U https://pediatrics.jmir.org/2022/2/e26760 %U https://doi.org/10.2196/26760 %U http://www.ncbi.nlm.nih.gov/pubmed/35394438 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 4 %P e29535 %T Digital Health–Enabled Community-Centered Care: Scalable Model to Empower Future Community Health Workers Using Human-in-the-Loop Artificial Intelligence %A Rodrigues,Sarah M %A Kanduri,Anil %A Nyamathi,Adeline %A Dutt,Nikil %A Khargonekar,Pramod %A Rahmani,Amir M %+ Sue & Bill Gross School of Nursing, University of California, 802 W Peltason Drive, Irvine, CA, 92697, United States, 1 949 352 4286, sarahmr@uci.edu %K digital health %K community-centered care %K community health worker %K artificial intelligence %K AI %K AI-enabled health delivery %K eHealth %K individualized delivery %K interventions %K collaborative health %K community health %K social care %K digital empowerment %K mobile phone %D 2022 %7 6.4.2022 %9 Viewpoint %J JMIR Form Res %G English %X Digital health–enabled community-centered care (D-CCC) represents a pioneering vision for the future of community-centered care. D-CCC aims to support and amplify the digital footprint of community health workers through a novel artificial intelligence–enabled closed-loop digital health platform designed for, and with, community health workers. By focusing digitalization at the level of the community health worker, D-CCC enables more timely, supported, and individualized community health worker–delivered interventions. 
D-CCC has the potential to move community-centered care into an expanded, digitally interconnected, and collaborative community-centered health and social care ecosystem of the future, grounded within a robust and digitally empowered community health workforce. %M 35384853 %R 10.2196/29535 %U https://formative.jmir.org/2022/4/e29535 %U https://doi.org/10.2196/29535 %U http://www.ncbi.nlm.nih.gov/pubmed/35384853 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 8 %N 2 %P e35223 %T Artificial Intelligence Education for the Health Workforce: Expert Survey of Approaches and Needs %A Gray,Kathleen %A Slavotinek,John %A Dimaguila,Gerardo Luis %A Choo,Dawn %+ Centre for Digital Transformation of Health, The University of Melbourne, Level 13, VCCC Building, 305 Grattan St, Parkville, 3010, Australia, 61 3 8344 8936, kgray@unimelb.edu.au %K artificial intelligence %K curriculum %K ethics %K human-computer interaction %K interprofessional education %K machine learning %K natural language processing %K professional development %K robotics %D 2022 %7 4.4.2022 %9 Original Paper %J JMIR Med Educ %G English %X Background: The preparation of the current and future health workforce for the possibility of using artificial intelligence (AI) in health care is a growing concern as AI applications emerge in various care settings and specializations. At present, there is no obvious consensus among educators about what needs to be learned or how this learning may be supported or assessed. Objective: Our study aims to explore health care education experts’ ideas and plans for preparing the health workforce to work with AI and identify critical gaps in curriculum and educational resources across a national health care system. Methods: A survey canvassed expert views on AI education for the health workforce in terms of educational strategies, subject matter priorities, meaningful learning activities, desired attitudes, and skills. 
A total of 39 senior people from different health workforce subgroups across Australia provided ratings and free-text responses in late 2020. Results: The responses highlighted the importance of education on ethical implications, suitability of large data sets for use in AI clinical applications, principles of machine learning, and specific diagnosis and treatment applications of AI as well as alterations to cognitive load during clinical work and the interaction between humans and machines in clinical settings. Respondents also outlined barriers to implementation, such as lack of governance structures and processes, resource constraints, and cultural adjustment. Conclusions: Further work of the kind reported in this survey, conducted around the world, can assist educators and education authorities who are responsible for preparing the health workforce to minimize the risks and realize the benefits of implementing AI in health care. %M 35249885 %R 10.2196/35223 %U https://mededu.jmir.org/2022/2/e35223 %U https://doi.org/10.2196/35223 %U http://www.ncbi.nlm.nih.gov/pubmed/35249885 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 5 %N 2 %P e35373 %T Predicting Falls in Long-term Care Facilities: Machine Learning Study %A Thapa,Rahul %A Garikipati,Anurag %A Shokouhi,Sepideh %A Hurtado,Myrna %A Barnes,Gina %A Hoffman,Jana %A Calvert,Jacob %A Katzmann,Lynne %A Mao,Qingqing %A Das,Ritankar %+ Dascena Inc., 12333 Sowden Rd Ste B PMB 65148, Houston, TX, 77080-2059, United States, 1 (510) 826 950, sshokouhi@dascena.com %K vital signs %K machine learning %K blood pressure %K skilled nursing facilities %K independent living facilities %K assisted living facilities %K fall prediction %K elderly care %K elderly population %K older adult %K aging %D 2022 %7 1.4.2022 %9 Original Paper %J JMIR Aging %G English %X Background: Short-term fall prediction models that use electronic health records (EHRs) may enable the implementation of dynamic care practices that specifically address 
changes in individualized fall risk within senior care facilities. Objective: The aim of this study is to implement machine learning (ML) algorithms that use EHR data to predict a 3-month fall risk in residents from a variety of senior care facilities providing different levels of care. Methods: This retrospective study obtained EHR data (2007-2021) from Juniper Communities’ proprietary database of 2785 individuals primarily residing in skilled nursing facilities, independent living facilities, and assisted living facilities across the United States. We assessed the performance of 3 ML-based fall prediction models and the Juniper Communities’ fall risk assessment. Additional analyses were conducted to examine how changes in the input features, training data sets, and prediction windows affected the performance of these models. Results: The Extreme Gradient Boosting model exhibited the highest performance, with an area under the receiver operating characteristic curve of 0.846 (95% CI 0.794-0.894), specificity of 0.848, diagnostic odds ratio of 13.40, and sensitivity of 0.706, while achieving the best trade-off in balancing true positive and negative rates. The number of active medications was the most significant feature associated with fall risk, followed by a resident’s number of active diseases and several variables associated with vital signs, including diastolic blood pressure and changes in weight and respiratory rates. The combination of vital signs with traditional risk factors as input features achieved higher prediction accuracy than using either group of features alone. Conclusions: This study shows that the Extreme Gradient Boosting technique can use a large number of features from EHR data to make short-term fall predictions with a better performance than that of conventional fall risk assessments and other ML models. 
The integration of routinely collected EHR data, particularly vital signs, into fall prediction models may generate more accurate fall risk surveillance than models without vital signs. Our data support the use of ML models for dynamic, cost-effective, and automated fall predictions in different types of senior care facilities. %M 35363146 %R 10.2196/35373 %U https://aging.jmir.org/2022/2/e35373 %U https://doi.org/10.2196/35373 %U http://www.ncbi.nlm.nih.gov/pubmed/35363146 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 4 %P e33145 %T Stakeholder Perspectives on Clinical Decision Support Tools to Inform Clinical Artificial Intelligence Implementation: Protocol for a Framework Synthesis for Qualitative Evidence %A Al-Zubaidy,Mohaimen %A Hogg,HD Jeffry %A Maniatopoulos,Gregory %A Talks,James %A Teare,Marion Dawn %A Keane,Pearse A %A R Beyer,Fiona %+ Faculty of Medical Sciences, Newcastle University, Framlington Place, Newcastle upon Tyne, NE1 7RU, United Kingdom, 44 2086000 ext 0191, Jeffry.Hogg@newcastle.ac.uk %K artificial intelligence %K clinical decision support tools %K digital health %K implementation %K qualitative evidence synthesis %K stakeholders %K clinical decision %K decision support %D 2022 %7 1.4.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Quantitative systematic reviews have identified clinical artificial intelligence (AI)-enabled tools with adequate performance for real-world implementation. To our knowledge, no published report or protocol synthesizes the full breadth of stakeholder perspectives. The absence of such a rigorous foundation perpetuates the “AI chasm,” which continues to delay patient benefit. Objective: The aim of this research is to synthesize stakeholder perspectives of computerized clinical decision support tools in any health care setting. Synthesized findings will inform future research and the implementation of AI into health care services. 
Methods: The search strategy will use MEDLINE (Ovid), Scopus, CINAHL (EBSCO), ACM Digital Library, and Science Citation Index (Web of Science). Following deduplication, title, abstract, and full text screening will be performed by 2 independent reviewers with a third topic expert arbitrating. The quality of included studies will be appraised to support interpretation. Best-fit framework synthesis will be performed, with line-by-line coding completed by 2 independent reviewers. Where appropriate, these findings will be assigned to 1 of 22 a priori themes defined by the Nonadoption, Abandonment, Scale-up, Spread, and Sustainability framework. New domains will be inductively generated for outlying findings. The placement of findings within themes will be reviewed iteratively by a study advisory group including patient and lay representatives. Results: Study registration was obtained from PROSPERO (CRD42021256005) in May 2021. Final searches were executed in April, and screening is ongoing at the time of writing. Full text data analysis is due to be completed in October 2021. We anticipate that the study will be submitted for open-access publication in late 2021. Conclusions: This paper describes the protocol for a qualitative evidence synthesis aiming to define barriers and facilitators to the implementation of computerized clinical decision support tools from all relevant stakeholders. The results of this study are intended to expedite the delivery of patient benefit from AI-enabled clinical tools. 
Trial Registration: PROSPERO CRD42021256005; https://tinyurl.com/r4x3thvp International Registered Report Identifier (IRRID): DERR1-10.2196/33145 %M 35363141 %R 10.2196/33145 %U https://www.researchprotocols.org/2022/4/e33145 %U https://doi.org/10.2196/33145 %U http://www.ncbi.nlm.nih.gov/pubmed/35363141 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 3 %P e34201 %T Leveraging Large-Scale Electronic Health Records and Interpretable Machine Learning for Clinical Decision Making at the Emergency Department: Protocol for System Development and Validation %A Liu,Nan %A Xie,Feng %A Siddiqui,Fahad Javaid %A Ho,Andrew Fu Wah %A Chakraborty,Bibhas %A Nadarajan,Gayathri Devi %A Tan,Kenneth Boon Kiat %A Ong,Marcus Eng Hock %+ Programme in Health Services and Systems Research, Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore, 65 66016503, liu.nan@duke-nus.edu.sg %K electronic health records %K machine learning %K clinical decision making %K emergency department %D 2022 %7 25.3.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: There is a growing demand globally for emergency department (ED) services. An increase in ED visits has resulted in overcrowding and longer waiting times. The triage process plays a crucial role in assessing and stratifying patients’ risks and ensuring that the critically ill promptly receive appropriate priority and emergency treatment. A substantial amount of research has been conducted on the use of machine learning tools to construct triage and risk prediction models; however, the black box nature of these models has limited their clinical application and interpretation. Objective: In this study, we plan to develop an innovative, dynamic, and interpretable System for Emergency Risk Triage (SERT) for risk stratification in the ED by leveraging large-scale electronic health records (EHRs) and machine learning. 
Methods: To achieve this objective, we will conduct a retrospective, single-center study based on a large, longitudinal data set obtained from the EHRs of the largest tertiary hospital in Singapore. Study outcomes include adverse events experienced by patients, such as the need for an intensive care unit and inpatient death. With preidentified candidate variables drawn from expert opinions and relevant literature, we will apply an interpretable machine learning–based AutoScore to develop 3 SERT scores. These 3 scores can be used at different times in the ED, that is, on arrival, during ED stay, and at admission. Furthermore, we will compare our novel SERT scores with established clinical scores and previously described black box machine learning models as baselines. Receiver operating characteristic analysis will be conducted on the testing cohorts for performance evaluation. Results: The study is currently being conducted. The extracted data indicate approximately 1.8 million ED visits by over 810,000 unique patients. Modelling results are expected to be published in 2022. Conclusions: The SERT scoring system proposed in this study will be unique and innovative because of its dynamic nature and modelling transparency. If successfully validated, our proposed solution will establish a standard for data processing and modelling by taking advantage of large-scale EHRs and interpretable machine learning tools. 
International Registered Report Identifier (IRRID): DERR1-10.2196/34201 %M 35333179 %R 10.2196/34201 %U https://www.researchprotocols.org/2022/3/e34201 %U https://doi.org/10.2196/34201 %U http://www.ncbi.nlm.nih.gov/pubmed/35333179 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 9 %N 1 %P e28639 %T Human Factors and Technological Characteristics Influencing the Interaction of Medical Professionals With Artificial Intelligence–Enabled Clinical Decision Support Systems: Literature Review %A Knop,Michael %A Weber,Sebastian %A Mueller,Marius %A Niehaves,Bjoern %+ Department of Information Systems, University of Siegen, Kohlbettstrasse 15, Siegen, 57072, Germany, 49 15755910502, michael.knop@uni-siegen.de %K artificial intelligence %K clinical decision support systems %K CDSS %K decision-making %K diagnostic decision support %K human–computer interaction %K human–AI collaboration %K machine learning %K patient outcomes %K deep learning %K trust %K literature review %D 2022 %7 24.3.2022 %9 Review %J JMIR Hum Factors %G English %X Background: The digitization and automation of diagnostics and treatments promise to alter the quality of health care and improve patient outcomes, whereas the undersupply of medical personnel, high workload on medical professionals, and medical case complexity increase. Clinical decision support systems (CDSSs) have been proven to help medical professionals in their everyday work through their ability to process vast amounts of patient information. However, comprehensive adoption is partially disrupted by specific technological and personal characteristics. With the rise of artificial intelligence (AI), CDSSs have become an adaptive technology with human-like capabilities and are able to learn and change their characteristics over time. However, research has not reflected on the characteristics and factors essential for effective collaboration between human actors and AI-enabled CDSSs. 
Objective: Our study aims to summarize the factors influencing effective collaboration between medical professionals and AI-enabled CDSSs. These factors are essential for medical professionals, management, and technology designers to reflect on the adoption, implementation, and development of an AI-enabled CDSS. Methods: We conducted a literature review including 3 different meta-databases, screening over 1000 articles and including 101 articles for full-text assessment. Of the 101 articles, 7 (6.9%) met our inclusion criteria and were analyzed for our synthesis. Results: We identified the technological characteristics and human factors that appear to have an essential effect on the collaboration of medical professionals and AI-enabled CDSSs in accordance with our research objective, namely, training data quality, performance, explainability, adaptability, medical expertise, technological expertise, personality, cognitive biases, and trust. Comparing our results with those from research on non-AI CDSSs, some characteristics and factors retain their importance, whereas others gain or lose relevance owing to the uniqueness of human-AI interactions. However, only a few (1/7, 14%) studies have mentioned the theoretical foundations and patient outcomes related to AI-enabled CDSSs. Conclusions: Our study provides a comprehensive overview of the relevant characteristics and factors that influence the interaction and collaboration between medical professionals and AI-enabled CDSSs. Rather limited theoretical foundations currently hinder the possibility of creating adequate concepts and models to explain and predict the interrelations between these characteristics and factors. For an appropriate evaluation of the human-AI collaboration, patient outcomes and the role of patients in the decision-making process should be considered. 
%M 35323118 %R 10.2196/28639 %U https://humanfactors.jmir.org/2022/1/e28639 %U https://doi.org/10.2196/28639 %U http://www.ncbi.nlm.nih.gov/pubmed/35323118 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 3 %P e29943 %T A Novel Diagnostic Decision Support System for Medical Professionals: Prospective Feasibility Study %A Timiliotis,Joanna %A Blümke,Bibiana %A Serfözö,Peter Daniel %A Gilbert,Stephen %A Ondrésik,Marta %A Türk,Ewelina %A Hirsch,Martin Christian %A Eckstein,Jens %+ CMIO Research Group, Digitalization & ICT Department, University Hospital Basel, Hebelstrasse, 10, Basel, 4031, Switzerland, 41 0613285489, joanna.timiliotis@usb.ch %K diagnostic decision support system %K DDSS %K probabilistic reasoning %K artificial intelligence %K dyspnea %K emergency department %K internal medicine %K symptom checker %D 2022 %7 24.3.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Continuously growing medical knowledge and the increasing amount of data make it difficult for medical professionals to keep track of all new information and to place it in the context of existing information. A variety of digital technologies and artificial intelligence–based methods are currently available as persuasive tools to empower physicians in clinical decision-making and improve health care quality. A novel diagnostic decision support system (DDSS) prototype developed by Ada Health GmbH with a focus on traceability, transparency, and usability will be examined more closely in this study. Objective: The aim of this study is to test the feasibility and functionality of a novel DDSS prototype, exploring its potential and performance in identifying the underlying cause of acute dyspnea in patients at the University Hospital Basel. Methods: A prospective, observational feasibility study was conducted at the emergency department (ED) and internal medicine ward of the University Hospital Basel, Switzerland. 
A convenience sample of 20 adult patients admitted to the ED with dyspnea as the chief complaint and a high probability of inpatient admission was selected. A study physician followed the patients admitted to the ED throughout the hospitalization without interfering with the routine clinical work. Routinely collected health-related personal data from these patients were entered into the DDSS prototype. The DDSS prototype’s resulting disease probability list was compared with the gold-standard main diagnosis provided by the treating physician. Results: The DDSS presented information with high clarity and had a user-friendly, novel, and transparent interface. The DDSS prototype was not perfectly suited for the ED as case entry was time-consuming (1.5-2 hours per case). It provided accurate decision support in the clinical inpatient setting (average of cases in which the correct diagnosis was the first diagnosis listed: 6/20, 30%, SD 2.10%; average of cases in which the correct diagnosis was listed as one of the top 3: 11/20, 55%, SD 2.39%; average of cases in which the correct diagnosis was listed as one of the top 5: 14/20, 70%, SD 2.26%) in patients with dyspnea as the main presenting complaint. Conclusions: The study of the feasibility and functionality of the tool was successful, with some limitations. Used in the right place, the DDSS has the potential to support physicians in their decision-making process by showing new pathways and unintentionally ignored diagnoses. The DDSS prototype had some limitations regarding the process of data input, diagnostic accuracy, and completeness of the integrated medical knowledge. The results of this study provide a basis for the tool’s further development. In addition, future studies should be conducted with the aim to overcome the current limitations of the tool and study design. 
Trial Registration: ClinicalTrials.gov NCT04827342; https://clinicaltrials.gov/ct2/show/NCT04827342 %M 35323125 %R 10.2196/29943 %U https://formative.jmir.org/2022/3/e29943 %U https://doi.org/10.2196/29943 %U http://www.ncbi.nlm.nih.gov/pubmed/35323125 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 3 %P e27210 %T A Question-and-Answer System to Extract Data From Free-Text Oncological Pathology Reports (CancerBERT Network): Development Study %A Mitchell,Joseph Ross %A Szepietowski,Phillip %A Howard,Rachel %A Reisman,Phillip %A Jones,Jennie D %A Lewis,Patricia %A Fridley,Brooke L %A Rollison,Dana E %+ Department of Health Data Services, H Lee Moffitt Cancer Center and Research Institute, 12902 Magnolia Drive, Tampa, FL, 33612, United States, 1 813 745 6530, Dana.Rollison@moffitt.org %K natural language processing %K NLP %K BERT %K transformer %K pathology %K ICD-O-3 %K deep learning %K cancer %D 2022 %7 23.3.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Information in pathology reports is critical for cancer care. Natural language processing (NLP) systems used to extract information from pathology reports are often narrow in scope or require extensive tuning. Consequently, there is growing interest in automated deep learning approaches. A powerful new NLP algorithm, bidirectional encoder representations from transformers (BERT), was published in late 2018. BERT set new performance standards on tasks as diverse as question answering, named entity recognition, speech recognition, and more. Objective: The aim of this study is to develop a BERT-based system to automatically extract detailed tumor site and histology information from free-text oncological pathology reports. 
Methods: We pursued three specific aims: extract accurate tumor site and histology descriptions from free-text pathology reports, accommodate the diverse terminology used to indicate the same pathology, and provide accurate standardized tumor site and histology codes for use by downstream applications. We first trained a base language model to comprehend the technical language in pathology reports. This involved unsupervised learning on a training corpus of 275,605 electronic pathology reports from 164,531 unique patients that included 121 million words. Next, we trained a question-and-answer (Q&A) model that connects a Q&A layer to the base pathology language model to answer pathology questions. Our Q&A system was designed to search for the answers to two predefined questions in each pathology report: What organ contains the tumor? and What is the kind of tumor or carcinoma? This involved supervised training on 8197 pathology reports, each with ground truth answers to these 2 questions determined by certified tumor registrars. The data set included 214 tumor sites and 193 histologies. The tumor site and histology phrases extracted by the Q&A model were used to predict International Classification of Diseases for Oncology, Third Edition (ICD-O-3), site and histology codes. This involved fine-tuning two additional BERT models: one to predict site codes and another to predict histology codes. Our final system includes a network of 3 BERT-based models. We call this CancerBERT network (caBERTnet). We evaluated caBERTnet using a sequestered test data set of 2050 pathology reports with ground truth answers determined by certified tumor registrars. Results: caBERTnet’s accuracies for predicting group-level site and histology codes were 93.53% (1895/2026) and 97.6% (1993/2042), respectively. 
The top 5 accuracies for predicting fine-grained ICD-O-3 site and histology codes with 5 or more samples each in the training data set were 92.95% (1794/1930) and 96.01% (1853/1930), respectively. Conclusions: We have developed an NLP system that outperforms existing algorithms at predicting ICD-O-3 codes across an extensive range of tumor sites and histologies. Our new system could help reduce treatment delays, increase enrollment in clinical trials of new therapies, and improve patient outcomes. %M 35319481 %R 10.2196/27210 %U https://www.jmir.org/2022/3/e27210 %U https://doi.org/10.2196/27210 %U http://www.ncbi.nlm.nih.gov/pubmed/35319481 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 3 %P e28750 %T Identity Threats as a Reason for Resistance to Artificial Intelligence: Survey Study With Medical Students and Professionals %A Jussupow,Ekaterina %A Spohrer,Kai %A Heinzl,Armin %+ University of Mannheim, L15 1-6, Mannheim, 68313, Germany, 49 621 181 1691, jussupow@uni-mannheim.de %K artificial intelligence %K professional identity %K identity threat %K survey %K resistance %D 2022 %7 23.3.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Information systems based on artificial intelligence (AI) have increasingly spurred controversies among medical professionals as they start to outperform medical experts in tasks that previously required complex human reasoning. Prior research in other contexts has shown that such a technological disruption can result in professional identity threats and provoke negative attitudes and resistance to using technology. However, little is known about how AI systems evoke professional identity threats in medical professionals and under which conditions they actually provoke negative attitudes and resistance. Objective: The aim of this study is to investigate how medical professionals’ resistance to AI can be understood as a result of professional identity threats and temporal perceptions of AI systems. 
It examines the following two dimensions of medical professional identity threat: threats to physicians’ expert status (professional recognition) and threats to physicians’ role as an autonomous care provider (professional capabilities). This paper assesses whether these professional identity threats predict resistance to AI systems and change in importance under the conditions of varying professional experience and varying perceived temporal relevance of AI systems. Methods: We conducted 2 web-based surveys with 164 medical students and 42 experienced physicians across different specialties. The participants were provided with a vignette of a general medical AI system. We measured the experienced identity threats, resistance attitudes, and perceived temporal distance of AI. In a subsample, we collected additional data on the perceived identity enhancement to gain a better understanding of how the participants perceived the upcoming technological change as beyond a mere threat. Qualitative data were coded in a content analysis. Quantitative data were analyzed in regression analyses. Results: Both threats to professional recognition and threats to professional capabilities contributed to perceived self-threat and resistance to AI. Self-threat was negatively associated with resistance. Threats to professional capabilities directly affected resistance to AI, whereas the effect of threats to professional recognition was fully mediated through self-threat. Medical students experienced stronger identity threats and resistance to AI than medical professionals. The temporal distance of AI changed the importance of professional identity threats. If AI systems were perceived as relevant only in the distant future, the effect of threats to professional capabilities was weaker, whereas the effect of threats to professional recognition was stronger. The effect of threats remained robust after including perceived identity enhancement. 
The results show that the distinct dimensions of medical professional identity are affected by the upcoming technological change through AI. Conclusions: Our findings demonstrate that AI systems can be perceived as a threat to medical professional identity. Both threats to professional recognition and threats to professional capabilities contribute to resistance attitudes toward AI and need to be considered in the implementation of AI systems in clinical practice. %M 35319465 %R 10.2196/28750 %U https://formative.jmir.org/2022/3/e28750 %U https://doi.org/10.2196/28750 %U http://www.ncbi.nlm.nih.gov/pubmed/35319465 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 9 %N 1 %P e24680 %T Acceptance of the Use of Artificial Intelligence in Medicine Among Japan’s Doctors and the Public: A Questionnaire Survey %A Tamori,Honoka %A Yamashina,Hiroko %A Mukai,Masami %A Morii,Yasuhiro %A Suzuki,Teppei %A Ogasawara,Katsuhiko %+ Faculty of Health Sciences, Hokkaido University, N12-W5, Kita-ku, Sapporo, 0600812, Japan, 81 11 706 3409, oga@hs.hokudai.ac.jp %K artificial intelligence %K technology acceptance %K surveys and questionnaires %K doctors vs public %D 2022 %7 16.3.2022 %9 Original Paper %J JMIR Hum Factors %G English %X Background: The use of artificial intelligence (AI) in the medical industry promises many benefits, so AI has been introduced to medical practice primarily in developed countries. In Japan, the government is preparing for the rollout of AI in the medical industry. This rollout depends on doctors and the public accepting the technology. Therefore, it is necessary to consider acceptance among doctors and among the public. However, little is known about the acceptance of AI in medicine in Japan. Objective: This study aimed to obtain detailed data on the acceptance of AI in medicine by comparing the acceptance among Japanese doctors with that among the Japanese public. 
Methods: We conducted an online survey, and the responses of doctors and members of the public were compared. AI in medicine was defined as the use of AI to determine diagnosis and treatment without requiring a doctor. A questionnaire was prepared with reference to the unified theory of acceptance and use of technology (UTAUT), a model of behavior toward new technologies. It comprises 20 items, and each item was rated on a five-point scale. Using this questionnaire, we conducted an online survey in 2018 among 399 doctors and 600 members of the public. The sample-wide responses were analyzed, and then the responses of the doctors were compared with those of the public using t tests. Results: Regarding the sample-wide responses (N=999), 653 (65.4%) of the respondents believed that, in the future, AI in medicine would be necessary, whereas only 447 (44.7%) expressed an intention to use AI-driven medicine. Additionally, 730 (73.1%) believed that regulatory legislation was necessary, and 734 (73.5%) were concerned about where accountability lies. Regarding the comparison between doctors and the public, doctors (mean 3.43, SD 1.00) were more likely than members of the public (mean 3.23, SD 0.92) to express intention to use AI-driven medicine (P<.001), suggesting that optimism about AI in medicine is greater among doctors compared to the public. Conclusions: Many of the respondents were optimistic about the role of AI in medicine. However, when asked whether they would like to use AI-driven medicine, they tended to give a negative response. This trend suggests that concerns about the lack of regulation and about accountability hindered acceptance. Additionally, the results revealed that doctors were more enthusiastic than members of the public regarding AI-driven medicine. For the successful implementation of AI in medicine, it would be necessary to inform the public and doctors about the relevant laws and to take measures to address these concerns. 
%M 35293878 %R 10.2196/24680 %U https://humanfactors.jmir.org/2022/1/e24680 %U https://doi.org/10.2196/24680 %U http://www.ncbi.nlm.nih.gov/pubmed/35293878 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 3 %P e28880 %T Using a Convolutional Neural Network and Convolutional Long Short-term Memory to Automatically Detect Aneurysms on 2D Digital Subtraction Angiography Images: Framework Development and Validation %A Liao,JunHua %A Liu,LunXin %A Duan,HaiHan %A Huang,YunZhi %A Zhou,LiangXue %A Chen,LiangYin %A Wang,ChaoHua %+ Department of Neurosurgery, West China Hospital, Sichuan University, No. 37 Guoxue Lane, Wuhou District, Chengdu, 610041, China, 86 18628169123, wangchaohuaHX@163.com %K convolutional neural network %K convolutional long short-term memory %K cerebral aneurysm %K deep learning %D 2022 %7 16.3.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: It is hard to distinguish cerebral aneurysms from overlapping vessels in 2D digital subtraction angiography (DSA) images due to these images’ lack of spatial information. Objective: The aims of this study were to (1) construct a deep learning diagnostic system to improve the ability to detect posterior communicating artery aneurysms on 2D DSA images and (2) validate the efficiency of the deep learning diagnostic system in 2D DSA aneurysm detection. Methods: We proposed a 2-stage detection system. First, we established the region localization stage to automatically locate specific detection regions of raw 2D DSA sequences. Second, in the intracranial aneurysm detection stage, we constructed a bi-input+RetinaNet+convolutional long short-term memory (C-LSTM) framework to compare its performance for aneurysm detection with that of 3 existing frameworks. Each of the frameworks had a 5-fold cross-validation scheme. 
The receiver operating characteristic curve, the area under the curve (AUC) value, mean average precision, sensitivity, specificity, and accuracy were used to assess the abilities of different frameworks. Results: A total of 255 patients with posterior communicating artery aneurysms and 20 patients without aneurysms were included in this study. The best AUC values of the RetinaNet, RetinaNet+C-LSTM, bi-input+RetinaNet, and bi-input+RetinaNet+C-LSTM frameworks were 0.95, 0.96, 0.92, and 0.97, respectively. The mean sensitivities of the RetinaNet, RetinaNet+C-LSTM, bi-input+RetinaNet, and bi-input+RetinaNet+C-LSTM frameworks and human experts were 89% (range 67.02%-98.43%), 88% (range 65.76%-98.06%), 87% (range 64.53%-97.66%), 89% (range 67.02%-98.43%), and 90% (range 68.30%-98.77%), respectively. The mean specificities of the RetinaNet, RetinaNet+C-LSTM, bi-input+RetinaNet, and bi-input+RetinaNet+C-LSTM frameworks and human experts were 80% (range 56.34%-94.27%), 89% (range 67.02%-98.43%), 86% (range 63.31%-97.24%), 93% (range 72.30%-99.56%), and 90% (range 68.30%-98.77%), respectively. The mean accuracies of the RetinaNet, RetinaNet+C-LSTM, bi-input+RetinaNet, and bi-input+RetinaNet+C-LSTM frameworks and human experts were 84.50% (range 69.57%-93.97%), 88.50% (range 74.44%-96.39%), 86.50% (range 71.97%-95.22%), 91% (range 77.63%-97.72%), and 90% (range 76.34%-97.21%), respectively. Conclusions: According to our results, more spatial and temporal information can help improve the performance of the frameworks. Therefore, the bi-input+RetinaNet+C-LSTM framework had the best performance when compared to that of the other frameworks. Our study demonstrates that our system can assist physicians in detecting intracranial aneurysms on 2D DSA images. 
%M 35294371 %R 10.2196/28880 %U https://medinform.jmir.org/2022/3/e28880 %U https://doi.org/10.2196/28880 %U http://www.ncbi.nlm.nih.gov/pubmed/35294371 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 3 %P e33560 %T Use of Mobile and Wearable Artificial Intelligence in Child and Adolescent Psychiatry: Scoping Review %A Welch,Victoria %A Wy,Tom Joshua %A Ligezka,Anna %A Hassett,Leslie C %A Croarkin,Paul E %A Athreya,Arjun P %A Romanowicz,Magdalena %+ Department of Psychiatry and Psychology, Mayo Clinic, 200 1st St SW, Rochester, MN, 55905, United States, 1 5072556782, romanowicz.magdalena@mayo.edu %K mobile computing %K artificial intelligence %K wearable technologies %K child psychiatry %D 2022 %7 14.3.2022 %9 Review %J J Med Internet Res %G English %X Background: Mental health disorders are a leading cause of medical disabilities across an individual’s lifespan. This burden is particularly substantial in children and adolescents because of challenges in diagnosis and the lack of precision medicine approaches. However, the widespread adoption of wearable devices (eg, smart watches) that are conducive for artificial intelligence applications to remotely diagnose and manage psychiatric disorders in children and adolescents is promising. Objective: This study aims to conduct a scoping review to study, characterize, and identify areas of innovations with wearable devices that can augment current in-person physician assessments to individualize diagnosis and management of psychiatric disorders in child and adolescent psychiatry. Methods: This scoping review used information from the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. A comprehensive search of several databases from 2011 to June 25, 2021, limited to the English language and excluding animal studies, was conducted. 
The databases included Ovid MEDLINE and Epub ahead of print, in-process and other nonindexed citations, and daily; Ovid Embase; Ovid Cochrane Central Register of Controlled Trials; Ovid Cochrane Database of Systematic Reviews; Web of Science; and Scopus. Results: The initial search yielded 344 articles, from which 19 (5.5%) articles were left on the final source list for this scoping review. Articles were divided into three main groups as follows: studies with the main focus on autism spectrum disorder, attention-deficit/hyperactivity disorder, and internalizing disorders such as anxiety disorders. Most of the studies used either cardio-fitness chest straps with electrocardiogram sensors or wrist-worn biosensors, such as watches by Fitbit. Both allowed passive data collection of the physiological signals. Conclusions: Our scoping review found a large heterogeneity of methods and findings in artificial intelligence studies in child psychiatry. Overall, the largest gap identified in this scoping review is the lack of randomized controlled trials, as most studies available were pilot studies and feasibility trials. 
%M 35285812 %R 10.2196/33560 %U https://www.jmir.org/2022/3/e33560 %U https://doi.org/10.2196/33560 %U http://www.ncbi.nlm.nih.gov/pubmed/35285812 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 3 %P e27654 %T Thematic Analysis on User Reviews for Depression and Anxiety Chatbot Apps: Machine Learning Approach %A Ahmed,Arfan %A Aziz,Sarah %A Khalifa,Mohamed %A Shah,Uzair %A Hassan,Asma %A Abd-Alrazaq,Alaa %A Househ,Mowafa %+ Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Liberal Arts and Sciences Building, Education City, Doha Al Luqta St, Ar-Rayyan, Doha, PO Box 5825, Qatar, 974 33223401, mhouseh@hbku.edu.qa %K anxiety %K depression %K chatbots %K conversational agents %K topic modeling %K latent Dirichlet allocation %K thematic analysis %K mobile phone %D 2022 %7 11.3.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Anxiety and depression are among the most commonly prevalent mental health disorders worldwide. Chatbot apps can play an important role in relieving anxiety and depression. Users’ reviews of chatbot apps are considered an important source of data for exploring users’ opinions and satisfaction. Objective: This study aims to explore users’ opinions, satisfaction, and attitudes toward anxiety and depression chatbot apps by conducting a thematic analysis of users’ reviews of 11 anxiety and depression chatbot apps collected from the Google Play Store and Apple App Store. In addition, we propose a workflow to provide a methodological approach for future analysis of app review comments. Methods: We analyzed 205,581 user review comments from chatbots designed for users with anxiety and depression symptoms. Using scraper tools and Google Play Scraper and App Store Scraper Python libraries, we extracted the text and metadata. The reviews were divided into positive and negative meta-themes based on users’ rating per review. 
We analyzed the reviews using word frequencies of bigrams (word pairs). A topic modeling technique, latent Dirichlet allocation, was applied to identify topics in the reviews, and these were analyzed to detect themes and subthemes. Results: Thematic analysis was conducted on 5 topics for each sentiment set. Reviews were categorized as positive or negative. For positive reviews, the main themes were confidence and affirmation building; adequate analysis and consultation; caring as a friend; and ease of use. For negative reviews, the results revealed the following themes: usability issues, update issues, privacy, and noncreative conversations. Conclusions: Using a machine learning approach, we were able to analyze ≥200,000 comments and categorize them into themes, allowing us to observe users’ expectations effectively despite some negative factors. A methodological workflow is provided for the future analysis of review comments. %M 35275069 %R 10.2196/27654 %U https://formative.jmir.org/2022/3/e27654 %U https://doi.org/10.2196/27654 %U http://www.ncbi.nlm.nih.gov/pubmed/35275069 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 3 %P e33006 %T Web-Based Skin Cancer Assessment and Classification Using Machine Learning and Mobile Computerized Adaptive Testing in a Rasch Model: Development Study %A Yang,Ting-Ya %A Chien,Tsair-Wei %A Lai,Feng-Jie %+ Department of Dermatology, Chi-Mei Medical Center, 901, Zhonghua Rd, Yongkang District, Tainan, 710, Taiwan, 886 6 2812811 ext 57109, lai.fengjie@gmail.com %K skin cancer assessment %K computerized adaptive testing %K naïve Bayes %K k-nearest neighbors %K logistic regression %K Rasch partial credit model %K receiver operating characteristic curve %K mobile phone %D 2022 %7 9.3.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Web-based computerized adaptive testing (CAT) implementation of the skin cancer (SC) risk scale could substantially reduce participant burden without compromising measurement 
precision. However, the CAT of SC classification has not been reported in the academic literature thus far. Objective: We aim to build a CAT-based model using machine learning to develop an app for automatic classification of SC to help patients assess the risk at an early stage. Methods: We extracted data from a population-based Australian cohort study of SC risk (N=43,794) using the Rasch simulation scheme. All 30 feature items were calibrated using the Rasch partial credit model. A total of 1000 cases following a normal distribution (mean 0, SD 1) based on the item and threshold difficulties were simulated using three techniques of machine learning—naïve Bayes, k-nearest neighbors, and logistic regression—to compare the model accuracy in training and testing data sets with a proportion of 70:30, where the former was used to predict the latter. We calculated the sensitivity, specificity, receiver operating characteristic curve (area under the curve [AUC]), and CIs along with the accuracy and precision across the proposed models for comparison. An app that classifies the SC risk of the respondent was developed. Results: We observed that the 30-item k-nearest neighbors model yielded higher AUC values of 99% and 91% for the 700 training and 300 testing cases, respectively, than its 2 counterparts using the hold-out validation but had lower AUC values of 85% (95% CI 83%-87%) in the k-fold cross-validation. An app that predicts SC classification for patients was successfully developed and demonstrated in this study. Conclusions: The 30-item SC prediction model, combined with the Rasch web-based CAT, is recommended for classifying SC in patients. An app we developed to help patients self-assess SC risk at an early stage is required for application in the future. 
%M 35262505 %R 10.2196/33006 %U https://medinform.jmir.org/2022/3/e33006 %U https://doi.org/10.2196/33006 %U http://www.ncbi.nlm.nih.gov/pubmed/35262505 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 3 %P e34920 %T Toward Successful Implementation of Artificial Intelligence in Health Care Practice: Protocol for a Research Program %A Svedberg,Petra %A Reed,Julie %A Nilsen,Per %A Barlow,James %A Macrae,Carl %A Nygren,Jens %+ School of Health and Welfare, Halmstad University, Box 823, Halmstad, 30118, Sweden, 46 035167100, jens.nygren@hh.se %K process evaluation %K complex intervention %K implementation %K knowledge exchange %K health policy %K organizational change %K capacity building %K qualitative methods %K framework analysis %D 2022 %7 9.3.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: The uptake of artificial intelligence (AI) in health care is at an early stage. Recent studies have shown a lack of AI-specific implementation theories, models, or frameworks that could provide guidance for how to translate the potential of AI into daily health care practices. This protocol provides an outline for the first 5 years of a research program seeking to address this knowledge-practice gap through collaboration and co-design between researchers, health care professionals, patients, and industry stakeholders. Objective: The first part of the program focuses on two specific objectives. The first objective is to develop a theoretically informed framework for AI implementation in health care that can be applied to facilitate such implementation in routine health care practice. The second objective is to carry out empirical AI implementation studies, guided by the framework for AI implementation, and to generate learning for enhanced knowledge and operational insights to guide further refinement of the framework. 
The second part of the program addresses a third objective, which is to apply the developed framework in clinical practice in order to develop regional capacity to provide the practical resources, competencies, and organizational structure required for AI implementation; however, this objective is beyond the scope of this protocol. Methods: This research program will use a logic model to structure the development of a methodological framework for planning and evaluating implementation of AI systems in health care and to support capacity building for its use in practice. The logic model is divided into time-separated stages, with a focus on theory-driven and coproduced framework development. The activities are based on both knowledge development, using existing theory and literature reviews, and method development by means of co-design and empirical investigations. The activities will involve researchers, health care professionals, and other stakeholders to create a multi-perspective understanding. Results: The project started on July 1, 2021, with the Stage 1 activities, including model overview, literature reviews, stakeholder mapping, and impact cases; we will then proceed with Stage 2 activities. Stage 1 and 2 activities will continue until June 30, 2026. Conclusions: There is a need to advance theory and empirical evidence on the implementation requirements of AI systems in health care, as well as an opportunity to bring together insights from research on the development, introduction, and evaluation of AI systems and existing knowledge from implementation research literature. Therefore, with this research program, we intend to build an understanding, using both theoretical and empirical approaches, of how the implementation of AI systems should be approached in order to increase the likelihood of successful and widespread application in clinical practice. 
International Registered Report Identifier (IRRID): PRR1-10.2196/34920 %M 35262500 %R 10.2196/34920 %U https://www.researchprotocols.org/2022/3/e34920 %U https://doi.org/10.2196/34920 %U http://www.ncbi.nlm.nih.gov/pubmed/35262500 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 3 %P e27691 %T Primary Care: The Actual Intelligence Required for Artificial Intelligence to Advance Health Care and Improve Health %A Liaw,Winston R %A Westfall,John M %A Williamson,Tyler S %A Jabbarpour,Yalda %A Bazemore,Andrew %+ Department of Health Systems and Population Health Sciences, University of Houston, 4349 Martin Luther King Blvd, Houston, TX, 77204, United States, 1 7137439862, winstonrliaw@gmail.com %K artificial intelligence %K primary care %D 2022 %7 8.3.2022 %9 Viewpoint %J JMIR Med Inform %G English %X With conversational agents triaging symptoms, cameras aiding diagnoses, and remote sensors monitoring vital signs, the use of artificial intelligence (AI) outside of hospitals has the potential to improve health, according to a recently released report from the National Academy of Medicine. Despite this promise, the success of AI is not guaranteed, and stakeholders need to be involved with its development to ensure that the resulting tools can be easily used by clinicians, protect patient privacy, and enhance the value of the care delivered. A crucial stakeholder group missing from the conversation is primary care. As the nation’s largest delivery platform, primary care will have a powerful impact on whether AI is adopted and subsequently exacerbates health disparities. To leverage these benefits, primary care needs to serve as a medical home for AI, broaden its teams and training, and build on government initiatives and funding. 
%M 35258464 %R 10.2196/27691 %U https://medinform.jmir.org/2022/3/e27691 %U https://doi.org/10.2196/27691 %U http://www.ncbi.nlm.nih.gov/pubmed/35258464 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 3 %P e34896 %T Leveraging Artificial Intelligence to Improve the Diversity of Dermatological Skin Color Pathology: Protocol for an Algorithm Development and Validation Study %A Rezk,Eman %A Eltorki,Mohamed %A El-Dakhakhni,Wael %+ School of Computational Science and Engineering, McMaster University, 1280 Main St W, Hamilton, ON, L8S 4L8, Canada, 1 9055259140, rezke@mcmaster.ca %K artificial intelligence %K skin cancer %K skin tone diversity %K people of color %K image blending %K deep learning %K classification %K early diagnosis %D 2022 %7 8.3.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: The paucity of dark skin images in dermatological textbooks and atlases is a reflection of racial injustice in medicine. The underrepresentation of dark skin images makes diagnosing skin pathology in people of color challenging. For conditions such as skin cancer, in which early diagnosis makes a difference between life and death, people of color have worse prognoses and lower survival rates than people with lighter skin tones as a result of delayed or incorrect diagnoses. Recent advances in artificial intelligence, such as deep learning, offer a potential solution that can be achieved by diversifying the mostly light-skin image repositories through generating images for darker skin tones. Thus, facilitating the development of inclusive cancer early diagnosis systems that are trained and tested on diverse images that truly represent human skin tones. Objective: We aim to develop and evaluate an artificial intelligence–based skin cancer early detection system for all skin tones using clinical images. 
Methods: This study consists of four phases: (1) Publicly available skin image repositories will be analyzed to quantify the underrepresentation of darker skin tones, (2) Images will be generated for the underrepresented skin tones, (3) Generated images will be extensively evaluated for realism and disease presentation with quantitative image quality assessment as well as qualitative human expert and nonexpert ratings, and (4) The images will be utilized with available light-skin images to develop a robust skin cancer early detection model. Results: This study started in September 2020. The first phase of quantifying the underrepresentation of darker skin tones was completed in March 2021. The second phase of generating the images is in progress and will be completed by March 2022. The third phase is expected to be completed by May 2022, and the final phase is expected to be completed by September 2022. Conclusions: This work is the first step toward expanding skin tone diversity in existing image databases to address the current gap in the underrepresentation of darker skin tones. Once validated, the image bank will be a valuable resource that can potentially be utilized in physician education and in research applications. Furthermore, generated images are expected to improve the generalizability of skin cancer detection. When completed, the model will assist family physicians and general practitioners in evaluating skin lesion severity and in efficient triaging for referral to expert dermatologists. In addition, the model can assist dermatologists in diagnosing skin lesions. International Registered Report Identifier (IRRID): DERR1-10.2196/34896 %M 34983017 %R 10.2196/34896 %U https://www.researchprotocols.org/2022/3/e34896 %U https://doi.org/10.2196/34896 %U http://www.ncbi.nlm.nih.gov/pubmed/34983017 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 3 %P e29506 %T How Can Research on Artificial Empathy Be Enhanced by Applying Deepfakes? 
%A Yang,Hsuan-Chia %A Rahmanti,Annisa Ristya %A Huang,Chih-Wei %A Li,Yu-Chuan Jack %+ Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, No 172-1, Sec 2 Keelung Rd, Taipei, 106, Taiwan, 886 966 546 813, jack@tmu.edu.tw %K artificial empathy %K deepfakes %K doctor-patient relationship %K face emotion recognition %K artificial intelligence %K facial recognition %K facial emotion recognition %K medical images %K patient %K physician %K therapy %D 2022 %7 4.3.2022 %9 Viewpoint %J J Med Internet Res %G English %X We propose the idea of using an open data set of doctor-patient interactions to develop artificial empathy based on facial emotion recognition. Facial emotion recognition allows a doctor to analyze patients' emotions, so that they can reach out to their patients through empathic care. However, face recognition data sets are often difficult to acquire; many researchers struggle with small samples of face recognition data sets. Further, sharing medical images or videos has not been possible, as this approach may violate patient privacy. The use of deepfake technology is a promising approach to deidentifying video recordings of patients’ clinical encounters. Such technology can revolutionize the implementation of facial emotion recognition by replacing a patient's face in an image or video with an unrecognizable face—one with a facial expression that is similar to that of the original. This technology will further enhance the potential use of artificial empathy in helping doctors provide empathic care to achieve good doctor-patient therapeutic relationships, and this may result in better patient satisfaction and adherence to treatment. 
%M 35254278 %R 10.2196/29506 %U https://www.jmir.org/2022/3/e29506 %U https://doi.org/10.2196/29506 %U http://www.ncbi.nlm.nih.gov/pubmed/35254278 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 3 %P e30956 %T Reporting of Model Performance and Statistical Methods in Studies That Use Machine Learning to Develop Clinical Prediction Models: Protocol for a Systematic Review %A Weaver,Colin George Wyllie %A Basmadjian,Robert B %A Williamson,Tyler %A McBrien,Kerry %A Sajobi,Tolu %A Boyne,Devon %A Yusuf,Mohamed %A Ronksley,Paul Everett %+ Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Teaching, Research, and Wellness Building 3E18B, 3280 Hospital Drive NW, Calgary, AB, T2N 4Z6, Canada, 1 403 220 8820, peronksl@ucalgary.ca %K machine learning %K clinical prediction %K research reporting %K statistics %K research methods %K clinical prediction models %K artificial intelligence %K modeling %K eHealth %K digital medicine %K prediction %D 2022 %7 3.3.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: With the growing excitement of the potential benefits of using machine learning and artificial intelligence in medicine, the number of published clinical prediction models that use these approaches has increased. However, there is evidence (albeit limited) that suggests that the reporting of machine learning–specific aspects in these studies is poor. Further, there are no reviews assessing the reporting quality or broadly accepted reporting guidelines for these aspects. Objective: This paper presents the protocol for a systematic review that will assess the reporting quality of machine learning–specific aspects in studies that use machine learning to develop clinical prediction models. 
Methods: We will include studies that use a supervised machine learning algorithm to develop a prediction model for use in clinical practice (ie, for diagnosis or prognosis of a condition or identification of candidates for health care interventions). We will search MEDLINE for studies published in 2019, pseudorandomly sort the records, and screen until we obtain 100 studies that meet our inclusion criteria. We will assess reporting quality with a novel checklist developed in parallel with this review, which includes content derived from existing reporting guidelines, textbooks, and consultations with experts. The checklist will cover 4 key areas where the reporting of machine learning studies is unique: modelling steps (order and data used for each step), model performance (eg, reporting the performance of each model compared), statistical methods (eg, describing the tuning approach), and presentation of models (eg, specifying the predictors that contributed to the final model). Results: We completed data analysis in August 2021 and are writing the manuscript. We expect to submit the results to a peer-reviewed journal in early 2022. Conclusions: This review will contribute to more standardized and complete reporting in the field by identifying areas where reporting is poor and can be improved. 
Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42020206167; https://www.crd.york.ac.uk/PROSPERO/display_record.php?RecordID=206167 International Registered Report Identifier (IRRID): RR1-10.2196/30956 %M 35238322 %R 10.2196/30956 %U https://www.researchprotocols.org/2022/3/e30956 %U https://doi.org/10.2196/30956 %U http://www.ncbi.nlm.nih.gov/pubmed/35238322 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 3 %P e27934 %T Enabling Eating Detection in a Free-living Environment: Integrative Engineering and Machine Learning Study %A Zhang,Bo %A Deng,Kaiwen %A Shen,Jie %A Cai,Lingrui %A Ratitch,Bohdana %A Fu,Haoda %A Guan,Yuanfang %+ University of Michigan, 2044D Palmer Commons, Ann Arbor, MI, 48109, United States, 1 7347440018, gyuanfan@umich.edu %K deep learning %K eating %K digital watch %D 2022 %7 1.3.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Monitoring eating is central to the care of many conditions such as diabetes, eating disorders, heart diseases, and dementia. However, automatic tracking of eating in a free-living environment remains a challenge because of the lack of a mature system and large-scale, reliable training set. Objective: This study aims to fill in this gap by an integrative engineering and machine learning effort and conducting a large-scale study in terms of monitoring hours on wearable-based eating detection. Methods: This prospective, longitudinal, passively collected study, covering 3828 hours of records, was made possible by programming a digital system that streams diary, accelerometer, and gyroscope data from Apple Watches to iPhones and then transfers the data to the cloud. Results: On the basis of this data collection, we developed deep learning models leveraging spatial and time augmentation and inferring eating at an area under the curve (AUC) of 0.825 within 5 minutes in the general population. 
In addition, the longitudinal follow-up of the study design encouraged us to develop personalized models that detect eating behavior at an AUC of 0.872. When aggregated to individual meals, the AUC is 0.951. We then prospectively collected an independent validation cohort in a different season of the year and validated the robustness of the models (0.941 for meal-level aggregation). Conclusions: The accuracy of this model and the data streaming platform promises immediate deployment for monitoring eating in applications such as diabetic integrative care. %M 35230244 %R 10.2196/27934 %U https://www.jmir.org/2022/3/e27934 %U https://doi.org/10.2196/27934 %U http://www.ncbi.nlm.nih.gov/pubmed/35230244 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 9 %N 3 %P e27244 %T Detecting and Measuring Depression on Social Media Using a Machine Learning Approach: Systematic Review %A Liu,Danxia %A Feng,Xing Lin %A Ahmed,Farooq %A Shahid,Muhammad %A Guo,Jing %+ Department of Health Policy and Management, School of Public Health, Peking University, 38 Xueyuan Road, Beijing, 100191, China, 86 18086471505, jing624218@bjmu.edu.cn %K depression %K machine learning %K social media %D 2022 %7 1.3.2022 %9 Review %J JMIR Ment Health %G English %X Background: Detection of depression gained prominence soon after this troublesome disease emerged as a serious public health concern worldwide. Objective: This systematic review aims to summarize the findings of previous studies concerning applying machine learning (ML) methods to text data from social media to detect depressive symptoms and to suggest directions for future research in this area. Methods: A bibliographic search was conducted for the period of January 1990 to December 2020 in Google Scholar, PubMed, Medline, ERIC, PsycINFO, and BioMed. 
Two reviewers retrieved and independently assessed the 418 studies consisting of 322 articles identified through database searching and 96 articles identified through other sources; 17 of the studies met the criteria for inclusion. Results: Of the 17 studies, 10 had identified depression based on researcher-inferred mental status, 5 had identified it based on users’ own descriptions of their mental status, and 2 were identified based on community membership. The ML approaches of 13 of the 17 studies were supervised learning approaches, while 3 used unsupervised learning approaches; the remaining 1 study did not describe its ML approach. Challenges in areas such as sampling, optimization of approaches to prediction and their features, generalizability, privacy, and other ethical issues call for further research. Conclusions: ML approaches applied to text data from users on social media can work effectively in depression detection and could serve as complementary tools in public mental health practice. %M 35230252 %R 10.2196/27244 %U https://mental.jmir.org/2022/3/e27244 %U https://doi.org/10.2196/27244 %U http://www.ncbi.nlm.nih.gov/pubmed/35230252 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 2 %P e33063 %T Panic Attack Prediction Using Wearable Devices and Machine Learning: Development and Cohort Study %A Tsai,Chan-Hen %A Chen,Pei-Chen %A Liu,Ding-Shan %A Kuo,Ying-Ying %A Hsieh,Tsung-Ting %A Chiang,Dai-Lun %A Lai,Feipei %A Wu,Chia-Tung %+ Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 
4, Roosevelt Rd., Taipei City, 106319, Taiwan, 886 978006469, tony006469@gmail.com %K panic disorder %K panic attack %K prediction %K wearable device %K machine learning %K lifestyle %D 2022 %7 15.2.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: A panic attack (PA) is an intense form of anxiety accompanied by multiple somatic presentations, leading to frequent emergency department visits and impairing the quality of life. A prediction model for PAs could help clinicians and patients monitor, control, and carry out early intervention for recurrent PAs, enabling more personalized treatment for panic disorder (PD). Objective: This study aims to provide a 7-day PA prediction model and determine the relationship between a future PA and various features, including physiological factors, anxiety and depressive factors, and the air quality index (AQI). Methods: We enrolled 59 participants with PD (Diagnostic and Statistical Manual of Mental Disorders, 5th edition, and the Mini International Neuropsychiatric Interview). Participants used smartwatches (Garmin Vívosmart 4) and mobile apps to collect their sleep, heart rate (HR), activity level, anxiety, and depression scores (Beck Depression Inventory [BDI], Beck Anxiety Inventory [BAI], State-Trait Anxiety Inventory state anxiety [STAI-S], State-Trait Anxiety Inventory trait anxiety [STAI-T], and Panic Disorder Severity Scale Self-Report) in their real life for a duration of 1 year. We also included AQIs from open data. To analyze these data, our team used 6 machine learning methods: random forests, decision trees, linear discriminant analysis, adaptive boosting, extreme gradient boosting, and regularized greedy forests. Results: For 7-day PA predictions, the random forest produced the best prediction rate. Overall, the accuracy of the test set was 67.4%-81.3% for different machine learning algorithms. 
The most critical variables in the model were questionnaire and physiological features, such as the BAI, BDI, STAI, MINI, average HR, resting HR, and deep sleep duration. Conclusions: It is possible to predict PAs using a combination of data from questionnaires and physiological and environmental data. %M 35166679 %R 10.2196/33063 %U https://medinform.jmir.org/2022/2/e33063 %U https://doi.org/10.2196/33063 %U http://www.ncbi.nlm.nih.gov/pubmed/35166679 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 9 %N 1 %P e29973 %T User-Centered Design of A Novel Risk Prediction Behavior Change Tool Augmented With an Artificial Intelligence Engine (MyDiabetesIQ): A Sociotechnical Systems Approach %A Shields,Cathy %A Cunningham,Scott G %A Wake,Deborah J %A Fioratou,Evridiki %A Brodie,Doogie %A Philip,Sam %A Conway,Nicholas T %+ Division of Population Health and Genomics, School of Medicine, University of Dundee, Mackenzie Building, Kirsty Semple Way, Dundee, DD2 4BF, United Kingdom, 44 01382 381382, c.v.shields@dundee.ac.uk %K diabetes mellitus %K digital health intervention %K eHealth %K artificial intelligence %K user-centred design %K human factors %K think aloud %D 2022 %7 8.2.2022 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Diabetes and its complications account for 10% of annual health care spending in the United Kingdom. Digital health care interventions (DHIs) can provide scalable care, fostering diabetes self-management and reducing the risk of complications. Tailorability (providing personalized interventions) and usability are key to DHI engagement/effectiveness. User-centered design of DHIs (aligning features to end users’ needs) can generate more usable interventions, avoiding unintended consequences and improving user engagement. Objective: MyDiabetesIQ (MDIQ) is an artificial intelligence engine intended to predict users’ diabetes complications risk. 
It will underpin a user interface in which users will alter lifestyle parameters to see the impact on their future risks. MDIQ will link to an existing DHI, My Diabetes My Way (MDMW). We describe the user-centered design of the user interface of MDIQ as informed by human factors engineering. Methods: Current users of MDMW were invited to take part in focus groups to gather their insights about users being shown their likelihood of developing diabetes-related complications and any risks they perceived from using MDIQ. Findings from focus groups informed the development of a prototype MDIQ interface, which was then user-tested through the “think aloud” method, in which users speak aloud about their thoughts/impressions while performing prescribed tasks. Focus group and think aloud transcripts were analyzed thematically, using a combination of inductive and deductive analysis. For think aloud data, a sociotechnical model was used as a framework for thematic analysis. Results: Focus group participants (n=8) felt that some users could become anxious when shown their future complications risks. They highlighted the importance of easy navigation, jargon avoidance, and the use of positive/encouraging language. User testing of the prototype site through think aloud sessions (n=7) highlighted several usability issues. Issues included confusing visual cues and confusion over whether user-updated information fed back to health care teams. Some issues could be compounded for users with limited digital skills. Results from the focus groups and think aloud workshops were used in the development of a live MDIQ platform. Conclusions: Acting on the input of end users at each iterative stage of a digital tool’s development can help to prioritize users throughout the design process, ensuring the alignment of DHI features with user needs. 
The use of the sociotechnical framework encouraged the consideration of interactions between different sociotechnical dimensions in finding solutions to issues, for example, avoiding the exclusion of users with limited digital skills. Based on user feedback, the tool could scaffold good goal setting, allowing users to balance their palatable future complications risk against acceptable lifestyle changes. Optimal control of diabetes relies heavily on self-management. Tools such as MDMW/MDIQ can offer personalized support for self-management alongside access to users’ electronic health records, potentially helping to delay or reduce long-term complications, thereby providing significant reductions in health care costs. %M 35133280 %R 10.2196/29973 %U https://humanfactors.jmir.org/2022/1/e29973 %U https://doi.org/10.2196/29973 %U http://www.ncbi.nlm.nih.gov/pubmed/35133280 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 2 %P e28199 %T Improving Emergency Department Patient-Physician Conversation Through an Artificial Intelligence Symptom-Taking Tool: Mixed Methods Pilot Observational Study %A Scheder-Bieschin,Justus %A Blümke,Bibiana %A de Buijzer,Erwin %A Cotte,Fabienne %A Echterdiek,Fabian %A Nacsa,Júlia %A Ondresik,Marta %A Ott,Matthias %A Paul,Gregor %A Schilling,Tobias %A Schmitt,Anne %A Wicks,Paul %A Gilbert,Stephen %+ The Else Kröner Fresenius Center for Digital Health, University Hospital Carl Gustav Carus Dresden, Technische Universität Dresden, Postfach 151 Fetscherstraße 74, Dresden, 01307, Germany, 49 17680396015, stephen.gilbert@mailbox.tu-dresden.de %K symptom assessment application %K anamnesis %K health care system %K patient history taking %K diagnosis %K emergency department %D 2022 %7 7.2.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Establishing rapport and empathy between patients and their health care provider is important but challenging in the context of a busy and crowded emergency department (ED). 
Objective: We explore the hypotheses that rapport building, documentation, and time efficiency might be improved in the ED by providing patients a digital tool that uses Bayesian reasoning–based techniques to gather relevant symptoms and history for handover to clinicians. Methods: A 2-phase pilot evaluation was carried out in the ED of a German tertiary referral and major trauma hospital that treats an average of 120 patients daily. Phase 1 observations guided iterative improvement of the digital tool, which was then further evaluated in phase 2. All patients who were willing and able to provide consent were invited to participate, excluding those with severe injury or illness requiring immediate treatment, with traumatic injury, incapable of completing a health assessment, and aged <18 years. Over an 18-day period with 1699 patients presenting to the ED, 815 (47.96%) were eligible based on triage level. With available recruitment staff, 135 were approached, of whom 81 (60%) were included in the study. In a mixed methods evaluation, patients entered information into the tool, accessed by clinicians through a dashboard. All users completed evaluation Likert-scale questionnaires rating the tool’s performance. The feasibility of a larger trial was evaluated through rates of recruitment and questionnaire completion. Results: Respondents strongly endorsed the tool for facilitating conversation (61/81, 75% of patients, 57/78, 73% of physician ratings, and 10/10, 100% of nurse ratings). Most nurses judged the tool as potentially time saving, whereas most physicians only agreed for a subset of medical specialties (eg, surgery). Patients reported high usability and understood the tool’s questions. The tool was recommended by most patients (63/81, 78%), in 53% (41/77) of physician ratings, and in 76% (61/80) of nurse ratings. Questionnaire completion rates were 100% (81/81) by patients and 96% (78/81 enrolled patients) by physicians. 
Conclusions: This pilot confirmed that a larger study in the setting would be feasible. The tool has clear potential to improve patient–health care provider interaction and could also contribute to ED efficiency savings. Future research and development will extend the range of patients for whom the history-taking tool has clinical utility. Trial Registration: German Clinical Trials Register DRKS00024115; https://drks.de/drks_web/navigate.do?navigationId=trial.HTML&TRIAL_ID=DRKS00024115 %M 35129452 %R 10.2196/28199 %U https://formative.jmir.org/2022/2/e28199 %U https://doi.org/10.2196/28199 %U http://www.ncbi.nlm.nih.gov/pubmed/35129452 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 2 %P e29012 %T The Impact of Artificial Intelligence on Waiting Time for Medical Care in an Urgent Care Service for COVID-19: Single-Center Prospective Study %A Bin,Kaio Jia %A Melo,Adler Araujo Ribeiro %A da Rocha,José Guilherme Moraes Franco %A de Almeida,Renata Pivi %A Cobello Junior,Vilson %A Maia,Fernando Liebhart %A de Faria,Elizabeth %A Pereira,Antonio José %A Battistella,Linamara Rizzo %A Ono,Suzane Kioko %+ Hospital das Clinicas, Faculdade de Medicina, Universidade de Sao Paulo, Rua Dr. Ovidio Pires de Campos 225, Sao Paulo, 05403-110, Brazil, 55 1126616208, kaiobin@gmail.com %K COVID-19 %K artificial intelligence %K robotic process automation %K digital health %K health care management %K pandemic %K waiting time %K queue %K nonvalue-added activities %D 2022 %7 1.2.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: To demonstrate the value of implementation of an artificial intelligence solution in health care service, a winning project of the Massachusetts Institute of Technology Hacking Medicine Brazil competition was implemented in an urgent care service for health care professionals at Hospital das Clínicas of the Faculdade de Medicina da Universidade de São Paulo during the COVID-19 pandemic. 
Objective: The aim of this study was to determine the impact of implementation of the digital solution in the urgent care service, assessing the reduction of nonvalue-added activities and its effect on the nurses’ time required for screening and the waiting time for patients to receive medical care. Methods: This was a single-center, comparative, prospective study designed according to the Public Health England guide “Evaluating Digital Products for Health.” A total of 38,042 visits were analyzed over 18 months to determine the impact of implementing the digital solution. Medical care registration, health screening, and waiting time for medical care were compared before and after implementation of the digital solution. Results: The digital solution automated 92% of medical care registrations. The time for health screening increased by approximately 16% during the implementation and in the first 3 months after the implementation. The waiting time for medical care after automation with the digital solution was reduced by approximately 12 minutes compared with that required for visits without automation. The total time savings in the 12 months after implementation was estimated to be 2508 hours. Conclusions: The digital solution was able to reduce nonvalue-added activities, without a substantial impact on health screening, and further saved waiting time for medical care in an urgent care service in Brazil during the COVID-19 pandemic. 
%M 35103611 %R 10.2196/29012 %U https://formative.jmir.org/2022/2/e29012 %U https://doi.org/10.2196/29012 %U http://www.ncbi.nlm.nih.gov/pubmed/35103611 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 8 %N 1 %P e33390 %T Health Care Students’ Perspectives on Artificial Intelligence: Countrywide Survey in Canada %A Teng,Minnie %A Singla,Rohit %A Yau,Olivia %A Lamoureux,Daniel %A Gupta,Aurinjoy %A Hu,Zoe %A Hu,Ricky %A Aissiou,Amira %A Eaton,Shane %A Hamm,Camille %A Hu,Sophie %A Kelly,Dayton %A MacMillan,Kathleen M %A Malik,Shamir %A Mazzoli,Vienna %A Teng,Yu-Wen %A Laricheva,Maria %A Jarus,Tal %A Field,Thalia S %+ School of Occupational Science and Occupational Therapy, Faculty of Medicine, University of British Columbia, 2211 Wesbrook Mall T325, Vancouver, BC, V6T 2B5, Canada, 1 (604) 822 7392, minnie.teng@ubc.ca %K medical education %K artificial intelligence %K allied health education %K medical students %K health care students %K medical curriculum %K education %D 2022 %7 31.1.2022 %9 Original Paper %J JMIR Med Educ %G English %X Background: Artificial intelligence (AI) is no longer a futuristic concept; it is increasingly being integrated into health care. As studies on attitudes toward AI have primarily focused on physicians, there is a need to assess the perspectives of students across health care disciplines to inform future curriculum development. Objective: This study aims to explore and identify gaps in the knowledge that Canadian health care students have regarding AI, capture how health care students in different fields differ in their knowledge and perspectives on AI, and present student-identified ways that AI literacy may be incorporated into the health care curriculum. Methods: The survey was developed from a narrative literature review of topics in attitudinal surveys on AI. 
The final survey comprised 15 items, including multiple-choice questions, pick-group-rank questions, 11-point Likert scale items, slider scale questions, and narrative questions. We used snowball and convenience sampling methods by distributing an email with a description and a link to the web-based survey to representatives from 18 Canadian schools. Results: A total of 2167 students across 10 different health professions from 18 universities across Canada responded to the survey. Overall, 78.77% (1707/2167) predicted that AI technology would affect their careers within the coming decade and 74.5% (1595/2167) reported a positive outlook toward the emerging role of AI in their respective fields. Attitudes toward AI varied by discipline. Students, even those opposed to AI, identified the need to incorporate a basic understanding of AI into their curricula. Conclusions: We performed a nationwide survey of health care students across 10 different health professions in Canada. The findings would inform student-identified topics within AI and their preferred delivery formats, which would advance education across different health care professions. %M 35099397 %R 10.2196/33390 %U https://mededu.jmir.org/2022/1/e33390 %U https://doi.org/10.2196/33390 %U http://www.ncbi.nlm.nih.gov/pubmed/35099397 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 1 %P e31623 %T Governing Data and Artificial Intelligence for Health Care: Developing an International Understanding %A Morley,Jessica %A Murphy,Lisa %A Mishra,Abhishek %A Joshi,Indra %A Karpathakis,Kassandra %+ Oxford Internet Institute, University of Oxford, 1 St. 
Giles', Oxford, OX1 3JS, United Kingdom, 44 (0)1865 287210, jessica.morley@phc.ox.ac.uk %K digital health %K artificial intelligence %K health policy %D 2022 %7 31.1.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Although advanced analytical techniques falling under the umbrella heading of artificial intelligence (AI) may improve health care, the use of AI in health raises safety and ethical concerns. There are currently no internationally recognized governance mechanisms (policies, ethical standards, evaluation, and regulation) for developing and using AI technologies in health care. A lack of international consensus creates technical and social barriers to the use of health AI while potentially hampering market competition. Objective: The aim of this study is to review current health data and AI governance mechanisms being developed or used by Global Digital Health Partnership (GDHP) member countries that commissioned this research, identify commonalities and gaps in approaches, identify examples of best practices, and understand the rationale for policies. Methods: Data were collected through a scoping review of academic literature and a thematic analysis of policy documents published by selected GDHP member countries. The findings from this data collection and the literature were used to inform semistructured interviews with key senior policy makers from GDHP member countries exploring their countries’ experience of AI-driven technologies in health care and associated governance and inform a focus group with professionals working in international health and technology to discuss the themes and proposed policy recommendations. Policy recommendations were developed based on the aggregated research findings. Results: As this is an empirical research paper, we primarily focused on reporting the results of the interviews and the focus group. 
Semistructured interviews (n=10) and a focus group (n=6) revealed 4 core areas for international collaborations: leadership and oversight, a whole systems approach covering the entire AI pipeline from data collection to model deployment and use, standards and regulatory processes, and engagement with stakeholders and the public. There was a broad range of maturity in health AI activity among the participants, with varying data infrastructure, application of standards across the AI life cycle, and strategic approaches to both development and deployment. A demand for further consistency at the international level and policies was identified to support a robust innovation pipeline. In total, 13 policy recommendations were developed to support GDHP member countries in overcoming core AI governance barriers and establishing common ground for international collaboration. Conclusions: AI-driven technology research and development for health care outpaces the creation of supporting AI governance globally. International collaboration and coordination on AI governance for health care is needed to ensure coherent solutions and allow countries to support and benefit from each other’s work. International bodies and initiatives have a leading role to play in the international conversation, including the production of tools and sharing of practical approaches to the use of AI-driven technologies for health care. 
%M 35099403 %R 10.2196/31623 %U https://formative.jmir.org/2022/1/e31623 %U https://doi.org/10.2196/31623 %U http://www.ncbi.nlm.nih.gov/pubmed/35099403 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 1 %P e34475 %T The Use of a Computerized Cognitive Assessment to Improve the Efficiency of Primary Care Referrals to Memory Services: Protocol for the Accelerating Dementia Pathway Technologies (ADePT) Study %A Kalafatis,Chris %A Modarres,Mohammad Hadi %A Apostolou,Panos %A Tabet,Naji %A Khaligh-Razavi,Seyed-Mahdi %+ Cognetivity Neurosciences Ltd, 3 Waterhouse Square, London, EC1N 2SW, United Kingdom, 44 020 3002 362, seyed@cognetivity.com %K primary health care %K general practice %K dementia %K cognitive assessment %K artificial intelligence %K early diagnosis %K cognition %K assessment %K efficiency %K diagnosis %K COVID-19 %K memory %K mental health %K impairment %K screening %K detection %K efficiency %D 2022 %7 27.1.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Existing primary care cognitive assessment tools are crude or time-consuming screening instruments which can only detect cognitive impairment when it is well established. Due to the COVID-19 pandemic, memory services have adapted to the new environment by moving to remote patient assessments to continue meeting service user demand. However, the remote use of cognitive assessments has been variable while there has been scant evaluation of the outcome of such a change in clinical practice. Emerging research in remote memory clinics has highlighted computerized cognitive tests, such as the Integrated Cognitive Assessment (ICA), as prominent candidates for adoption in clinical practice both during the pandemic and for post-COVID-19 implementation as part of health care innovation. 
Objective: The aim of the Accelerating Dementia Pathway Technologies (ADePT) study is to develop a real-world evidence basis to support the adoption of ICA as an inexpensive screening tool for the detection of cognitive impairment to improve the efficiency of the dementia care pathway. Methods: Patients who have been referred to a memory clinic by a general practitioner (GP) are recruited. Participants complete the ICA either at home or in the clinic along with medical history and usability questionnaires. The GP referral and ICA outcome are compared with the specialist diagnosis obtained at the memory clinic. The clinical outcomes as well as National Health Service reference costing data will be used to assess the potential health and economic benefits of the use of the ICA in the dementia diagnosis pathway. Results: The ADePT study was funded in January 2020 by Innovate UK (Project Number 105837). As of September 2021, 86 participants have been recruited in the study, with 23 participants also completing a retest visit. Initially, the study was designed for in-person visits at the memory clinic; however, in light of the COVID-19 pandemic, the study was amended to allow remote as well as face-to-face visits. The study was also expanded from a single site to 4 sites in the United Kingdom. We expect results to be published by the second quarter of 2022. Conclusions: The ADePT study aims to improve the efficiency of the dementia care pathway at its very beginning and supports systems integration at the intersection between primary and secondary care. The introduction of a standardized, self-administered, digital assessment tool for the timely detection of neurodegeneration as part of a decision support system that can signpost accordingly can reduce unnecessary referrals, service backlog, and assessment variability. 
Trial Registration: ISRCTN 16596456; https://www.isrctn.com/ISRCTN16596456 International Registered Report Identifier (IRRID): DERR1-10.2196/34475 %M 34932495 %R 10.2196/34475 %U https://www.researchprotocols.org/2022/1/e34475 %U https://doi.org/10.2196/34475 %U http://www.ncbi.nlm.nih.gov/pubmed/34932495 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 1 %P e32215 %T Implementation Frameworks for Artificial Intelligence Translation Into Health Care Practice: Scoping Review %A Gama,Fábio %A Tyskbo,Daniel %A Nygren,Jens %A Barlow,James %A Reed,Julie %A Svedberg,Petra %+ School of Business, Innovation and Sustainability, Halmstad University, Kristian IV:s väg 3, Halmstad, 30118, Sweden, 46 0702628937, fabio.gama@hh.se %K implementation framework %K artificial intelligence %K scoping review %D 2022 %7 27.1.2022 %9 Review %J J Med Internet Res %G English %X Background: Significant efforts have been made to develop artificial intelligence (AI) solutions for health care improvement. Despite the enthusiasm, health care professionals still struggle to implement AI in their daily practice. Objective: This paper aims to identify the implementation frameworks used to understand the application of AI in health care practice. Methods: A scoping review was conducted using the Cochrane, Evidence Based Medicine Reviews, Embase, MEDLINE, and PsycINFO databases to identify publications that reported frameworks, models, and theories concerning AI implementation in health care. This review focused on studies published in English and investigating AI implementation in health care since 2000. A total of 2541 unique publications were retrieved from the databases and screened on titles and abstracts by 2 independent reviewers. Selected articles were thematically analyzed against the Nilsen taxonomy of implementation frameworks, and the Greenhalgh framework for the nonadoption, abandonment, scale-up, spread, and sustainability (NASSS) of health care technologies. 
Results: In total, 7 articles met all eligibility criteria for inclusion in the review, and 2 articles included formal frameworks that directly addressed AI implementation, whereas the other articles provided limited descriptions of elements influencing implementation. Collectively, the 7 articles identified elements that aligned with all the NASSS domains, but no single article comprehensively considered the factors known to influence technology implementation. New domains were identified, including dependency on data input and existing processes, shared decision-making, the role of human oversight, and ethics of population impact and inequality, suggesting that existing frameworks do not fully consider the unique needs of AI implementation. Conclusions: This literature review demonstrates that understanding how to implement AI in health care practice is still in its early stages of development. Our findings suggest that further research is needed to provide the knowledge necessary to develop implementation frameworks to guide the future implementation of AI in clinical practice and highlight the opportunity to draw on existing knowledge from the field of implementation science. 
%M 35084349 %R 10.2196/32215 %U https://www.jmir.org/2022/1/e32215 %U https://doi.org/10.2196/32215 %U http://www.ncbi.nlm.nih.gov/pubmed/35084349 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 1 %P e35225 %T Incidence of Diagnostic Errors Among Unexpectedly Hospitalized Patients Using an Automated Medical History–Taking System With a Differential Diagnosis Generator: Retrospective Observational Study %A Kawamura,Ren %A Harada,Yukinori %A Sugimoto,Shu %A Nagase,Yuichiro %A Katsukura,Shinichi %A Shimizu,Taro %+ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, 880 Kitakobayashi, Mibu, 321-0293, Japan, 81 282861111, shimizutaro7@gmail.com %K artificial intelligence %K automated medical history–taking %K diagnostic errors %K outpatient %K Safer Dx %D 2022 %7 27.1.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Automated medical history–taking systems that generate differential diagnosis lists have been suggested to contribute to improved diagnostic accuracy. However, the effect of these systems on diagnostic errors in clinical practice remains unknown. Objective: This study aimed to assess the incidence of diagnostic errors in an outpatient department, where an artificial intelligence (AI)–driven automated medical history–taking system that generates differential diagnosis lists was implemented in clinical practice. Methods: We conducted a retrospective observational study using data from a community hospital in Japan. We included patients aged 20 years and older who used an AI-driven, automated medical history–taking system that generates differential diagnosis lists in the outpatient department of internal medicine for whom the index visit was between July 1, 2019, and June 30, 2020, followed by unplanned hospitalization within 14 days. The primary endpoint was the incidence of diagnostic errors, which were detected using the Revised Safer Dx Instrument by at least two independent reviewers. 
To evaluate the effect of differential diagnosis lists from the AI system on the incidence of diagnostic errors, we compared the incidence of these errors between a group where the AI system generated the final diagnosis in the differential diagnosis list and a group where the AI system did not generate the final diagnosis in the list; the Fisher exact test was used for comparison between these groups. For cases with confirmed diagnostic errors, further review was conducted to identify the contributing factors of these errors via discussion among three reviewers, using the Safer Dx Process Breakdown Supplement as a reference. Results: A total of 146 patients were analyzed. A final diagnosis was confirmed for 138 patients and was observed in the differential diagnosis list from the AI system for 69 patients. Diagnostic errors occurred in 16 out of 146 patients (11.0%, 95% CI 6.4%-17.2%). Although the difference was not statistically significant, the incidence of diagnostic errors was lower in cases where the final diagnosis was included in the differential diagnosis list from the AI system than in cases where the final diagnosis was not included in the list (7.2% vs 15.9%, P=.18). Conclusions: The incidence of diagnostic errors among patients in the outpatient department of internal medicine who used an automated medical history–taking system that generates differential diagnosis lists seemed to be lower than the previously reported incidence of diagnostic errors. This result suggests that the implementation of an automated medical history–taking system that generates differential diagnosis lists could be beneficial for diagnostic safety in the outpatient department of internal medicine. 
%M 35084347 %R 10.2196/35225 %U https://medinform.jmir.org/2022/1/e35225 %U https://doi.org/10.2196/35225 %U http://www.ncbi.nlm.nih.gov/pubmed/35084347 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 1 %P e28916 %T General Practitioners’ Attitudes Toward Artificial Intelligence–Enabled Systems: Interview Study %A Buck,Christoph %A Doctor,Eileen %A Hennrich,Jasmin %A Jöhnk,Jan %A Eymann,Torsten %+ Department of Business & Information Systems Engineering, University of Bayreuth, Universtitätsstraße 30, Bayreuth, 95447, Germany, 49 (0)921 557665, Christoph.Buck@uni-bayreuth.de %K artificial intelligence %K AI %K attitude %K primary care %K general practitioner %K GP %K qualitative interview %K diagnosis %K clinical decision support system %D 2022 %7 27.1.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: General practitioners (GPs) care for a large number of patients with various diseases in very short timeframes under high uncertainty. Thus, systems enabled by artificial intelligence (AI) are promising and time-saving solutions that may increase the quality of care. Objective: This study aims to understand GPs’ attitudes toward AI-enabled systems in medical diagnosis. Methods: We interviewed 18 GPs from Germany between March 2020 and May 2020 to identify determinants of GPs’ attitudes toward AI-based systems in diagnosis. By analyzing the interview transcripts, we identified 307 open codes, which we then further structured to derive relevant attitude determinants. Results: We merged the open codes into 21 concepts and finally into five categories: concerns, expectations, environmental influences, individual characteristics, and minimum requirements of AI-enabled systems. Concerns included all doubts and fears of the participants regarding AI-enabled systems. Expectations reflected GPs’ thoughts and beliefs about expected benefits and limitations of AI-enabled systems in terms of GP care. 
Environmental influences included influences resulting from an evolving working environment, key stakeholders’ perspectives and opinions, the available information technology hardware and software resources, and the media environment. Individual characteristics were determinants that describe a physician as a person, including character traits, demographic characteristics, and knowledge. In addition, the interviews also revealed the minimum requirements of AI-enabled systems, which were preconditions that must be met for GPs to contemplate using AI-enabled systems. Moreover, we identified relationships among these categories, which we conflate in our proposed model. Conclusions: This study provides a thorough understanding of the perspective of future users of AI-enabled systems in primary care and lays the foundation for successful market penetration. We contribute to the research stream of analyzing and designing AI-enabled systems and the literature on attitudes toward technology and practice by fostering the understanding of GPs and their attitudes toward such systems. Our findings provide relevant information to technology developers, policymakers, and stakeholder institutions of GP care. 
%M 35084342 %R 10.2196/28916 %U https://www.jmir.org/2022/1/e28916 %U https://doi.org/10.2196/28916 %U http://www.ncbi.nlm.nih.gov/pubmed/35084342 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 1 %P e34038 %T Technology-Enabled, Evidence-Driven, and Patient-Centered: The Way Forward for Regulating Software as a Medical Device %A Carolan,Jane Elizabeth %A McGonigle,John %A Dennis,Andrea %A Lorgelly,Paula %A Banerjee,Amitava %+ Institute of Health Informatics, University College London, Gower Street, London, WC1E 6BT, United Kingdom, 44 07464345635, j.carolan@ucl.ac.uk %K Artificial intelligence %K machine learning %K algorithm %K software %K risk assessment %K informatics %D 2022 %7 27.1.2022 %9 Viewpoint %J JMIR Med Inform %G English %X Artificial intelligence (AI) is a broad discipline that aims to understand and design systems that display properties of intelligence. Machine learning (ML) is a subset of AI that describes how algorithms and models can assist computer systems in progressively improving their performance. In health care, an increasingly common application of AI/ML is software as a medical device (SaMD), which has the intention to diagnose, treat, cure, mitigate, or prevent disease. AI/ML includes either “locked” or “continuous learning” algorithms. Locked algorithms consistently provide the same output for a particular input. Conversely, continuous learning algorithms, in their infancy in terms of SaMD, modify in real-time based on incoming real-world data, without controlled software version releases. This continuous learning has the potential to better handle local population characteristics, but with the risk of reinforcing existing structural biases. Continuous learning algorithms pose the greatest regulatory complexity, requiring seemingly continuous oversight in the form of special controls to ensure ongoing safety and effectiveness. 
We describe the challenges of continuous learning algorithms, then highlight the new evidence standards and frameworks under development, and discuss the need for stakeholder engagement. The paper concludes with 2 key steps that regulators need to address in order to optimize and realize the benefits of SaMD: first, international standards and guiding principles addressing the uniqueness of SaMD with a continuous learning algorithm are required and second, throughout the product life cycle and appropriate to the SaMD risk classification, there needs to be continuous communication between regulators, developers, and SaMD end users to ensure vigilance and an accurate understanding of the technology. %M 35084352 %R 10.2196/34038 %U https://medinform.jmir.org/2022/1/e34038 %U https://doi.org/10.2196/34038 %U http://www.ncbi.nlm.nih.gov/pubmed/35084352 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 1 %P e28036 %T Energy Efficiency of Inference Algorithms for Clinical Laboratory Data Sets: Green Artificial Intelligence Study %A Yu,Jia-Ruei %A Chen,Chun-Hsien %A Huang,Tsung-Wei %A Lu,Jang-Jih %A Chung,Chia-Ru %A Lin,Ting-Wei %A Wu,Min-Hsien %A Tseng,Yi-Ju %A Wang,Hsin-Yao %+ Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, No 5 Fuxing Street, Guishan District, Taoyuan City, 333, Taiwan, 886 978112962, mdhsinyaowang@gmail.com %K medical informatics %K machine learning %K algorithms %K energy consumption %K artificial intelligence %K energy efficient %K medical domain %K medical data sets %K informatics %D 2022 %7 25.1.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: The use of artificial intelligence (AI) in the medical domain has attracted considerable research interest. Inference applications in the medical domain require energy-efficient AI models. In contrast to other types of data in visual AI, data from medical laboratories usually comprise features with strong signals. 
Numerous energy optimization techniques have been developed to relieve the burden on the hardware required to deploy a complex learning model. However, the energy efficiency levels of different AI models used for medical applications have not been studied. Objective: The aim of this study was to explore and compare the energy efficiency levels of commonly used machine learning algorithms—logistic regression (LR), k-nearest neighbor, support vector machine, random forest (RF), and extreme gradient boosting (XGB) algorithms, as well as 4 different variants of neural network (NN) algorithms—when applied to clinical laboratory data sets. Methods: We applied the aforementioned algorithms to 2 distinct clinical laboratory data sets: a mass spectrometry data set regarding Staphylococcus aureus for predicting methicillin resistance (3338 cases; 268 features) and a urinalysis data set for predicting Trichomonas vaginalis infection (839,164 cases; 9 features). We compared the performance of the 9 inference algorithms in terms of accuracy, area under the receiver operating characteristic curve (AUROC), time consumption, and power consumption. The time and power consumption levels were determined using performance counter data from Intel Power Gadget 3.5. Results: The experimental results indicated that the RF and XGB algorithms achieved the 2 highest AUROC values for both data sets (84.7% and 83.9%, respectively, for the mass spectrometry data set; 91.1% and 91.4%, respectively, for the urinalysis data set). The XGB and LR algorithms exhibited the shortest inference time for both data sets (0.47 milliseconds for both in the mass spectrometry data set; 0.39 and 0.47 milliseconds, respectively, for the urinalysis data set). Compared with the RF algorithm, the XGB and LR algorithms exhibited a 45% and 53%-60% reduction in inference time for the mass spectrometry and urinalysis data sets, respectively. 
In terms of energy efficiency, the XGB algorithm exhibited the lowest power consumption for the mass spectrometry data set (9.42 Watts) and the LR algorithm exhibited the lowest power consumption for the urinalysis data set (9.98 Watts). Compared with a five-hidden-layer NN, the XGB and LR algorithms achieved 16%-24% and 9%-13% lower power consumption levels for the mass spectrometry and urinalysis data sets, respectively. In all experiments, the XGB algorithm exhibited the best performance in terms of accuracy, run time, and energy efficiency. Conclusions: The XGB algorithm achieved balanced performance levels in terms of AUROC, run time, and energy efficiency for the two clinical laboratory data sets. Considering the energy constraints in real-world scenarios, the XGB algorithm is ideal for medical AI applications. %M 35076405 %R 10.2196/28036 %U https://www.jmir.org/2022/1/e28036 %U https://doi.org/10.2196/28036 %U http://www.ncbi.nlm.nih.gov/pubmed/35076405 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 11 %N 1 %P e28366 %T Machine Learning Approaches for Predicting Difficult Airway and First-Pass Success in the Emergency Department: Multicenter Prospective Observational Study %A Yamanaka,Syunsuke %A Goto,Tadahiro %A Morikawa,Koji %A Watase,Hiroko %A Okamoto,Hiroshi %A Hagiwara,Yusuke %A Hasegawa,Kohei %+ Department of Clinical Epidemiology & Health Economics, School of Public Health, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan, 81 3 5841 1887, tag695@mail.harvard.edu %K intubation %K machine learning %K difficult airway %K first-pass success %D 2022 %7 25.1.2022 %9 Original Paper %J Interact J Med Res %G English %X Background: There is still room for improvement in the modified LEMON (look, evaluate, Mallampati, obstruction, neck mobility) criteria for difficult airway prediction and no prediction tool for first-pass success in the emergency department (ED). 
Objective: We applied modern machine learning approaches to predict difficult airways and first-pass success. Methods: In a multicenter prospective study that enrolled consecutive patients who underwent tracheal intubation in 13 EDs, we developed 7 machine learning models (eg, random forest model) using routinely collected data (eg, demographics, initial airway assessment). The outcomes were difficult airway and first-pass success. Model performance was evaluated using c-statistics, calibration slopes, and association measures (eg, sensitivity) in the test set (randomly selected 20% of the data). Their performance was compared with the modified LEMON criteria for difficult airway prediction and a logistic regression model for first-pass success. Results: Of 10,741 patients who underwent intubation, 543 patients (5.1%) had a difficult airway, and 7690 patients (71.6%) had first-pass success. In predicting a difficult airway, machine learning models—except for k-point nearest neighbor and multilayer perceptron—had higher discrimination ability than the modified LEMON criteria (all P≤.001). For example, the ensemble method had the highest c-statistic (0.74 vs 0.62 with the modified LEMON criteria; P<.001). Machine learning models—except k-point nearest neighbor and random forest models—had higher discrimination ability for first-pass success. In particular, the ensemble model had the highest c-statistic (0.81 vs 0.76 with the reference regression; P<.001). Conclusions: Machine learning models demonstrated greater ability for predicting difficult airway and first-pass success in the ED. 
%M 35076398 %R 10.2196/28366 %U https://www.i-jmr.org/2022/1/e28366 %U https://doi.org/10.2196/28366 %U http://www.ncbi.nlm.nih.gov/pubmed/35076398 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 9 %N 1 %P e34333 %T Automatic Assessment of Emotion Dysregulation in American, French, and Tunisian Adults and New Developments in Deep Multimodal Fusion: Cross-sectional Study %A Parra,Federico %A Benezeth,Yannick %A Yang,Fan %+ LE2I EA 7508, Université Bourgogne Franche-Comté, UFR Sciences et techniques, avenue Alain Savary, Dijon, 21000, France, 33 782132695, federico.parra@hotmail.com %K emotion dysregulation %K deep multimodal fusion %K small data %K psychometrics %D 2022 %7 24.1.2022 %9 Original Paper %J JMIR Ment Health %G English %X Background: Emotion dysregulation is a key dimension of adult psychological functioning. There is an interest in developing a computer-based, multimodal, and automatic measure. Objective: We wanted to train a deep multimodal fusion model to estimate emotion dysregulation in adults based on their responses to the Multimodal Developmental Profile, a computer-based psychometric test, using only a small training sample and without transfer learning. Methods: Two hundred and forty-eight participants from 3 different countries took the Multimodal Developmental Profile test, which exposed them to 14 picture and music stimuli and asked them to express their feelings about them, while the software extracted the following features from the video and audio signals: facial expressions, linguistic and paralinguistic characteristics of speech, head movements, gaze direction, and heart rate variability derivatives. Participants also responded to the brief version of the Difficulties in Emotional Regulation Scale. We separated and averaged the feature signals that corresponded to the responses to each stimulus, building a structured data set. 
We transformed each person’s per-stimulus structured data into a multimodal codex, a grayscale image created by projecting each feature’s normalized intensity value onto a Cartesian space, deriving each pixel’s position by applying the Uniform Manifold Approximation and Projection method. The codex sequence was then fed to 2 network types. First, 13 convolutional neural networks dealt with the spatial aspect of the problem, estimating emotion dysregulation by analyzing each of the codified responses. These convolutional estimations were then fed to a transformer network that decoded the temporal aspect of the problem, estimating emotion dysregulation based on the succession of responses. We introduce a Feature Map Average Pooling layer, which computes the mean of the convolved feature maps produced by our convolution layers, dramatically reducing the number of learnable weights and increasing regularization through an ensembling effect. We implemented 8-fold cross-validation to provide an adequate estimate of the generalization ability to unseen samples. Most of the experiments mentioned in this paper are easily replicable using the associated Google Colab system. Results: We found an average Pearson correlation (r) of 0.55 (with an average P value of <.001) between ground truth emotion dysregulation and our system’s estimation of emotion dysregulation. An average mean absolute error of 0.16 and a mean concordance correlation coefficient of 0.54 were also found. Conclusions: In psychometry, our results represent excellent evidence of convergence validity, suggesting that the Multimodal Developmental Profile could be used in conjunction with this methodology to provide a valid measure of emotion dysregulation in adults. Future studies should replicate our findings using a hold-out test sample. Our methodology could be implemented more generally to train deep neural networks where only small training samples are available. 
%M 35072643 %R 10.2196/34333 %U https://mental.jmir.org/2022/1/e34333 %U https://doi.org/10.2196/34333 %U http://www.ncbi.nlm.nih.gov/pubmed/35072643 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 1 %P e28659 %T A Clinical Decision Support System for Sleep Staging Tasks With Explanations From Artificial Intelligence: User-Centered Design and Evaluation Study %A Hwang,Jeonghwan %A Lee,Taeheon %A Lee,Honggu %A Byun,Seonjeong %+ Department of Neuropsychiatry, Uijeongbu St Mary's Hospital, College of Medicine, The Catholic University of Korea, 271, Chenbo-ro, Uijeongbu-si, 11765, Republic of Korea, 82 31 820 3946, sunjung.byun@gmail.com %K sleep staging %K clinical decision support %K user-centered design %K medical artificial intelligence %D 2022 %7 19.1.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Despite the unprecedented performance of deep learning algorithms in clinical domains, full reviews of algorithmic predictions by human experts remain mandatory. Under these circumstances, artificial intelligence (AI) models are primarily designed as clinical decision support systems (CDSSs). However, from the perspective of clinical practitioners, the lack of clinical interpretability and user-centered interfaces hinders the adoption of these AI systems in practice. Objective: This study aims to develop an AI-based CDSS for assisting polysomnographic technicians in reviewing AI-predicted sleep staging results. This study proposed and evaluated a CDSS that provides clinically sound explanations for AI predictions in a user-centered manner. Methods: Our study is based on a user-centered design framework for developing explanations in a CDSS that identifies why explanations are needed, what information should be contained in explanations, and how explanations can be provided in the CDSS. 
We conducted user interviews, user observation sessions, and an iterative design process to identify three key aspects for designing explanations in the CDSS. After constructing the CDSS, the tool was evaluated to investigate how the CDSS explanations helped technicians. We measured the accuracy of sleep staging and interrater reliability with macro-F1 and Cohen κ scores to assess quantitative improvements after our tool was adopted. We assessed qualitative improvements through participant interviews that established how participants perceived and used the tool. Results: The user study revealed that technicians desire explanations that are relevant to key electroencephalogram (EEG) patterns for sleep staging when assessing the correctness of AI predictions. Here, technicians wanted explanations that could be used to evaluate whether the AI models properly locate and use these patterns during prediction. On the basis of this, information that is closely related to sleep EEG patterns was formulated for the AI models. In the iterative design phase, we developed a different visualization strategy for each pattern based on how technicians interpreted the EEG recordings with these patterns during their workflows. Our evaluation study on 9 polysomnographic technicians quantitatively and qualitatively investigated the helpfulness of the tool. For technicians with <5 years of work experience, their quantitative sleep staging performance improved significantly from 56.75 to 60.59 with a P value of .05. Qualitatively, participants reported that the information provided effectively supported them, and they could develop notable adoption strategies for the tool. Conclusions: Our findings indicate that formulating clinical explanations for automated predictions using the information in the AI with a user-centered design process is an effective strategy for developing a CDSS for sleep staging. 
%M 35044311 %R 10.2196/28659 %U https://www.jmir.org/2022/1/e28659 %U https://doi.org/10.2196/28659 %U http://www.ncbi.nlm.nih.gov/pubmed/35044311 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 1 %P e28858 %T Harnessing Artificial Intelligence for Health Message Generation: The Folic Acid Message Engine %A Schmälzle,Ralf %A Wilcox,Shelby %+ Department of Communication, Michigan State University, 404 Wilson Rd, East Lansing, MI, 48824, United States, 1 (517) 353 ext 6629, schmaelz@msu.edu %K human-centered AI %K campaigns %K health communication %K NLP %K health promotion %D 2022 %7 18.1.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Communication campaigns using social media can raise public awareness; however, they are difficult to sustain. A barrier is the need to generate and constantly post novel but on-topic messages, which creates a resource-intensive bottleneck. Objective: In this study, we aim to harness the latest advances in artificial intelligence (AI) to build a pilot system that can generate many candidate messages, which could be used for a campaign to suggest novel, on-topic candidate messages. The issue of folic acid, a B-vitamin that helps prevent major birth defects, serves as an example; however, the system can work with other issues that could benefit from higher levels of public awareness. Methods: We used the Generative Pretrained Transformer-2 architecture, a machine learning model trained on a large natural language corpus, and fine-tuned it using a data set of autodownloaded tweets about #folicacid. The fine-tuned model was then used as a message engine, that is, to create new messages about this topic. We conducted a web-based study to gauge how human raters evaluate AI-generated tweet messages compared with original, human-crafted messages. Results: We found that the Folic Acid Message Engine can easily create several hundreds of new messages that appear natural to humans. 
Web-based raters evaluated the clarity and quality of a human-curated sample of AI-generated messages as on par with human-generated ones. Overall, these results showed that it is feasible to use such a message engine to suggest messages for web-based campaigns that focus on promoting awareness. Conclusions: The message engine can serve as a starting point for more sophisticated AI-guided message creation systems for health communication. Beyond the practical potential of such systems for campaigns in the age of social media, they also hold great scientific potential for the quantitative analysis of message characteristics that promote successful communication. We discuss future developments and obvious ethical challenges that need to be addressed as AI technologies for health persuasion enter the stage. %M 35040800 %R 10.2196/28858 %U https://www.jmir.org/2022/1/e28858 %U https://doi.org/10.2196/28858 %U http://www.ncbi.nlm.nih.gov/pubmed/35040800 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 1 %P e32939 %T Perceptions and Needs of Artificial Intelligence in Health Care to Increase Adoption: Scoping Review %A Chew,Han Shi Jocelyn %A Achananuparp,Palakorn %+ Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Level 3, Clinical Research Centre, Block MD11, 10 Medical Drive, Singapore, 117597, Singapore, 65 65168687, jocelyn.chew.hs@nus.edu.sg %K artificial intelligence %K health care %K service delivery %K perceptions %K needs %K scoping %K review %D 2022 %7 14.1.2022 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) has the potential to improve the efficiency and effectiveness of health care service delivery. However, the perceptions and needs of such systems remain elusive, hindering efforts to promote AI adoption in health care. Objective: This study aims to provide an overview of the perceptions and needs of AI to increase its adoption in health care. 
Methods: A systematic scoping review was conducted according to the 5-stage framework by Arksey and O’Malley. Articles that described the perceptions and needs of AI in health care were searched across nine databases: ACM Library, CINAHL, Cochrane Central, Embase, IEEE Xplore, PsycINFO, PubMed, Scopus, and Web of Science for studies that were published from inception until June 21, 2021. Articles that were not specific to AI, not research studies, and not written in English were omitted. Results: Of the 3666 articles retrieved, 26 (0.71%) were eligible and included in this review. The mean age of the participants ranged from 30 to 72.6 years, the proportion of men ranged from 0% to 73.4%, and the sample sizes for primary studies ranged from 11 to 2780. The perceptions and needs of various populations in the use of AI were identified for general, primary, and community health care; chronic diseases self-management and self-diagnosis; mental health; and diagnostic procedures. The use of AI was perceived to be positive because of its availability, ease of use, and potential to improve efficiency and reduce the cost of health care service delivery. However, concerns were raised regarding the lack of trust in data privacy, patient safety, technological maturity, and the possibility of full automation. Suggestions for improving the adoption of AI in health care were highlighted: enhancing personalization and customizability; enhancing empathy and personification of AI-enabled chatbots and avatars; enhancing user experience, design, and interconnectedness with other devices; and educating the public on AI capabilities. Several corresponding mitigation strategies were also identified in this study. Conclusions: The perceptions and needs of AI in its use in health care are crucial in improving its adoption by various stakeholders. 
Future studies and implementations should consider the points highlighted in this study to enhance the acceptability and adoption of AI in health care. This would facilitate an increase in the effectiveness and efficiency of health care service delivery to improve patient outcomes and satisfaction. %M 35029538 %R 10.2196/32939 %U https://www.jmir.org/2022/1/e32939 %U https://doi.org/10.2196/32939 %U http://www.ncbi.nlm.nih.gov/pubmed/35029538 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 1 %P e29969 %T An Artificial Intelligence Chatbot for Young People’s Sexual and Reproductive Health in India (SnehAI): Instrumental Case Study %A Wang,Hua %A Gupta,Sneha %A Singhal,Arvind %A Muttreja,Poonam %A Singh,Sanghamitra %A Sharma,Poorva %A Piterova,Alice %+ Department of Communication, University at Buffalo, The State University of New York, 359 Baldy Hall, Buffalo, NY, 14260, United States, 1 7166451501, hwang23@buffalo.edu %K artificial intelligence %K chatbot %K Facebook %K affordance %K sex education %K sexual and reproductive health %K contraception %K case study %K young people %K India %K transmedia %K mobile apps %K mobile health %K technology design %K user engagement %K digital health %K mobile phone %D 2022 %7 3.1.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Leveraging artificial intelligence (AI)–driven apps for health education and promotion can help in the accomplishment of several United Nations sustainable development goals. SnehAI, developed by the Population Foundation of India, is the first Hinglish (Hindi + English) AI chatbot, deliberately designed for social and behavioral changes in India. It provides a private, nonjudgmental, and safe space to spur conversations about taboo topics (such as safe sex and family planning) and offers accurate, relatable, and trustworthy information and resources. 
Objective: This study aims to use the Gibson theory of affordances to examine SnehAI and offer scholarly guidance on how AI chatbots can be used to educate adolescents and young adults, promote sexual and reproductive health, and advocate for the health entitlements of women and girls in India. Methods: We adopted an instrumental case study approach that allowed us to explore SnehAI from the perspectives of technology design, program implementation, and user engagement. We also used a mix of qualitative insights and quantitative analytics data to triangulate our findings. Results: SnehAI demonstrated strong evidence across fifteen functional affordances: accessibility, multimodality, nonlinearity, compellability, queriosity, editability, visibility, interactivity, customizability, trackability, scalability, glocalizability, inclusivity, connectivity, and actionability. SnehAI also effectively engaged its users, especially young men, with 8.2 million messages exchanged across a 5-month period. Almost half of the incoming user messages were texts of deeply personal questions and concerns about sexual and reproductive health, as well as allied topics. Overall, SnehAI successfully presented itself as a trusted friend and mentor; the curated content was both entertaining and educational, and the natural language processing system worked effectively to personalize the chatbot response and optimize user experience. Conclusions: SnehAI represents an innovative, engaging, and educational intervention that enables vulnerable and hard-to-reach population groups to talk and learn about sensitive and important issues. SnehAI is a powerful testimonial of the vital potential that lies in AI technologies for social good. 
%M 34982034 %R 10.2196/29969 %U https://www.jmir.org/2022/1/e29969 %U https://doi.org/10.2196/29969 %U http://www.ncbi.nlm.nih.gov/pubmed/34982034 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 1 %P e34415 %T A Deep Residual U-Net Algorithm for Automatic Detection and Quantification of Ascites on Abdominopelvic Computed Tomography Images Acquired in the Emergency Department: Model Development and Validation %A Ko,Hoon %A Huh,Jimi %A Kim,Kyung Won %A Chung,Heewon %A Ko,Yousun %A Kim,Jai Keun %A Lee,Jei Hee %A Lee,Jinseok %+ Department of Biomedical Engineering, Kyung Hee University, 1732, Deogyeong-daero, Giheung-gu, Yongin-si, 17104, Republic of Korea, 82 312012570, gonasago@khu.ac.kr %K ascites %K computed tomography %K deep residual U-Net %K artificial intelligence %D 2022 %7 3.1.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Detection and quantification of intra-abdominal free fluid (ie, ascites) on computed tomography (CT) images are essential processes for finding emergent or urgent conditions in patients. In an emergency department, automatic detection and quantification of ascites will be beneficial. Objective: We aimed to develop an artificial intelligence (AI) algorithm for the automatic detection and quantification of ascites simultaneously using a single deep learning model (DLM). Methods: We developed 2D DLMs based on deep residual U-Net, U-Net, bidirectional U-Net, and recurrent residual U-Net (R2U-Net) algorithms to segment areas of ascites on abdominopelvic CT images. Based on segmentation results, the DLMs detected ascites by classifying CT images into ascites images and nonascites images. The AI algorithms were trained using 6337 CT images from 160 subjects (80 with ascites and 80 without ascites) and tested using 1635 CT images from 40 subjects (20 with ascites and 20 without ascites). 
The performance of the AI algorithms was evaluated for diagnostic accuracy of ascites detection and for segmentation accuracy of ascites areas. Of these DLMs, we proposed an AI algorithm with the best performance. Results: The segmentation accuracy was the highest for the deep residual U-Net model with a mean intersection over union (mIoU) value of 0.87, followed by U-Net, bidirectional U-Net, and R2U-Net models (mIoU values of 0.80, 0.77, and 0.67, respectively). The detection accuracy was the highest for the deep residual U-Net model (0.96), followed by U-Net, bidirectional U-Net, and R2U-Net models (0.90, 0.88, and 0.82, respectively). The deep residual U-Net model also achieved high sensitivity (0.96) and high specificity (0.96). Conclusions: We propose a deep residual U-Net–based AI algorithm for automatic detection and quantification of ascites on abdominopelvic CT scans, which provides excellent performance. %M 34982041 %R 10.2196/34415 %U https://www.jmir.org/2022/1/e34415 %U https://doi.org/10.2196/34415 %U http://www.ncbi.nlm.nih.gov/pubmed/34982041 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 12 %P e27008 %T A Novel Deep Learning–Based System for Triage in the Emergency Department Using Electronic Medical Records: Retrospective Cohort Study %A Yao,Li-Hung %A Leung,Ka-Chun %A Tsai,Chu-Lin %A Huang,Chien-Hua %A Fu,Li-Chen %+ Department of Computer Science and Information Engineering, National Taiwan University, CSIE Der Tian Hall, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617, Taiwan, 886 0935545846, lichen@ntu.edu.tw %K emergency department %K triage system %K deep learning %K hospital admission %K data to text %K electronic health record %D 2021 %7 27.12.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Emergency department (ED) crowding has resulted in delayed patient treatment and has become a universal health care problem. 
Although a triage system, such as the 5-level emergency severity index, somewhat improves the process of ED treatment, it still heavily relies on the nurse’s subjective judgment and triages too many patients to emergency severity index level 3 in current practice. Hence, a system that can help clinicians accurately triage a patient’s condition is imperative. Objective: This study aims to develop a deep learning–based triage system using patients’ ED electronic medical records to predict clinical outcomes after ED treatments. Methods: We conducted a retrospective study using data from an open data set from the National Hospital Ambulatory Medical Care Survey from 2012 to 2016 and data from a local data set from the National Taiwan University Hospital from 2009 to 2015. In this study, we transformed structured data into text form and used convolutional neural networks combined with recurrent neural networks and attention mechanisms to accomplish the classification task. We evaluated our performance using the area under the receiver operating characteristic curve (AUROC). Results: A total of 118,602 patients from the National Hospital Ambulatory Medical Care Survey were included in this study for predicting hospitalization, and the accuracy and AUROC were 0.83 and 0.87, respectively. In an external experiment, we used our own data set from the National Taiwan University Hospital, which included 745,441 patients, and the accuracy and AUROC were similar: 0.83 and 0.88, respectively. Moreover, to effectively evaluate the prediction quality of our proposed system, we also applied the model to other clinical outcomes, including mortality and admission to the intensive care unit, and the results showed that our proposed method was approximately 3% to 5% higher in accuracy than other conventional methods. 
Conclusions: Our proposed method achieved better performance than the traditional method; it is relatively easy to implement, includes commonly used variables, and is better suited to real-world clinical settings. In future work, we will validate our novel deep learning–based triage algorithm in prospective clinical trials, and we hope to use it to guide resource allocation in a busy ED once the validation succeeds. %M 34958305 %R 10.2196/27008 %U https://www.jmir.org/2021/12/e27008 %U https://doi.org/10.2196/27008 %U http://www.ncbi.nlm.nih.gov/pubmed/34958305 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e25328 %T Can Real-time Computer-Aided Detection Systems Diminish the Risk of Postcolonoscopy Colorectal Cancer? %A Madalinski,Mariusz %A Prudham,Roger %+ Northern Care Alliance, Royal Oldham Hospital, Rochdale Rd, Oldham, OL1 2JH, United Kingdom, 44 01616240420, mariusz.madalinski@googlemail.com %K artificial intelligence %K colonoscopy %K adenoma %K real-time computer-aided detection %K colonic polyp %D 2021 %7 24.12.2021 %9 Viewpoint %J JMIR Med Inform %G English %X The adenoma detection rate is a constant subject of research and the main marker of quality in bowel cancer screening. However, by improving the quality of endoscopy via artificial intelligence methods, all polyps, including those with the potential for malignancy, can be removed, thereby reducing interval colorectal cancer rates. As such, the removal of all polyps may become the best marker of endoscopy quality. Thus, we present a viewpoint on integrating the computer-aided detection (CADe) of polyps with high-accuracy, real-time colonoscopy to challenge quality improvements in the performance of colonoscopy. Colonoscopy for bowel cancer screening involving the integration of a deep learning methodology (ie, integrating artificial intelligence with CADe systems) has been assessed in an effort to increase the adenoma detection rate. 
In this viewpoint, a few studies are described, and their results show that CADe systems are able to increase screening sensitivity. The detection of adenomatous polyps, which are associated with a potential risk of progression to colorectal cancer, and their removal are expected to reduce cancer incidence and mortality rates. However, so far, artificial intelligence methods do not increase the detection of cancer or large adenomatous polyps but contribute to the detection of small precancerous polyps. %M 34571490 %R 10.2196/25328 %U https://medinform.jmir.org/2021/12/e25328 %U https://doi.org/10.2196/25328 %U http://www.ncbi.nlm.nih.gov/pubmed/34571490 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e19250 %T Artificial Intelligence–Based Framework for Analyzing Health Care Staff Security Practice: Mapping Review and Simulation Study %A Yeng,Prosper Kandabongee %A Nweke,Livinus Obiora %A Yang,Bian %A Ali Fauzi,Muhammad %A Snekkenes,Einar Arthur %+ Department of Information Security and Communication Technology, Norwegian University of Science and Technology, Teknologivegen 22, Gjovik, 2815, Norway, 47 61135400, prosper.yeng@ntnu.no %K artificial intelligence %K machine learning %K health care %K security practice %K framework %K security %K modeling %K analysis %D 2021 %7 22.12.2021 %9 Review %J JMIR Med Inform %G English %X Background: Blocklisting malicious activities in health care is challenging in relation to access control in health care security practices due to the fear of preventing legitimate access for therapeutic reasons. Inadvertent prevention of legitimate access can contravene the availability trait of the confidentiality, integrity, and availability triad, and may result in worsening health conditions, leading to serious consequences, including deaths. Therefore, health care staff are often provided with a wide range of access such as a “breaking-the-glass” or “self-authorization” mechanism for emergency access. 
However, this broad access can undermine the confidentiality and integrity of sensitive health care data because breaking-the-glass can lead to vast unauthorized access, which could be problematic when determining illegitimate access in security practices. Objective: A review was performed to pinpoint appropriate artificial intelligence (AI) methods and data sources that can be used for effective modeling and analysis of health care staff security practices. Based on knowledge obtained from the review, a framework was developed and implemented with simulated data to provide a comprehensive approach toward effectively modeling and analyzing the security practices of health care staff in real access logs. Methods: Our approach began with a mapping review that identified AI methods, data sources and their attributes, and other categories as input for framework development. To assess the implementation of the framework, electronic health record (EHR) log data were simulated and analyzed, and the performance of the various approaches in the framework was compared. Results: Among the 130 articles initially identified, 18 met the inclusion and exclusion criteria. A thorough assessment and analysis of the included articles revealed that K-nearest neighbor, Bayesian network, and decision tree (C4.5) algorithms were predominantly applied to EHR and network logs with varying input features of health care staff security practices. Based on the review results, a framework was developed and implemented with simulated logs. The decision tree obtained the best precision of 0.655, whereas the best recall was achieved by the support vector machine (SVM) algorithm at 0.977. However, the best F1-score was obtained by random forest at 0.775. In brief, three classifiers (random forest, decision tree, and SVM) in the two-class approach achieved the best precision of 0.998. 
Conclusions: The security practices of health care staff can be effectively analyzed using a two-class approach to detect malicious and nonmalicious security practices. Based on our comparative study, the algorithms that can effectively be used in related studies include random forest, decision tree, and SVM. Deviations of health care staff’s security practices from required security behavior can be analyzed in a big data context using real access logs to define appropriate incentives for improving conscious care security practice. %M 34941549 %R 10.2196/19250 %U https://medinform.jmir.org/2021/12/e19250 %U https://doi.org/10.2196/19250 %U http://www.ncbi.nlm.nih.gov/pubmed/34941549 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e30798 %T Artificial Intelligence in Predicting Cardiac Arrest: Scoping Review %A Alamgir,Asma %A Mousa,Osama %A Shah,Zubair %+ College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Education City, PO BOX 34110, Street 2731, Al Luqta St, Ar-Rayyan, Doha, Qatar, 974 5074 4851, zshah@hbku.edu.qa %K artificial intelligence %K machine learning %K deep learning %K cardiac arrest %K predict %D 2021 %7 17.12.2021 %9 Review %J JMIR Med Inform %G English %X Background: Cardiac arrest is a life-threatening cessation of activity in the heart. Early prediction of cardiac arrest is important, as it allows the necessary measures to be taken to prevent cardiac arrest or to intervene at its onset. Artificial intelligence (AI) technologies and big data have been increasingly used to enhance the ability to predict and prepare for patients at risk. Objective: This study aims to explore the use of AI technology in predicting cardiac arrest as reported in the literature. Methods: A scoping review was conducted in line with the guidelines of the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) extension for scoping reviews. 
Scopus, ScienceDirect, Embase, the Institute of Electrical and Electronics Engineers, and Google Scholar were searched to identify relevant studies. Backward reference list checks of the included studies were also conducted. Study selection and data extraction were independently conducted by 2 reviewers. Data extracted from the included studies were synthesized narratively. Results: Of the 697 citations retrieved, 41 studies were included in the review, and 6 were added after backward citation checking. The included studies reported the use of AI in the prediction of cardiac arrest. We were able to classify the approaches taken by the 47 studies into 3 different categories: 26 (55%) studies predicted cardiac arrest by analyzing specific parameters or variables of the patients, whereas 16 (34%) studies developed an AI-based warning system. The remaining 11% (5/47) of studies focused on distinguishing patients at high risk of cardiac arrest from patients who were not at risk. Two studies focused on the pediatric population, and the rest focused on adults (45/47, 96%). Most of the studies used data sets with a size of <10,000 samples (32/47, 68%). Machine learning was the most prominent branch of AI used to predict cardiac arrest in the studies (38/47, 81%), and the most used algorithm was the neural network (23/47, 49%). K-fold cross-validation was the most used algorithm evaluation tool reported in the studies (24/47, 51%). Conclusions: AI is extensively used to predict cardiac arrest in different patient settings. Technology is expected to play an integral role in improving cardiac medicine. There is a need for more reviews to identify the obstacles to the implementation of AI technologies in clinical settings. Moreover, research focusing on how best to provide clinicians with support to understand, adapt, and implement this technology in their practice is also necessary. 
%M 34927595 %R 10.2196/30798 %U https://medinform.jmir.org/2021/12/e30798 %U https://doi.org/10.2196/30798 %U http://www.ncbi.nlm.nih.gov/pubmed/34927595 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 12 %P e33540 %T How Clinicians Perceive Artificial Intelligence–Assisted Technologies in Diagnostic Decision Making: Mixed Methods Approach %A Hah,Hyeyoung %A Goldin,Deana Shevit %+ Information Systems and Business Analytics, College of Business, Florida International University, 11200 SW 8th Street, Miami, FL, 33199, United States, 1 3053484342, hhah@fiu.edu %K artificial intelligence algorithms %K AI %K diagnostic capability %K virtual care %K multilevel modeling %K human-AI teaming %K natural language understanding %D 2021 %7 16.12.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: With the rapid development of artificial intelligence (AI) and related technologies, AI algorithms are being embedded into various health information technologies that assist clinicians in clinical decision making. Objective: This study aimed to explore how clinicians perceive AI assistance in diagnostic decision making and suggest the paths forward for AI-human teaming for clinical decision making in health care. Methods: This study used a mixed methods approach, utilizing hierarchical linear modeling and sentiment analysis through natural language understanding techniques. Results: A total of 114 clinicians participated in online simulation surveys in 2020 and 2021. These clinicians studied family medicine and used AI algorithms to aid in patient diagnosis. Their overall sentiment toward AI-assisted diagnosis was positive and comparable with diagnoses made without the assistance of AI. However, AI-guided decision making was not congruent with the way clinicians typically made decisions in diagnosing illnesses. 
In a quantitative survey, clinicians reported perceiving current AI assistance as not likely to enhance diagnostic capability and as negatively influencing their overall performance (β=–0.421, P=.02). Instead, clinicians’ diagnostic capabilities tended to be associated with well-known parameters, such as education, age, and daily habit of technology use on social media platforms. Conclusions: This study elucidated clinicians’ current perceptions and sentiments toward AI-enabled diagnosis. Although the sentiment was positive, the current form of AI assistance may not be linked with efficient decision making, as AI algorithms are not well aligned with subjective human reasoning in clinical diagnosis. Developers and policy makers in health could gather behavioral data from clinicians in various disciplines to help align AI algorithms with the unique subjective patterns of reasoning that humans employ in clinical diagnosis. %M 34924356 %R 10.2196/33540 %U https://www.jmir.org/2021/12/e33540 %U https://doi.org/10.2196/33540 %U http://www.ncbi.nlm.nih.gov/pubmed/34924356 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 12 %P e33267 %T Computer-Aided Diagnosis of Gastrointestinal Ulcer and Hemorrhage Using Wireless Capsule Endoscopy: Systematic Review and Diagnostic Test Accuracy Meta-analysis %A Bang,Chang Seok %A Lee,Jae Jun %A Baik,Gwang Ho %+ Department of Internal Medicine, Hallym University College of Medicine, 77 Sakju-ro, Chuncheon, 24253, Republic of Korea, 82 33 240 5821, csbang@hallym.ac.kr %K artificial intelligence %K computer-aided diagnosis %K capsule endoscopy %K ulcer %K hemorrhage %K gastrointestinal %K endoscopy %K review %K accuracy %K meta-analysis %K diagnostic %K performance %K machine learning %K prediction models %D 2021 %7 14.12.2021 %9 Review %J J Med Internet Res %G English %X Background: Interpretation of capsule endoscopy images or movies is operator-dependent and time-consuming. 
As a result, computer-aided diagnosis (CAD) has been applied to enhance the efficacy and accuracy of the review process. Two previous meta-analyses reported the diagnostic performance of CAD models for gastrointestinal ulcers or hemorrhage in capsule endoscopy. However, the systematic reviews conducted to date have been insufficient to establish the true diagnostic validity of CAD models. Objective: To evaluate the diagnostic test accuracy of CAD models for gastrointestinal ulcers or hemorrhage using wireless capsule endoscopic images. Methods: We searched core databases for studies of CAD models for the diagnosis of ulcers or hemorrhage using capsule endoscopy that presented data on diagnostic performance. Systematic review and diagnostic test accuracy meta-analysis were performed. Results: Overall, 39 studies were included. The pooled area under the curve, sensitivity, specificity, and diagnostic odds ratio of CAD models for the diagnosis of ulcers (or erosions) were .97 (95% confidence interval, .95–.98), .93 (.89–.95), .92 (.89–.94), and 138 (79–243), respectively. The pooled area under the curve, sensitivity, specificity, and diagnostic odds ratio of CAD models for the diagnosis of hemorrhage (or angioectasia) were .99 (.98–.99), .96 (.94–.97), .97 (.95–.99), and 888 (343–2303), respectively. Subgroup analyses showed robust results. Meta-regression showed that publication year, number of training images, and target disease (ulcers vs erosions, hemorrhage vs angioectasia) were sources of heterogeneity. No publication bias was detected. Conclusions: CAD models showed high performance for the optical diagnosis of gastrointestinal ulcer and hemorrhage in wireless capsule endoscopy. 
%M 34904949 %R 10.2196/33267 %U https://www.jmir.org/2021/12/e33267 %U https://doi.org/10.2196/33267 %U http://www.ncbi.nlm.nih.gov/pubmed/34904949 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 12 %P e26611 %T Population Preferences for Performance and Explainability of Artificial Intelligence in Health Care: Choice-Based Conjoint Survey %A Ploug,Thomas %A Sundby,Anna %A Moeslund,Thomas B %A Holm,Søren %+ Department of Communication and Psychology, Aalborg University, A C Meyers Vænge 15, Copenhagen, 2450, Denmark, 45 99402533, ploug@hum.aau.dk %K artificial Intelligence %K performance %K transparency %K explainability %K population preferences %K public policy %D 2021 %7 13.12.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Certain types of artificial intelligence (AI), that is, deep learning models, can outperform health care professionals in particular domains. Such models hold considerable promise for improved diagnostics, treatment, and prevention, as well as more cost-efficient health care. They are, however, opaque in the sense that their exact reasoning cannot be fully explicated. Different stakeholders have emphasized the importance of the transparency/explainability of AI decision making. Transparency/explainability may come at the cost of performance. There is need for a public policy regulating the use of AI in health care that balances the societal interests in high performance as well as in transparency/explainability. A public policy should consider the wider public’s interests in such features of AI. Objective: This study elicited the public’s preferences for the performance and explainability of AI decision making in health care and determined whether these preferences depend on respondent characteristics, including trust in health and technology and fears and hopes regarding AI. 
Methods: We conducted a choice-based conjoint survey of public preferences for attributes of AI decision making in health care in a representative sample of the adult Danish population. Initial focus group interviews yielded 6 attributes playing a role in the respondents’ views on the use of AI decision support in health care: (1) type of AI decision, (2) level of explanation, (3) performance/accuracy, (4) responsibility for the final decision, (5) possibility of discrimination, and (6) severity of the disease to which the AI is applied. In total, 100 unique choice sets were developed using fractional factorial design. In a 12-task survey, respondents were asked about their preference for AI system use in hospitals in relation to 3 different scenarios. Results: Of the 1678 potential respondents, 1027 (61.2%) participated. The respondents consider the physician having the final responsibility for treatment decisions the most important attribute, with 46.8% of the total weight of attributes, followed by explainability of the decision (27.3%) and whether the system has been tested for discrimination (14.8%). Other factors, such as gender, age, level of education, whether respondents live rurally or in towns, respondents’ trust in health and technology, and respondents’ fears and hopes regarding AI, do not play a significant role in the majority of cases. Conclusions: The 3 factors that are most important to the public are, in descending order of importance, (1) that physicians are ultimately responsible for diagnostics and treatment planning, (2) that the AI decision support is explainable, and (3) that the AI system has been tested for discrimination. Public policy on AI system use in health care should give priority to such AI system use and ensure that patients are provided with information. 
%M 34898454 %R 10.2196/26611 %U https://www.jmir.org/2021/12/e26611 %U https://doi.org/10.2196/26611 %U http://www.ncbi.nlm.nih.gov/pubmed/34898454 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 7 %N 4 %P e31043 %T Artificial Intelligence Education Programs for Health Care Professionals: Scoping Review %A Charow,Rebecca %A Jeyakumar,Tharshini %A Younus,Sarah %A Dolatabadi,Elham %A Salhia,Mohammad %A Al-Mouaswas,Dalia %A Anderson,Melanie %A Balakumar,Sarmini %A Clare,Megan %A Dhalla,Azra %A Gillan,Caitlin %A Haghzare,Shabnam %A Jackson,Ethan %A Lalani,Nadim %A Mattson,Jane %A Peteanu,Wanda %A Tripp,Tim %A Waldorf,Jacqueline %A Williams,Spencer %A Tavares,Walter %A Wiljer,David %+ University Health Network, 190 Elizabeth Street, R. Fraser Elliott Building RFE 3S-441, Toronto, ON, M5G 2C4, Canada, 1 416 340 4800 ext 6322, david.wiljer@uhn.ca %K machine learning %K deep learning %K health care providers %K education %K learning %K patient care %D 2021 %7 13.12.2021 %9 Review %J JMIR Med Educ %G English %X Background: As the adoption of artificial intelligence (AI) in health care increases, it will become increasingly crucial to involve health care professionals (HCPs) in developing, validating, and implementing AI-enabled technologies. However, because of a lack of AI literacy, most HCPs are not adequately prepared for this revolution. This is a significant barrier to adopting and implementing AI that will affect patients. In addition, the limited existing AI education programs face barriers to development and implementation at various levels of medical education. Objective: With a view to informing future AI education programs for HCPs, this scoping review aims to provide an overview of the types of current or past AI education programs that pertains to the programs’ curricular content, modes of delivery, critical implementation factors for education delivery, and outcomes used to assess the programs’ effectiveness. 
Methods: After the creation of a search strategy and keyword searches, a 2-stage screening process was conducted by 2 independent reviewers to determine study eligibility. When consensus was not reached, the conflict was resolved by consulting a third reviewer. This process consisted of a title and abstract scan and a full-text review. The articles were included if they discussed an actual training program or educational intervention, or a potential training program or educational intervention and the desired content to be covered, focused on AI, and were designed or intended for HCPs (at any stage of their career). Results: Of the 10,094 unique citations scanned, 41 (0.41%) studies relevant to our eligibility criteria were identified. Among the 41 included studies, 10 (24%) described 13 unique programs and 31 (76%) discussed recommended curricular content. The curricular content of the unique programs ranged from AI use and AI interpretation to cultivating the skills needed to explain results derived from AI algorithms. The curricular topics were categorized into three main domains: cognitive, psychomotor, and affective. Conclusions: This review provides an overview of the current landscape of AI in medical education and highlights the skills and competencies required by HCPs to effectively use AI in enhancing the quality of care and optimizing patient outcomes. Future education efforts should focus on the development of regulatory strategies, a multidisciplinary approach to curriculum redesign, a competency-based curriculum, and patient-clinician interaction. 
%M 34898458 %R 10.2196/31043 %U https://mededu.jmir.org/2021/4/e31043 %U https://doi.org/10.2196/31043 %U http://www.ncbi.nlm.nih.gov/pubmed/34898458 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 12 %P e28120 %T Transporting an Artificial Intelligence Model to Predict Emergency Cesarean Delivery: Overcoming Challenges Posed by Interfacility Variation %A Guedalia,Joshua %A Lipschuetz,Michal %A Cohen,Sarah M %A Sompolinsky,Yishai %A Walfisch,Asnat %A Sheiner,Eyal %A Sergienko,Ruslan %A Rosenbloom,Joshua %A Unger,Ron %A Yagel,Simcha %A Hochler,Hila %+ Division of Obstetrics & Gynecology, Hadassah Medical Organization and Faculty of Medicine, Hebrew University of Jerusalem, Churchill Avn. 8, Jerusalem, 9765415, Israel, 972 25841111, michal.lipschuetz@gmail.com %K machine learning %K algorithm transport %K health outcomes %K health care facilities %K artificial intelligence %K AI %K ML %K pregnancy %K birth %K pediatrics %K neonatal %K prenatal %D 2021 %7 10.12.2021 %9 Viewpoint %J J Med Internet Res %G English %X Research using artificial intelligence (AI) in medicine is expected to significantly influence the practice of medicine and the delivery of health care in the near future. However, for successful deployment, the results must be transported across health care facilities. We present a cross-facilities application of an AI model that predicts the need for an emergency caesarean during birth. The transported model showed benefit; however, there can be challenges associated with interfacility variation in reporting practices. 
%M 34890352 %R 10.2196/28120 %U https://www.jmir.org/2021/12/e28120 %U https://doi.org/10.2196/28120 %U http://www.ncbi.nlm.nih.gov/pubmed/34890352 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e33049 %T Differential Biases and Variabilities of Deep Learning–Based Artificial Intelligence and Human Experts in Clinical Diagnosis: Retrospective Cohort and Survey Study %A Cha,Dongchul %A Pae,Chongwon %A Lee,Se A %A Na,Gina %A Hur,Young Kyun %A Lee,Ho Young %A Cho,A Ra %A Cho,Young Joon %A Han,Sang Gil %A Kim,Sung Huhn %A Choi,Jae Young %A Park,Hae-Jeong %+ Center for Systems and Translational Brain Sciences, Institute of Human Complexity and Systems Science, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seoul, 03722, Republic of Korea, 82 2 2228 2363, parkhj@yuhs.ac %K human-machine cooperation %K convolutional neural network %K deep learning %K class imbalance problem %K otoscopy %K eardrum %K artificial intelligence %K otology %K computer-aided diagnosis %D 2021 %7 8.12.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Deep learning (DL)–based artificial intelligence may have different diagnostic characteristics than human experts in medical diagnosis. As a data-driven knowledge system, heterogeneous population incidence in the clinical world is considered to cause more bias to DL than clinicians. Conversely, by experiencing limited numbers of cases, human experts may exhibit large interindividual variability. Thus, understanding how the 2 groups classify given data differently is an essential step for the cooperative usage of DL in clinical application. Objective: This study aimed to evaluate and compare the differential effects of clinical experience in otoendoscopic image diagnosis in both computers and physicians exemplified by the class imbalance problem and guide clinicians when utilizing decision support systems. 
Methods: We used digital otoendoscopic images of patients who visited the outpatient clinic in the Department of Otorhinolaryngology at Severance Hospital, Seoul, South Korea, from January 2013 to June 2019, for a total of 22,707 otoendoscopic images. We excluded similar images, and 7500 otoendoscopic images were selected for labeling. We built a DL-based image classification model to classify the given image into 6 disease categories. Two test sets of 300 images were populated: balanced and imbalanced test sets. We included 14 clinicians (otolaryngologists and nonotolaryngology specialists including general practitioners) and 13 DL-based models. We used accuracy (overall and per-class) and kappa statistics to compare the results of individual physicians and the ML models. Results: Our ML models had consistently high accuracies (balanced test set: mean 77.14%, SD 1.83%; imbalanced test set: mean 82.03%, SD 3.06%), equivalent to those of otolaryngologists (balanced: mean 71.17%, SD 3.37%; imbalanced: mean 72.84%, SD 6.41%) and far better than those of nonotolaryngologists (balanced: mean 45.63%, SD 7.89%; imbalanced: mean 44.08%, SD 15.83%). However, ML models suffered from class imbalance problems. This was mitigated by data augmentation, particularly for low incidence classes, but rare disease classes still had low per-class accuracies. Human physicians, despite being less affected by prevalence, showed high interphysician variability (ML models: kappa=0.83, SD 0.02; otolaryngologists: kappa=0.60, SD 0.07). Conclusions: Even though ML models deliver excellent performance in classifying ear disease, physicians and ML models have their own strengths. 
ML models have consistent and high accuracy while considering only the given image and show bias toward prevalence, whereas human physicians have varying performance but do not show bias toward prevalence and may also consider extra information that is not images. To deliver the best patient care in the shortage of otolaryngologists, our ML model can serve a cooperative role for clinicians with diverse expertise, as long as it is kept in mind that models consider only images and could be biased toward prevalent diseases even after data augmentation. %M 34889764 %R 10.2196/33049 %U https://medinform.jmir.org/2021/12/e33049 %U https://doi.org/10.2196/33049 %U http://www.ncbi.nlm.nih.gov/pubmed/34889764 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 8 %N 12 %P e30439 %T Language, Speech, and Facial Expression Features for Artificial Intelligence–Based Detection of Cancer Survivors’ Depression: Scoping Meta-Review %A Smrke,Urška %A Mlakar,Izidor %A Lin,Simon %A Musil,Bojan %A Plohl,Nejc %+ Faculty of Electrical Engineering and Computer Science, University of Maribor, Koroška cesta 46, Maribor, 2000, Slovenia, 386 31262861, urska.smrke@um.si %K artificial intelligence %K cancer %K depression %K facial expression %K language %K oncology %K review %K screening %K speech %K symptom %D 2021 %7 6.12.2021 %9 Review %J JMIR Ment Health %G English %X Background: Cancer survivors often experience disorders from the depressive spectrum that remain largely unrecognized and overlooked. Even though screening for depression is recognized as essential, several barriers prevent its successful implementation. It is possible that better screening options can be developed. New possibilities have been opening up with advances in artificial intelligence and increasing knowledge on the connection of observable cues and psychological states. 
Objective: The aim of this scoping meta-review was to identify observable features of depression that can be intercepted using artificial intelligence in order to provide a stepping stone toward better recognition of depression among cancer survivors. Methods: We followed a methodological framework for scoping reviews. We searched SCOPUS and Web of Science for relevant papers on the topic, and data were extracted from the papers that met inclusion criteria. We used thematic analysis within 3 predefined categories of depression cues (ie, language, speech, and facial expression cues) to analyze the papers. Results: The search yielded 1023 papers, of which 9 met the inclusion criteria. Analysis of their findings resulted in several well-supported cues of depression in language, speech, and facial expression domains, which provides a comprehensive list of observable features that are potentially suited to be intercepted by artificial intelligence for early detection of depression. Conclusions: This review provides a synthesis of behavioral features of depression while translating this knowledge into the context of artificial intelligence–supported screening for depression in cancer survivors. 
%M 34874883 %R 10.2196/30439 %U https://mental.jmir.org/2021/12/e30439 %U https://doi.org/10.2196/30439 %U http://www.ncbi.nlm.nih.gov/pubmed/34874883 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 12 %P e29812 %T Analyzing Patient Trajectories With Artificial Intelligence %A Allam,Ahmed %A Feuerriegel,Stefan %A Rebhan,Michael %A Krauthammer,Michael %+ Ludwig Maximilian University of Munich, Geschwister-Scholl-Platz 1, Munich, 80539, Germany, 49 8921806790, feuerriegel@lmu.de %K patient trajectories %K longitudinal data %K digital medicine %K artificial intelligence %K machine learning %D 2021 %7 3.12.2021 %9 Viewpoint %J J Med Internet Res %G English %X In digital medicine, patient data typically record health events over time (eg, through electronic health records, wearables, or other sensing technologies) and thus form unique patient trajectories. Patient trajectories are highly predictive of the future course of diseases and therefore facilitate effective care. However, digital medicine often uses only limited patient data, consisting of health events from only a single or small number of time points while ignoring additional information encoded in patient trajectories. To analyze such rich longitudinal data, new artificial intelligence (AI) solutions are needed. In this paper, we provide an overview of the recent efforts to develop trajectory-aware AI solutions and provide suggestions for future directions. Specifically, we examine the implications for developing disease models from patient trajectories along the typical workflow in AI: problem definition, data processing, modeling, evaluation, and interpretation. We conclude with a discussion of how such AI solutions will allow the field to build robust models for personalized risk scoring, subtyping, and disease pathway discovery. 
%M 34870606 %R 10.2196/29812 %U https://www.jmir.org/2021/12/e29812 %U https://doi.org/10.2196/29812 %U http://www.ncbi.nlm.nih.gov/pubmed/34870606 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 12 %P e31053 %T Assessing the Views of Professionals, Patients, and Care Partners Concerning the Use of Computer Tools in Memory Clinics: International Survey Study %A van Gils,Aniek M %A Visser,Leonie NC %A Hendriksen,Heleen MA %A Georges,Jean %A Muller,Majon %A Bouwman,Femke H %A van der Flier,Wiesje M %A Rhodius-Meester,Hanneke FM %+ Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Amsterdam UMC, Location VUmc, De Boelelaan 1118, Amsterdam, 1081 HZ, Netherlands, 31 204440685, a.vangils@amsterdamumc.nl %K artificial intelligence %K clinical decision support systems %K dementia %K diagnostic testing %K diagnosis %K prognosis %K communication %D 2021 %7 3.12.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: Computer tools based on artificial intelligence could aid clinicians in memory clinics in several ways, such as by supporting diagnostic decision-making, web-based cognitive testing, and the communication of diagnosis and prognosis. Objective: This study aims to identify the preferences as well as the main barriers and facilitators related to using computer tools in memory clinics for all end users, that is, clinicians, patients, and care partners. Methods: Between July and October 2020, we sent out invitations to a web-based survey to clinicians using the European Alzheimer’s Disease Centers network and the Dutch Memory Clinic network, and 109 clinicians participated (mean age 45 years, SD 10; 53/109, 48.6% female). A second survey was created for patients and care partners. They were invited via Alzheimer Europe, Alzheimer’s Society United Kingdom, Amsterdam Dementia Cohort, and Amsterdam Aging Cohort. 
A total of 50 patients with subjective cognitive decline, mild cognitive impairment, or dementia (mean age 73 years, SD 8; 17/50, 34% female) and 46 care partners (mean age 65 years, SD 12; 25/46, 54% female) participated in this survey. Results: Most clinicians reported a willingness to use diagnostic (88/109, 80.7%) and prognostic (83/109, 76.1%) computer tools. User-friendliness (71/109, 65.1%; Likert scale mean 4.5, SD 0.7) and increasing diagnostic accuracy (76/109, 69.7%; mean 4.3, SD 0.7) were reported as the main factors stimulating the adoption of a tool. Tools should also save time and provide clear information on reliability and validity. Inadequate integration with electronic patient records (46/109, 42.2%; mean 3.8, SD 1.0) and fear of losing important clinical information (48/109, 44%; mean 3.7, SD 1.2) were most frequently indicated as barriers. Patients and care partners were equally positive about the use of computer tools by clinicians, both for diagnosis (69/96, 72%) and prognosis (73/96, 76%). In addition, most of them thought favorably regarding the possibility of using the tools themselves. Conclusions: This study showed that computer tools in memory clinics are positively valued by most end users. For further development and implementation, it is essential to overcome the technical and practical barriers of a tool while paying utmost attention to its reliability and validity. 
%M 34870612 %R 10.2196/31053 %U https://formative.jmir.org/2021/12/e31053 %U https://doi.org/10.2196/31053 %U http://www.ncbi.nlm.nih.gov/pubmed/34870612 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e22798 %T Deep Learning–Assisted Burn Wound Diagnosis: Diagnostic Model Development Study %A Chang,Che Wei %A Lai,Feipei %A Christian,Mesakh %A Chen,Yu Chun %A Hsu,Ching %A Chen,Yo Shen %A Chang,Dun Hao %A Roan,Tyng Luen %A Yu,Yen Che %+ Graduate Institute of Biomedical Electronics & Bioinformatics, National Taiwan University, Room 419, Computer Science and Information Engineering-Der Tian Hall, No 1, Roosevelt Road, Sec 4, Taipei, 106319, Taiwan, 886 2 3366 4888 ext 419, flai@ntu.edu.tw %K deep learning %K semantic segmentation %K instance segmentation %K burn wounds %K percentage total body surface area %D 2021 %7 2.12.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Accurate assessment of the percentage total body surface area (%TBSA) of burn wounds is crucial in the management of burn patients. The resuscitation fluid and nutritional needs of burn patients, their need for intensive unit care, and probability of mortality are all directly related to %TBSA. It is difficult to estimate a burn area of irregular shape by inspection. Many articles have reported discrepancies in estimating %TBSA by different doctors. Objective: We propose a method, based on deep learning, for burn wound detection, segmentation, and calculation of %TBSA on a pixel-to-pixel basis. Methods: A 2-step procedure was used to convert burn wound diagnosis into %TBSA. In the first step, images of burn wounds were collected from medical records and labeled by burn surgeons, and the data set was then input into 2 deep learning architectures, U-Net and Mask R-CNN, each configured with 2 different backbones, to segment the burn wounds. 
In the second step, we collected and labeled images of hands to create another data set, which was also input into U-Net and Mask R-CNN to segment the hands. The %TBSA of burn wounds was then calculated by comparing the pixels of mask areas on images of the burn wound and hand of the same patient according to the rule of hand, which states that one’s hand accounts for 0.8% of TBSA. Results: A total of 2591 images of burn wounds were collected and labeled to form the burn wound data set. The data set was randomly split into training, validation, and testing sets in a ratio of 8:1:1. Four hundred images of volar hands were collected and labeled to form the hand data set, which was also split into 3 sets using the same method. For the images of burn wounds, Mask R-CNN with ResNet101 had the best segmentation result with a Dice coefficient (DC) of 0.9496, while U-Net with ResNet101 had a DC of 0.8545. For the hand images, U-Net and Mask R-CNN had similar performance with DC values of 0.9920 and 0.9910, respectively. Lastly, we conducted a test diagnosis in a burn patient. Mask R-CNN with ResNet101 had on average less deviation (0.115% TBSA) from the ground truth than burn surgeons. Conclusions: This is one of the first studies to diagnose all depths of burn wounds and convert the segmentation results into %TBSA using different deep learning models. We aimed to assist medical staff in estimating burn size more accurately, thereby helping to provide precise care to burn victims. 
%M 34860674 %R 10.2196/22798 %U https://medinform.jmir.org/2021/12/e22798 %U https://doi.org/10.2196/22798 %U http://www.ncbi.nlm.nih.gov/pubmed/34860674 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 12 %P e30053 %T A Conversational Artificial Intelligence Agent for a Mental Health Care App: Evaluation Study of Its Participatory Design %A Danieli,Morena %A Ciulli,Tommaso %A Mousavi,Seyed Mahed %A Riccardi,Giuseppe %+ Speech and Interactive Signal Lab, Department of Engineering and Computer Science, Università degli Studi di Trento, Via Sommarive 5, Povo di Trento, Trento, 38123, Italy, 39 0461282087, morena.danieli@unitn.it %K mental health care %K conversational AI %K mHealth %K personal health care agents %K participatory design %K psychotherapy %D 2021 %7 1.12.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: Mobile apps for mental health are available on the market. Although they seem to be promising for improving the accessibility of mental health care, little is known about their acceptance, design methodology, evaluation, and integration into psychotherapy protocols. This makes it difficult for health care professionals to judge whether these apps may help them and their patients. Objective: Our aim is to describe and evaluate a protocol for the participatory design of mobile apps for mental health. In this study, participants and psychotherapists are engaged in the early phases of the design and development of the app empowered by conversational artificial intelligence (AI). The app supports interventions for stress management training based on cognitive behavioral theory. Methods: A total of 21 participants aged 33-61 years with mild to moderate levels of stress, anxiety, and depression (assessed by administering the Italian versions of the Symptom Checklist-90-Revised, Occupational Stress Indicator, and Perceived Stress Scale) were assigned randomly to 2 groups, A and B. 
Both groups received stress management training sessions along with cognitive behavioral treatment, but only participants assigned to group A received support through a mobile personal health care agent, designed for mental care and empowered by AI techniques. Psychopathological outcomes were assessed at baseline (T1), after 8 weeks of treatment (T2), and 3 months after treatment (T3). Focus groups with psychotherapists who administered the therapy were held after treatment to collect their impressions and suggestions. Results: Although the intergroup statistical analysis showed that group B participants could rely on better coping strategies, group A participants reported significant improvements in obsessivity and compulsivity and positive distress symptom assessment. The psychotherapists’ acceptance of the protocol was good. In particular, they were in favor of integrating an AI-based mental health app into their practice because they could appreciate the increased engagement of patients in pursuing their therapy goals. Conclusions: The integration into practice of an AI-based mobile app for mental health was shown to be acceptable to both mental health professionals and users. Although it was not possible in this experiment to show that the integration of AI-based conversational technologies into traditional remote psychotherapy significantly decreased the participants’ levels of stress and anxiety, the experimental results showed significant trends of reduction of symptoms in group A and their persistence over time. The mental health professionals involved in the experiment reported interest in, and acceptance of, the proposed technology as a promising tool to be included in a blended model of psychotherapy. 
%M 34855607 %R 10.2196/30053 %U https://formative.jmir.org/2021/12/e30053 %U https://doi.org/10.2196/30053 %U http://www.ncbi.nlm.nih.gov/pubmed/34855607 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 11 %P e32507 %T Assessing the Performance of a New Artificial Intelligence–Driven Diagnostic Support Tool Using Medical Board Exam Simulations: Clinical Vignette Study %A Ben-Shabat,Niv %A Sloma,Ariel %A Weizman,Tomer %A Kiderman,David %A Amital,Howard %+ Department of Medicine ‘B’, Sheba Medical Center, Sheba Road 2, Ramat Gan, 52621, Israel, 972 3 530 2652, nivben7@gmail.com %K diagnostic decision support systems %K diagnostic support %K medical decision-making %K medical informatics %K artificial intelligence %K Kahun %K decision support %D 2021 %7 30.11.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Diagnostic decision support systems (DDSS) are computer programs aimed to improve health care by supporting clinicians in the process of diagnostic decision-making. Previous studies on DDSS demonstrated their ability to enhance clinicians’ diagnostic skills, prevent diagnostic errors, and reduce hospitalization costs. Despite the potential benefits, their utilization in clinical practice is limited, emphasizing the need for new and improved products. Objective: The aim of this study was to conduct a preliminary analysis of the diagnostic performance of “Kahun,” a new artificial intelligence-driven diagnostic tool. Methods: Diagnostic performance was evaluated based on the program’s ability to “solve” clinical cases from the United States Medical Licensing Examination Step 2 Clinical Skills board exam simulations that were drawn from the case banks of 3 leading preparation companies. Each case included 3 expected differential diagnoses. The cases were entered into the Kahun platform by 3 blinded junior physicians. 
For each case, the presence and the rank of the correct diagnoses within the generated differential diagnoses list were recorded. Diagnostic performance was measured in 2 ways: first, as diagnostic sensitivity and, second, as case-specific success rates that represent diagnostic comprehensiveness. Results: The study included 91 clinical cases with 78 different chief complaints and a mean number of 38 (SD 8) findings for each case. The total number of expected diagnoses was 272, of which 174 were different (some appeared more than once). Of the 272 expected diagnoses, 231 (87.5%; 95% CI 76-99) diagnoses were suggested within the top 20 listed diagnoses, 209 (76.8%; 95% CI 66-87) were suggested within the top 10, and 168 (61.8%; 95% CI 52-71) within the top 5. The median rank of correct diagnoses was 3 (IQR 2-6). Of the 91 cases, all 3 expected diagnoses were suggested within the top 20 listed diagnoses in 62 (68%; 95% CI 59-78), within the top 10 in 44 (48%; 95% CI 38-59), and within the top 5 in 24 (26%; 95% CI 17-35). Of the 91 cases, at least 2 of the 3 expected diagnoses were suggested within the top 20 listed diagnoses in 87 (96%; 95% CI 91-100), within the top 10 in 78 (86%; 95% CI 79-93), and within the top 5 in 61 (67%; 95% CI 57-77). Conclusions: The diagnostic support tool evaluated in this study demonstrated good diagnostic accuracy and comprehensiveness; it also had the ability to manage a wide range of clinical findings. 
%M 34672262 %R 10.2196/32507 %U https://medinform.jmir.org/2021/11/e32507 %U https://doi.org/10.2196/32507 %U http://www.ncbi.nlm.nih.gov/pubmed/34672262 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e26522 %T Application Scenarios for Artificial Intelligence in Nursing Care: Rapid Review %A Seibert,Kathrin %A Domhoff,Dominik %A Bruch,Dominik %A Schulte-Althoff,Matthias %A Fürstenau,Daniel %A Biessmann,Felix %A Wolf-Ostermann,Karin %+ Institute of Public Health and Nursing Research, High Profile Area Health Sciences, University of Bremen, Grazer Str. 4, Bremen, 28359, Germany, 49 42121868903, kseibert@uni-bremen.de %K nursing care %K artificial intelligence %K machine learning %K expert system %K hybrid system %D 2021 %7 29.11.2021 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) holds the promise of supporting nurses’ clinical decision-making in complex care situations or conducting tasks that are remote from direct patient interaction, such as documentation processes. There has been an increase in the research and development of AI applications for nursing care, but there is a persistent lack of an extensive overview covering the evidence base for promising application scenarios. Objective: This study synthesizes literature on application scenarios for AI in nursing care settings as well as highlights adjacent aspects in the ethical, legal, and social discourse surrounding the application of AI in nursing care. Methods: Following a rapid review design, PubMed, CINAHL, Association for Computing Machinery Digital Library, Institute of Electrical and Electronics Engineers Xplore, Digital Bibliography & Library Project, and Association for Information Systems Library, as well as the libraries of leading AI conferences, were searched in June 2020. 
Publications of original quantitative and qualitative research, systematic reviews, discussion papers, and essays on the ethical, legal, and social implications published in English were included. Eligible studies were analyzed on the basis of predetermined selection criteria. Results: The titles and abstracts of 7016 publications and 704 full texts were screened, and 292 publications were included. Hospitals were the most prominent study setting, followed by independent living at home; fewer application scenarios were identified for nursing homes or home care. Most studies used machine learning algorithms, whereas expert or hybrid systems appeared in fewer than 1 in 10 publications. The main purposes of AI applications were image and signal processing for the tracking, monitoring, or classification of activity and health, followed by care coordination and communication as well as fall detection. Few studies reported the effects of AI applications on clinical or organizational outcomes, and data gathered outside laboratory conditions were particularly lacking. In addition to technological requirements, reported requirements also captured more overarching topics, such as data privacy, safety, and technology acceptance. Ethical, legal, and social implications reflect the discourse on technology use in health care but have mostly not been discussed in meaningful and potentially encompassing detail. Conclusions: The results highlight the potential for the application of AI systems in different nursing care settings. Considering the lack of findings on the effectiveness and application of AI systems in real-world scenarios, future research should reflect on a more nursing care–specific perspective toward objectives, outcomes, and benefits. 
We identify that, crucially, an advancement in technological-societal discourse that surrounds the ethical and legal implications of AI applications in nursing care is a necessary next step. Further, we outline the need for greater participation among all of the stakeholders involved. %M 34847057 %R 10.2196/26522 %U https://www.jmir.org/2021/11/e26522 %U https://doi.org/10.2196/26522 %U http://www.ncbi.nlm.nih.gov/pubmed/34847057 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 7 %N 4 %P e27850 %T Chatbot for Health Care and Oncology Applications Using Artificial Intelligence and Machine Learning: Systematic Review %A Xu,Lu %A Sanders,Leslie %A Li,Kay %A Chow,James C L %+ Department of Medical Physics, Radiation Medicine Program, Princess Margaret Cancer Centre, University Health Network, 7/F, 700 University Avenue, Toronto, ON, M5G 1X6, Canada, 1 9464501 ext 5089, james.chow@rmp.uhn.ca %K chatbot %K artificial intelligence %K machine learning %K health %K medicine %K communication %K diagnosis %K cancer therapy %K ethics %K medical biophysics %K mobile phone %D 2021 %7 29.11.2021 %9 Review %J JMIR Cancer %G English %X Background: Chatbot is a timely topic applied in various fields, including medicine and health care, for human-like knowledge transfer and communication. Machine learning, a subset of artificial intelligence, has been proven particularly applicable in health care, with the ability for complex dialog management and conversational flexibility. Objective: This review article aims to report on the recent advances and current trends in chatbot technology in medicine. A brief historical overview, along with the developmental progress and design characteristics, is first introduced. The focus will be on cancer therapy, with in-depth discussions and examples of diagnosis, treatment, monitoring, patient support, workflow efficiency, and health promotion. 
In addition, this paper will explore the limitations and areas of concern, highlighting ethical, moral, security, technical, and regulatory standards and evaluation issues to explain the hesitancy in implementation. Methods: A search of the literature published in the past 20 years was conducted using the IEEE Xplore, PubMed, Web of Science, Scopus, and OVID databases. The screening of chatbots was guided by the open-access Botlist directory for health care components and further divided according to the following criteria: diagnosis, treatment, monitoring, support, workflow, and health promotion. Results: Even after addressing these issues and establishing the safety or efficacy of chatbots, human elements in health care will not be replaceable. Therefore, chatbots have the potential to be integrated into clinical practice by working alongside health practitioners to reduce costs, refine workflow efficiencies, and improve patient outcomes. Other applications in pandemic support, global health, and education are yet to be fully explored. Conclusions: Further research and interdisciplinary collaboration could advance this technology to dramatically improve the quality of care for patients, rebalance the workload for clinicians, and revolutionize the practice of medicine. 
%M 34847056 %R 10.2196/27850 %U https://cancer.jmir.org/2021/4/e27850 %U https://doi.org/10.2196/27850 %U http://www.ncbi.nlm.nih.gov/pubmed/34847056 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e25466 %T Electrophysiological Brain Changes Associated With Cognitive Improvement in a Pediatric Attention Deficit Hyperactivity Disorder Digital Artificial Intelligence-Driven Intervention: Randomized Controlled Trial %A Medina,Rafael %A Bouhaben,Jaime %A de Ramón,Ignacio %A Cuesta,Pablo %A Antón-Toro,Luis %A Pacios,Javier %A Quintero,Javier %A Ramos-Quiroga,Josep Antoni %A Maestú,Fernando %+ Sincrolab Ltd, Prensa 7, Madrid, 28033, Spain, 34 630 364 425, nacho@sincrolab.es %K ADHD %K cognitive stimulation %K magnetoencephalography %K artificial intelligence %K Conners continuous performance test %K KAD_SCL_01 %K AI %K cognitive impairment %K attention deficit hyperactivity disorder %K pediatrics %K children %K rehabilitation %D 2021 %7 26.11.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Cognitive stimulation therapy appears to show promising results in the rehabilitation of impaired cognitive processes in attention deficit hyperactivity disorder. Objective: Encouraged by this evidence and the ever-increasing use of technology and artificial intelligence for therapeutic purposes, we examined whether cognitive stimulation therapy implemented on a mobile device and controlled by an artificial intelligence engine can be effective in the neurocognitive rehabilitation of these patients. Methods: In this randomized study, 29 child participants (25 males) underwent training with a smart, digital, cognitive stimulation program (KAD_SCL_01) or with 3 commercial video games for 12 weeks, 3 days a week, 15 minutes a day. Participants completed a neuropsychological assessment and a preintervention and postintervention magnetoencephalography study in a resting state with their eyes closed. 
In addition, information on clinical symptoms was collected from the child's legal guardians. Results: In line with our main hypothesis, we found evidence that smart, digital, cognitive treatment results in improvements in inhibitory control performance. Improvements were also found in visuospatial working memory performance and in the cognitive flexibility, working memory, and behavior and general executive functioning behavioral clinical indexes in this group of participants. Finally, the improvements found in inhibitory control were related to increases in alpha-band power in the posterior regions in all participants, including 2 default mode network regions of interest: the bilateral precuneus and the bilateral posterior cingulate cortex. However, only the participants who underwent the cognitive stimulation intervention (KAD_SCL_01) showed a significant increase in this relationship. Conclusions: The results seem to indicate that smart, digital treatment can be effective in inhibitory control and visuospatial working memory rehabilitation in patients with attention deficit hyperactivity disorder. Furthermore, the relation of inhibitory control with alpha-band power changes could mean that these changes are a product of plasticity mechanisms or changes in the neuromodulatory dynamics. 
Trial Registration: ISRCTN Registry ISRCTN71041318; https://www.isrctn.com/ISRCTN71041318 %M 34842533 %R 10.2196/25466 %U https://www.jmir.org/2021/11/e25466 %U https://doi.org/10.2196/25466 %U http://www.ncbi.nlm.nih.gov/pubmed/34842533 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 8 %N 4 %P e26964 %T Clinician Preimplementation Perspectives of a Decision-Support Tool for the Prediction of Cardiac Arrhythmia Based on Machine Learning: Near-Live Feasibility and Qualitative Study %A Matthiesen,Stina %A Diederichsen,Søren Zöga %A Hansen,Mikkel Klitzing Hartmann %A Villumsen,Christina %A Lassen,Mats Christian Højbjerg %A Jacobsen,Peter Karl %A Risum,Niels %A Winkel,Bo Gregers %A Philbert,Berit T %A Svendsen,Jesper Hastrup %A Andersen,Tariq Osman %+ Department of Computer Science, Faculty of Science, University of Copenhagen, Universitetsparken 5, Copenhagen, 2100, Denmark, 45 21231008, matthiesen@di.ku.dk %K cardiac arrhythmia %K short-term prediction %K clinical decision support systems %K machine learning %K artificial intelligence %K preimplementation %K qualitative study %K implantable cardioverter defibrillator %K remote follow-up %K sociotechnical %D 2021 %7 26.11.2021 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Artificial intelligence (AI), such as machine learning (ML), shows great promise for improving clinical decision-making in cardiac diseases by outperforming statistical-based models. However, few AI-based tools have been implemented in cardiology clinics because of the sociotechnical challenges during transitioning from algorithm development to real-world implementation. Objective: This study explored how an ML-based tool for predicting ventricular tachycardia and ventricular fibrillation (VT/VF) could support clinical decision-making in the remote monitoring of patients with an implantable cardioverter defibrillator (ICD). 
Methods: Seven experienced electrophysiologists participated in a near-live feasibility and qualitative study, which included walkthroughs of 5 blinded retrospective patient cases, use of the prediction tool, and questionnaires and interview questions. All sessions were video recorded, and sessions evaluating the prediction tool were transcribed verbatim. Data were analyzed through an inductive qualitative approach based on grounded theory. Results: The prediction tool was found to have potential for supporting decision-making in ICD remote monitoring by providing reassurance, increasing confidence, acting as a second opinion, reducing information search time, and enabling delegation of decisions to nurses and technicians. However, the prediction tool did not lead to changes in clinical action and was found less useful in cases where the quality of data was poor or when VT/VF predictions were found to be irrelevant for evaluating the patient. Conclusions: When transitioning from AI development to testing its feasibility for clinical implementation, we need to consider the following: expectations must be aligned with the intended use of AI; trust in the prediction tool is likely to emerge from real-world use; and AI accuracy is relational and dependent on available information and local workflows. Addressing the sociotechnical gap between the development and implementation of clinical decision-support tools based on ML in cardiac care is essential for succeeding with adoption. It is suggested to include clinical end-users, clinical contexts, and workflows throughout the overall iterative approach to design, development, and implementation. 
%M 34842528 %R 10.2196/26964 %U https://humanfactors.jmir.org/2021/4/e26964 %U https://doi.org/10.2196/26964 %U http://www.ncbi.nlm.nih.gov/pubmed/34842528 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e25856 %T Patients’ Perceptions Toward Human–Artificial Intelligence Interaction in Health Care: Experimental Study %A Esmaeilzadeh,Pouyan %A Mirzaei,Tala %A Dharanikota,Spurthy %+ Department of Information Systems and Business Analytics, College of Business, Florida International University, Modesto A Maidique Campus, 11200 SW 8th Street, Miami, FL, 33199, United States, 1 305 348 330, pesmaeil@fiu.edu %K AI clinical applications %K collective intelligence %K in-person examinations %K perceived benefits %K perceived risks %D 2021 %7 25.11.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: It is believed that artificial intelligence (AI) will be an integral part of health care services in the near future and will be incorporated into several aspects of clinical care such as prognosis, diagnostics, and care planning. Thus, many technology companies have invested in producing AI clinical applications. Patients are one of the most important beneficiaries who potentially interact with these technologies and applications; thus, patients’ perceptions may affect the widespread use of clinical AI. Patients should be ensured that AI clinical applications will not harm them, and that they will instead benefit from using AI technology for health care purposes. Although human-AI interaction can enhance health care outcomes, possible dimensions of concerns and risks should be addressed before its integration with routine clinical care. Objective: The main objective of this study was to examine how potential users (patients) perceive the benefits, risks, and use of AI clinical applications for their health care purposes and how their perceptions may be different if faced with three health care service encounter scenarios. 
Methods: We designed a 2×3 experiment that crossed a type of health condition (ie, acute or chronic) with three different types of clinical encounters between patients and physicians (ie, AI clinical applications as substituting technology, AI clinical applications as augmenting technology, and no AI as a traditional in-person visit). We used an online survey to collect data from 634 individuals in the United States. Results: The interactions between the types of health care service encounters and health conditions significantly influenced individuals’ perceptions of privacy concerns, trust issues, communication barriers, concerns about transparency in regulatory standards, liability risks, benefits, and intention to use across the six scenarios. We found no significant differences among scenarios regarding perceptions of performance risk and social biases. Conclusions: The results imply that incompatibility with instrumental, technical, ethical, or regulatory values can be a reason for rejecting AI applications in health care. Thus, there are still various risks associated with implementing AI applications in diagnostics and treatment recommendations for patients with both acute and chronic illnesses. The concerns are also evident if the AI applications are used as a recommendation system under physician experience, wisdom, and control. Prior to the widespread rollout of AI, more studies are needed to identify the challenges that may raise concerns for implementing and using AI applications. This study could provide researchers and managers with critical insights into the determinants of individuals’ intention to use AI clinical applications. Regulatory agencies should establish normative standards and evaluation guidelines for implementing AI in health care in cooperation with health care institutions. 
Regular audits and ongoing monitoring and reporting systems can be used to continuously evaluate the safety, quality, transparency, and ethical factors of AI clinical applications. %M 34842535 %R 10.2196/25856 %U https://www.jmir.org/2021/11/e25856 %U https://doi.org/10.2196/25856 %U http://www.ncbi.nlm.nih.gov/pubmed/34842535 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 11 %P e31366 %T Contemporary English Pain Descriptors as Detected on Social Media Using Artificial Intelligence and Emotion Analytics Algorithms: Cross-sectional Study %A Tan,Ming Yi %A Goh,Charlene Enhui %A Tan,Hee Hon %+ Faculty of Dentistry, National University of Singapore, 9 Lower Kent Ridge Road, Singapore, 119085, Singapore, 65 67725340, dentmy@nus.edu.sg %K pain descriptors %K social media %K artificial intelligence %K emotion analytics %K McGill Pain Questionnaire %D 2021 %7 25.11.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: Pain description is fundamental to health care. The McGill Pain Questionnaire (MPQ) has been validated as a tool for the multidimensional measurement of pain; however, its use relies heavily on language proficiency. Although the MPQ has remained unchanged since its inception, the English language has evolved significantly since then. The advent of the internet and social media has allowed for the generation of a staggering amount of publicly available data, allowing linguistic analysis at a scale never seen before. Objective: The aim of this study is to use social media data to examine the relevance of pain descriptors from the existing MPQ, identify novel contemporary English descriptors for pain among users of social media, and suggest a modification for a new MPQ for future validation and testing. Methods: All posts from social media platforms from January 1, 2019, to December 31, 2019, were extracted. 
Artificial intelligence and emotion analytics algorithms (Crystalace and CrystalFeel) were used to measure the emotional properties of the text, including sarcasm, anger, fear, sadness, joy, and valence. Word2Vec was used to identify new pain descriptors associated with the original descriptors from the MPQ. Analysis of count and pain intensity formed the basis for proposing new pain descriptors and determining the order of pain descriptors within each subclass. Results: A total of 118 new associated words were found via Word2Vec. Of these 118 words, 49 (41.5%) words had a count of at least 110, which corresponded to the count of the bottom 10% (8/78) of the original MPQ pain descriptors. The count and intensity of pain descriptors were used to formulate the inclusion criteria for a new pain questionnaire. For the suggested new pain questionnaire, 11 existing pain descriptors were removed, 13 new descriptors were added to existing subclasses, and a new Psychological subclass comprising 9 descriptors was added. Conclusions: This study presents a novel methodology using social media data to identify new pain descriptors and can be repeated at regular intervals to ensure the relevance of pain questionnaires. The original MPQ contains several potentially outdated pain descriptors and is inadequate for reporting the psychological aspects of pain. Further research is needed to examine the reliability and validity of the revised MPQ. 
%M 34842554 %R 10.2196/31366 %U https://formative.jmir.org/2021/11/e31366 %U https://doi.org/10.2196/31366 %U http://www.ncbi.nlm.nih.gov/pubmed/34842554 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e22934 %T Artificial Intelligence for Skin Cancer Detection: Scoping Review %A Takiddin,Abdulrahman %A Schneider,Jens %A Yang,Yin %A Abd-Alrazaq,Alaa %A Househ,Mowafa %+ Department of Electrical and Computer Engineering, Texas A&M University, 188 Bizzell St, College Station, TX, 77843, United States, 974 44230425, abdulrahman.takiddin@tamu.edu %K artificial intelligence %K skin cancer %K skin lesion %K machine learning %K deep neural networks %D 2021 %7 24.11.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Skin cancer is the most common cancer type affecting humans. Traditional skin cancer diagnosis methods are costly, require a professional physician, and take time. Hence, to aid in diagnosing skin cancer, artificial intelligence (AI) tools are being used, including shallow and deep machine learning–based methodologies that are trained to detect and classify skin cancer using computer algorithms and deep neural networks. Objective: The aim of this study was to identify and group the different types of AI-based technologies used to detect and classify skin cancer. The study also examined the reliability of the selected papers by studying the correlation between the data set size and the number of diagnostic classes with the performance metrics used to evaluate the models. Methods: We conducted a systematic search for papers using Institute of Electrical and Electronics Engineers (IEEE) Xplore, Association for Computing Machinery Digital Library (ACM DL), and Ovid MEDLINE databases following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) guidelines. 
The studies included in this scoping review had to fulfill several selection criteria: being specifically about skin cancer, detecting or classifying skin cancer, and using AI technologies. Study selection and data extraction were independently conducted by two reviewers. Extracted data were narratively synthesized, where studies were grouped based on the diagnostic AI techniques and their evaluation metrics. Results: We retrieved 906 papers from the 3 databases, of which 53 were eligible for this review. Shallow AI-based techniques were used in 14 studies, and deep AI-based techniques were used in 39 studies. The studies used up to 11 evaluation metrics to assess the proposed models, where 39 studies used accuracy as the primary evaluation metric. Overall, studies that used smaller data sets reported higher accuracy. Conclusions: This paper examined multiple AI-based skin cancer detection models. However, a direct comparison between methods was hindered by the varied use of different evaluation metrics and image types. Performance scores were affected by factors such as data set size, number of diagnostic classes, and techniques. Hence, the reliability of shallow and deep models with higher accuracy scores was questionable since they were trained and tested on relatively small data sets of a few diagnostic classes. 
%M 34821566 %R 10.2196/22934 %U https://www.jmir.org/2021/11/e22934 %U https://doi.org/10.2196/22934 %U http://www.ncbi.nlm.nih.gov/pubmed/34821566 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 8 %N 11 %P e29838 %T Machine Learning Methods for Predicting Postpartum Depression: Scoping Review %A Saqib,Kiran %A Khan,Amber Fozia %A Butt,Zahid Ahmad %+ School of Public Health Sciences, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada, 1 5198884567 ext 45107, zahid.butt@uwaterloo.ca %K machine learning %K postpartum depression %K big data %K mobile phone %D 2021 %7 24.11.2021 %9 Review %J JMIR Ment Health %G English %X Background: Machine learning (ML) offers vigorous statistical and probabilistic techniques that can successfully predict certain clinical conditions using large volumes of data. A review of ML and big data research analytics in maternal depression is pertinent and timely, given the rapid technological developments in recent years. Objective: This study aims to synthesize the literature on ML and big data analytics for maternal mental health, particularly the prediction of postpartum depression (PPD). Methods: We used a scoping review methodology using the Arksey and O’Malley framework to rapidly map research activity in ML for predicting PPD. Two independent researchers searched PsycINFO, PubMed, IEEE Xplore, and the ACM Digital Library in September 2020 to identify relevant publications in the past 12 years. Data were extracted from the articles’ ML model, data type, and study results. Results: A total of 14 studies were identified. All studies reported the use of supervised learning techniques to predict PPD. Support vector machine and random forest were the most commonly used algorithms in addition to Naive Bayes, regression, artificial neural network, decision trees, and XGBoost (Extreme Gradient Boosting). There was considerable heterogeneity in the best-performing ML algorithm across the selected studies. 
The area under the receiver operating characteristic curve values reported for different algorithms were support vector machine (range 0.78-0.86), random forest method (0.88), XGBoost (0.80), and logistic regression (0.93). Conclusions: ML algorithms can analyze larger data sets and perform more advanced computations, which can significantly improve the detection of PPD at an early stage. Further clinical research collaborations are required to fine-tune ML algorithms for prediction and treatment. ML might become part of evidence-based practice in addition to clinical knowledge and existing research evidence. %M 34822337 %R 10.2196/29838 %U https://mental.jmir.org/2021/11/e29838 %U https://doi.org/10.2196/29838 %U http://www.ncbi.nlm.nih.gov/pubmed/34822337 %0 Journal Article %@ 2563-6316 %I JMIR Publications %V 2 %N 4 %P e26993 %T Machine Learning and Medication Adherence: Scoping Review %A Bohlmann,Aaron %A Mostafa,Javed %A Kumar,Manish %+ Carolina Population Center, University of North Carolina at Chapel Hill, 216 Lenoir Drive CB #3360, Chapel Hill, NC, 27599-3360, United States, 1 (919) 962 8366, aaronjbohlmann@gmail.com %K machine learning %K medication adherence %K adherence monitoring %K adherence prediction %K medication compliance %K health technology %D 2021 %7 24.11.2021 %9 Review %J JMIRx Med %G English %X Background: This is the first scoping review to focus broadly on the topics of machine learning and medication adherence. Objective: This review aims to categorize, summarize, and analyze literature focused on using machine learning for actions related to medication adherence. Methods: PubMed, Scopus, ACM Digital Library, IEEE, and Web of Science were searched to find works that meet the inclusion criteria. After full-text review, 43 works were included in the final analysis. Information of interest was systematically charted before inclusion in the final draft. 
Studies were placed into natural categories for additional analysis dependent upon the combination of actions related to medication adherence. The protocol for this scoping review was created using the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. Results: Publications focused on predicting medication adherence have uncovered 20 strong predictors that were significant in two or more studies. A total of 13 studies that predicted medication adherence used either self-reported questionnaires or pharmacy claims data to determine medication adherence status. In addition, 13 studies that predicted medication adherence did so using either logistic regression, artificial neural networks, random forest, or support vector machines. Of the 15 studies that predicted medication adherence, 6 reported predictor accuracy, the lowest of which was 77.6%. Of 13 monitoring systems, 12 determined medication administration using medication container sensors or sensors in consumer electronics, like smartwatches or smartphones. A total of 11 monitoring systems used logistic regression, artificial neural networks, support vector machines, or random forest algorithms to determine medication administration. The 4 systems that monitored inhaler administration reported a classification accuracy of 93.75% or higher. The 2 systems that monitored medication status in patients with Parkinson disease reported a classification accuracy of 78% or higher. A total of 3 studies monitored medication administration using only smartwatch sensors and reported a classification accuracy of 78.6% or higher. Two systems that provided context-aware medication reminders helped patients to achieve an adherence level of 92% or higher. Two conversational artificial intelligence reminder systems significantly improved adherence rates when compared against traditional reminder systems. 
Conclusions: Creation of systems that accurately predict medication adherence across multiple data sets may be possible due to predictors remaining strong across multiple studies. Higher quality measures of adherence should be adopted when possible so that prediction algorithms are based on accurate information. Currently, medication adherence can be predicted with a good level of accuracy, potentially allowing for the development of interventions aimed at preventing nonadherence. Monitoring systems that track inhaler use currently classify inhaler-related actions with an excellent level of accuracy, allowing for tracking of adherence and potentially proper inhaler technique. Systems that monitor medication states in patients with Parkinson disease can currently achieve a good level of classification accuracy and have the potential to inform medication therapy changes in the future. Medication administration monitoring systems that only use motion sensors in smartwatches can currently achieve a good level of classification accuracy but only when differentiating between a small number of possible activities. Context-aware reminder systems can help patients achieve high levels of medication adherence but are also intrusive, which may not be acceptable to users. Conversational artificial intelligence reminder systems can significantly improve adherence. 
%M 37725549 %R 10.2196/26993 %U https://med.jmirx.org/2021/4/e26993 %U https://doi.org/10.2196/26993 %U http://www.ncbi.nlm.nih.gov/pubmed/37725549 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e29554 %T A Markerless 2D Video, Facial Feature Recognition–Based, Artificial Intelligence Model to Assist With Screening for Parkinson Disease: Development and Usability Study %A Hou,Xinyao %A Zhang,Yu %A Wang,Yanping %A Wang,Xinyi %A Zhao,Jiahao %A Zhu,Xiaobo %A Su,Jianbo %+ Department of Automation, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai, 200240, China, 86 21 34204276, jbsu@sjtu.edu.cn %K Parkinson disease %K facial features %K artificial intelligence %K diagnosis %D 2021 %7 19.11.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Masked face is a characteristic clinical manifestation of Parkinson disease (PD), but subjective evaluations from different clinicians often show low consistency owing to a lack of accurate detection technology. Hence, it is of great significance to develop methods to make monitoring easier and more accessible. Objective: The study aimed to develop a markerless 2D video, facial feature recognition–based, artificial intelligence (AI) model to assess facial features of PD patients and investigate how AI could help neurologists improve the performance of early PD diagnosis. Methods: We collected 140 videos of facial expressions from 70 PD patients and 70 matched controls from 3 hospitals using a single 2D video camera. We developed and tested an AI model that performs masked face recognition of PD patients based on the acquisition and evaluation of facial features including geometric and texture features. Random forest, support vector machines, and k-nearest neighbor were used to train the model. The diagnostic performance of the AI model was compared with that of 5 neurologists. 
Results: The experimental results showed that our AI models achieved feasible and effective facial feature recognition to assist with PD diagnosis. The accuracy of PD diagnosis reached 83% using geometric features and, with the model trained by random forest, up to 86% using texture features. When these 2 feature types were combined, an F1 value of 88% was reached using the random forest algorithm. Further, the facial features of patients with PD were not associated with the motor and nonmotor symptoms of PD. Conclusions: Patients with PD commonly exhibit masked facial features. A facial feature recognition–based AI model using 2D video can provide a valuable tool to assist with PD diagnosis and has the potential to enable remote monitoring of a patient's condition, especially during the COVID-19 pandemic. %M 34806994 %R 10.2196/29554 %U https://www.jmir.org/2021/11/e29554 %U https://doi.org/10.2196/29554 %U http://www.ncbi.nlm.nih.gov/pubmed/34806994 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 11 %P e30079 %T Prediction Model of Osteonecrosis of the Femoral Head After Femoral Neck Fracture: Machine Learning–Based Development and Validation Study %A Wang,Huan %A Wu,Wei %A Han,Chunxia %A Zheng,Jiaqi %A Cai,Xinyu %A Chang,Shimin %A Shi,Junlong %A Xu,Nan %A Ai,Zisheng %+ Department of Medical Statistics, Tongji University School of Medicine, No. 1239 Singping Road, Shanghai, 200092, China, 86 1 377 438 0743, azs1966@126.com %K femoral neck fracture %K osteonecrosis of the femoral head %K machine learning %K interpretability %D 2021 %7 19.11.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: The absolute number of femoral neck fractures (FNFs) is increasing; however, the prediction of traumatic femoral head necrosis remains difficult. Machine learning algorithms have the potential to be superior to traditional prediction methods for the prediction of traumatic femoral head necrosis. 
Objective: The aim of this study is to use machine learning to construct a model for the analysis of risk factors and prediction of osteonecrosis of the femoral head (ONFH) in patients with FNF after internal fixation. Methods: We retrospectively collected preoperative, intraoperative, and postoperative clinical data of patients with FNF in 4 hospitals in Shanghai and followed up the patients for more than 2.5 years. A total of 259 patients with 43 variables were included in the study. The data were randomly divided into a training set (181/259, 69.8%) and a validation set (78/259, 30.1%). External data (n=376) were obtained from a retrospective cohort study of patients with FNF in 3 other hospitals. Least absolute shrinkage and selection operator regression and the support vector machine algorithm were used for variable selection. Logistic regression, random forest, support vector machine, and eXtreme Gradient Boosting (XGBoost) were used to develop the model on the training set. The validation set was used to tune the model hyperparameters to determine the final prediction model, and the external data were used to compare and evaluate the model performance. We compared the accuracy, discrimination, and calibration of the models to identify the best machine learning algorithm for predicting ONFH. Shapley additive explanations and local interpretable model-agnostic explanations were used to determine the interpretability of the black box model. Results: A total of 11 variables were selected for the models. The XGBoost model performed best on the validation set and external data. The accuracy, sensitivity, and area under the receiver operating characteristic curve of the model on the validation set were 0.987, 0.929, and 0.992, respectively. The accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve of the model on the external data were 0.907, 0.807, 0.935, and 0.933, respectively, and the log-loss was 0.279. 
The calibration curve demonstrated good agreement between the predicted probability and actual risk. The interpretability of the features and individual predictions were realized using the Shapley additive explanations and local interpretable model-agnostic explanations algorithms. In addition, the XGBoost model was translated into a self-made web-based risk calculator to estimate an individual’s probability of ONFH. Conclusions: Machine learning performs well in predicting ONFH after internal fixation of FNF. The 6-variable XGBoost model predicted the risk of ONFH well and had good generalization ability on the external data, which can be used for the clinical prediction of ONFH after internal fixation of FNF. %M 34806984 %R 10.2196/30079 %U https://medinform.jmir.org/2021/11/e30079 %U https://doi.org/10.2196/30079 %U http://www.ncbi.nlm.nih.gov/pubmed/34806984 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e29447 %T Patient Interactions With an Automated Conversational Agent Delivering Pretest Genetics Education: Descriptive Study %A Chavez-Yenter,Daniel %A Kimball,Kadyn E %A Kohlmann,Wendy %A Lorenz Chambers,Rachelle %A Bradshaw,Richard L %A Espinel,Whitney F %A Flynn,Michael %A Gammon,Amanda %A Goldberg,Eric %A Hagerty,Kelsi J %A Hess,Rachel %A Kessler,Cecilia %A Monahan,Rachel %A Temares,Danielle %A Tobik,Katie %A Mann,Devin M %A Kawamoto,Kensaku %A Del Fiol,Guilherme %A Buys,Saundra S %A Ginsburg,Ophira %A Kaphingst,Kimberly A %+ Department of Communication, University of Utah, 255 S Central Campus Drive, Salt Lake City, UT, 84112, United States, 1 801 213 5724, daniel.chavez-yenter@utah.edu %K cancer %K genetic testing %K virtual conversational agent %K user interaction %K smartphone %K mobile phone %D 2021 %7 18.11.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Cancer genetic testing to assess an individual’s cancer risk and to enable genomics-informed cancer treatment has grown exponentially in the past decade. 
Because of this continued growth and a shortage of health care workers, there is a need for automated strategies that provide high-quality genetics services to patients to reduce the clinical demand for genetics providers. Conversational agents have shown promise in managing mental health, pain, and other chronic conditions and are increasingly being used in cancer genetic services. However, research on how patients interact with these agents to satisfy their information needs is limited. Objective: Our primary aim is to assess user interactions with a conversational agent for pretest genetics education. Methods: We conducted a feasibility study of user interactions with a conversational agent that delivers pretest genetics education to primary care patients without cancer who are eligible for cancer genetic evaluation. The conversational agent provided scripted content similar to that delivered in a pretest genetic counseling visit for cancer genetic testing. Outside of a core set of information delivered to all patients, users were able to navigate within the chat to request additional content in their areas of interest. An artificial intelligence–based preprogrammed library was also established to allow users to ask open-ended questions to the conversational agent. Transcripts of the interactions were recorded. Here, we describe the information selected, time spent to complete the chat, and use of the open-ended question feature. Descriptive statistics were used for quantitative measures, and thematic analyses were used for qualitative responses. Results: We invited 103 patients to participate, of whom 88.3% (91/103) were offered access to the conversational agent, 39% (36/91) started the chat, and 32% (30/91) completed the chat. Most users who completed the chat indicated that they wanted to continue with genetic testing (21/30, 70%), some were unsure (9/30, 30%), and no patient declined to move forward with testing. 
Those who decided to test spent an average of 10 (SD 2.57) minutes on the chat, selected an average of 1.87 (SD 1.2) additional pieces of information, and generally did not ask open-ended questions. Those who were unsure spent 4 more minutes on average (mean 14.1, SD 7.41; P=.03) on the chat, selected an average of 3.67 (SD 2.9) additional pieces of information, and asked at least one open-ended question. Conclusions: The pretest chat provided enough information for most patients to decide on cancer genetic testing, as indicated by the small number of open-ended questions. A subset of participants were still unsure about receiving genetic testing and may require additional education or interpersonal support before making a testing decision. Conversational agents have the potential to become a scalable alternative for pretest genetics education, reducing the clinical demand on genetics providers. %M 34792472 %R 10.2196/29447 %U https://www.jmir.org/2021/11/e29447 %U https://doi.org/10.2196/29447 %U http://www.ncbi.nlm.nih.gov/pubmed/34792472 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 11 %P e30066 %T Deep Learning Techniques for Fatty Liver Using Multi-View Ultrasound Images Scanned by Different Scanners: Development and Validation Study %A Kim,Taewoo %A Lee,Dong Hyun %A Park,Eun-Kee %A Choi,Sanghun %+ School of Mechanical Engineering, Kyungpook National University, 80 Daehak-ro, Buk-gu, Daegu, 41566, Republic of Korea, 82 53 950 5578, s-choi@knu.ac.kr %K fatty liver %K deep learning %K transfer learning %K classification %K regression %K magnetic resonance imaging–proton density fat fraction %K multi-view ultrasound images %K artificial intelligence %K machine imaging %K imaging %K informatics %K fatty liver disease %K detection %K diagnosis %D 2021 %7 18.11.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Fat fraction values obtained from magnetic resonance imaging (MRI) can be used to obtain an accurate diagnosis of fatty liver 
diseases. However, MRI is expensive and cannot be performed for everyone. Objective: In this study, we aim to develop multi-view ultrasound image–based convolutional deep learning models to detect fatty liver disease and yield fat fraction values. Methods: We extracted 90 ultrasound images of the right intercostal view and 90 ultrasound images of the right intercostal view containing the right renal cortex from 39 cases of fatty liver (MRI–proton density fat fraction [MRI–PDFF] ≥ 5%) and 51 normal subjects (MRI–PDFF < 5%), with MRI–PDFF values obtained from Good Gang-An Hospital. We obtained combined liver and kidney-liver (CLKL) images to train the deep learning models and developed classification and regression models based on the VGG19 model to classify fatty liver disease and yield fat fraction values. We employed data augmentation techniques such as flipping and rotation to prevent the deep learning models from overfitting. We evaluated the deep learning models with performance metrics such as accuracy, sensitivity, specificity, and coefficient of determination (R2). Results: Demographic characteristics such as age and sex were similar between the two groups (fatty liver disease and normal subjects). In classification, the model trained on CLKL images achieved 80.1% accuracy, 86.2% precision, and 80.5% specificity in detecting fatty liver disease. In regression, the predicted fat fraction values of the regression model trained on CLKL images correlated with MRI–PDFF values (R2=0.633), indicating that the fat fraction values were moderately well estimated. Conclusions: With deep learning techniques and multi-view ultrasound images, it is potentially possible to replace MRI–PDFF values with deep learning predictions for detecting fatty liver disease and estimating fat fraction values. 
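The flip-and-rotation augmentation described in this abstract can be sketched in plain Python on a toy image represented as nested lists; this is an illustrative sketch of the general technique, not the authors' pipeline, and the toy data are hypothetical:

```python
def hflip(img):
    """Mirror an image (list of rows) left-to-right."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img):
    """Return the original image plus flipped and rotated variants,
    the kind of geometric augmentation used to enlarge a small
    ultrasound training set and reduce overfitting."""
    return [img, hflip(img), rotate90(img), rotate90(rotate90(img))]

# toy 2x2 "image" standing in for an ultrasound frame
img = [[1, 2],
       [3, 4]]
variants = augment(img)
```

Each variant keeps the original labels (diagnosis and fat fraction), so one labeled scan yields several training examples.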
%M 34792476 %R 10.2196/30066 %U https://medinform.jmir.org/2021/11/e30066 %U https://doi.org/10.2196/30066 %U http://www.ncbi.nlm.nih.gov/pubmed/34792476 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 11 %P e32662 %T Machine Learning–Based Hospital Discharge Prediction for Patients With Cardiovascular Diseases: Development and Usability Study %A Ahn,Imjin %A Gwon,Hansle %A Kang,Heejun %A Kim,Yunha %A Seo,Hyeram %A Choi,Heejung %A Cho,Ha Na %A Kim,Minkyoung %A Jun,Tae Joon %A Kim,Young-Hak %+ Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympic-ro 43-gil, Seoul, 05505, Republic of Korea, 82 2 3010 3955, mdyhkim@amc.seoul.kr %K electronic health records %K cardiovascular diseases %K discharge prediction %K bed management %K explainable artificial intelligence %D 2021 %7 17.11.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Effective resource management in hospitals can improve the quality of medical services by reducing labor-intensive burdens on staff, decreasing inpatient waiting time, and securing the optimal treatment time. The use of hospital processes requires effective bed management; a stay in the hospital that is longer than the optimal treatment time hinders bed management. Therefore, predicting a patient’s hospitalization period may support the making of judicious decisions regarding bed management. Objective: First, this study aims to develop a machine learning (ML)–based predictive model for predicting the discharge probability of inpatients with cardiovascular diseases (CVDs). Second, we aim to assess the outcome of the predictive model and explain the primary risk factors of inpatients for patient-specific care. 
Finally, we aim to evaluate whether our ML-based predictive model helps manage bed scheduling efficiently and detects long-term inpatients in advance to improve the use of hospital processes and enhance the quality of medical services. Methods: We set up the cohort criteria and extracted the data from CardioNet, a manually curated database that specializes in CVDs. We processed the data to create a suitable data set by reindexing the date-index, integrating the present features with past features from the previous 3 years, and imputing missing values. Subsequently, we trained the ML-based predictive models and evaluated them to select the most suitable model. Finally, we predicted the discharge probability within 3 days and explained the outcomes of the model by identifying, quantifying, and visualizing its features. Results: We experimented with 5 ML-based models using 5-fold cross-validation. Extreme gradient boosting, which was selected as the final model, accomplished an average area under the receiver operating characteristic curve score of 0.865, which was higher than that of the other models (ie, logistic regression, random forest, support vector machine, and multilayer perceptron). Furthermore, we performed feature reduction, represented the feature importance, and assessed prediction outcomes. One of the outcomes, the individual explainer, provides a discharge score during hospitalization and a daily feature influence score to the medical team and patients. Finally, we visualized simulated bed management to use the outcomes. Conclusions: In this study, we propose an individual explainer based on an ML-based predictive model, which provides the discharge probability and relative contributions of individual features. Our model can assist medical teams and patients in identifying individual and common risk factors in CVDs and can support hospital administrators in improving the management of hospital beds and other resources. 
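The headline area under the receiver operating characteristic curve is a rank statistic; a minimal stdlib sketch of its Mann-Whitney formulation, using hypothetical discharge labels and model scores (not the study's data):

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney U)
    formulation: the probability that a randomly chosen positive
    case is scored above a randomly chosen negative case
    (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# hypothetical discharge-within-3-days labels and predicted probabilities
y = [1, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.8, 0.5, 0.4]
print(auroc(y, scores))  # 5 of 6 positive-negative pairs ranked correctly
```

An AUROC of 0.865 therefore means that a soon-to-be-discharged patient outranks a longer-stay patient about 86.5% of the time.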
%M 34787584 %R 10.2196/32662 %U https://medinform.jmir.org/2021/11/e32662 %U https://doi.org/10.2196/32662 %U http://www.ncbi.nlm.nih.gov/pubmed/34787584 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e31337 %T Predicting COVID-19–Related Health Care Resource Utilization Across a Statewide Patient Population: Model Development Study %A Kasturi,Suranga N %A Park,Jeremy %A Wild,David %A Khan,Babar %A Haggstrom,David A %A Grannis,Shaun %+ Regenstrief Institute, 1101 W 10th St, Indianapolis, IN, 46202, United States, 1 (317) 274 9000, snkasthu@iu.edu %K COVID-19 %K machine learning %K population health %K health care utilization %K health disparities %K health information %K epidemiology %K public health %K digital health %K health data %K pandemic %K decision models %K health informatics %K healthcare resources %D 2021 %7 15.11.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The COVID-19 pandemic has highlighted the inability of health systems to leverage existing system infrastructure in order to rapidly develop and apply broad analytical tools that could inform state- and national-level policymaking, as well as patient care delivery in hospital settings. The COVID-19 pandemic has also highlighted systemic disparities in health outcomes and access to care based on race or ethnicity, gender, income level, and urban-rural divide. Although the United States seems to be recovering from the COVID-19 pandemic owing to widespread vaccination efforts and increased public awareness, there is an urgent need to address the aforementioned challenges. Objective: This study aims to assess the feasibility of leveraging broad, statewide datasets for population health–driven decision-making by developing robust analytical models that predict COVID-19–related health care resource utilization across patients served by Indiana’s statewide Health Information Exchange. 
Methods: We leveraged comprehensive datasets obtained from the Indiana Network for Patient Care to train decision forest-based models that can predict patient-level need for health care resources. To assess these models for potential biases, we tested model performance against subpopulations stratified by age, race or ethnicity, gender, and residence (urban vs rural). Results: For model development, we identified a cohort of 96,026 patients from across 957 zip codes in Indiana, United States. We trained the decision models that predicted health care resource utilization by using approximately 100 of the most impactful features from a total of 1172 features created. Each model and stratified subpopulation under test reported precision scores >70%, accuracy and area under the receiver operating characteristic curve scores >80%, and sensitivity scores of approximately 90% or higher. We noted statistically significant variations in model performance across stratified subpopulations identified by age, race or ethnicity, gender, and residence (urban vs rural). Conclusions: This study presents the possibility of developing decision models capable of predicting patient-level health care resource utilization across a broad, statewide region with considerable predictive performance. However, our models present statistically significant variations in performance across stratified subpopulations of interest. Further efforts are necessary to identify root causes of these biases and to rectify them. 
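Bias testing of the kind described here, comparing performance across stratified subpopulations, reduces to computing per-group confusion-matrix statistics; a toy stdlib sketch (hypothetical records, not the study's data):

```python
from collections import defaultdict

def subgroup_metrics(records):
    """Compute per-subgroup accuracy and sensitivity to surface
    performance gaps across strata (e.g., urban vs rural).
    Each record is (group, true_label, predicted_label)."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "correct": 0, "n": 0})
    for group, y, yhat in records:
        c = counts[group]
        c["n"] += 1
        c["correct"] += (y == yhat)
        if y == 1:                      # positives drive sensitivity
            c["tp"] += (yhat == 1)
            c["fn"] += (yhat == 0)
    out = {}
    for group, c in counts.items():
        pos = c["tp"] + c["fn"]
        out[group] = {
            "accuracy": c["correct"] / c["n"],
            "sensitivity": c["tp"] / pos if pos else None,
        }
    return out

records = [
    ("urban", 1, 1), ("urban", 0, 0), ("urban", 1, 0),
    ("rural", 1, 1), ("rural", 0, 1), ("rural", 1, 1),
]
print(subgroup_metrics(records))
```

Large gaps between the per-group numbers are the "statistically significant variations" the abstract flags, pending a formal test.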
%M 34581671 %R 10.2196/31337 %U https://www.jmir.org/2021/11/e31337 %U https://doi.org/10.2196/31337 %U http://www.ncbi.nlm.nih.gov/pubmed/34581671 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 11 %P e30313 %T Development of a Severity Score and Comparison With Validated Measures for Depression and Anxiety: Validation Study %A Lynch,William %A Platt,Michael L %A Pardes,Adam %+ NeuroFlow, Inc, 111 S Independence Mall E, Suite 701, Philadelphia, PA, United States, 1 267 671 7316, adam@neuroflow.com %K PHQ-9 %K GAD-7 %K depression assessment %K anxiety assessment %K measurement-based care %K integrated behavioral health %D 2021 %7 10.11.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: Less than 10% of the individuals seeking behavioral health care receive measurement-based care (MBC). Technology has the potential to implement MBC in a secure and efficient manner. To test this idea, a mobile health (mHealth) platform was developed with the goal of making MBC easier to deliver by clinicians and more accessible to patients within integrated behavioral health care. Data from over 3000 users of the mHealth platform were used to develop an output severity score, a robust screening measure for depression and anxiety. Objective: The aim of this study is to compare severity scores with scores from validated assessments for depression and anxiety and scores from clinician review to evaluate the potential added value of this new measure. Methods: The severity score uses patient-reported and passively collected data related to behavioral health on an mHealth platform. An artificial intelligence–derived algorithm was developed that condenses behavioral health data into a single, quantifiable measure for longitudinal tracking of an individual’s depression and anxiety symptoms. 
Linear regression and Bland-Altman analyses were used to evaluate the relationships and differences between severity scores and Patient Health Questionnaire-9 (PHQ-9) or Generalized Anxiety Disorder-7 (GAD-7) scores from over 35,000 mHealth platform users. The severity score was also compared with a review by a panel of expert clinicians for a subset of 250 individuals. Results: Linear regression results showed a strong correlation between the severity score and PHQ-9 (r=0.74; P<.001) and GAD-7 (r=0.80; P<.001) changes. A strong positive correlation was also found between the severity score and expert panel clinical review (r=0.80-0.84; P<.001). However, Bland-Altman analysis and the evaluation of outliers on regression analysis showed that the severity score was significantly different from the PHQ-9. Conclusions: Clinicians can reliably use the mHealth severity score as a proxy measure for screening and monitoring behavioral health symptoms longitudinally. The severity score may identify at-risk individuals who are not identified by the PHQ-9. Further research is warranted to evaluate the sensitivity and specificity of the severity score. 
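Bland-Altman analysis, as used in this study, summarizes agreement between two paired measures by the mean difference (bias) and its 95% limits of agreement; a stdlib sketch with hypothetical paired scores (not the study's data):

```python
from statistics import mean, stdev

def bland_altman(a, b):
    """Bland-Altman agreement between two paired measurements:
    returns the mean difference (bias) and the 95% limits of
    agreement, bias +/- 1.96 * SD of the differences."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

severity = [10, 14, 9, 20, 16]   # hypothetical platform severity scores
phq9     = [ 9, 15, 8, 18, 17]   # paired PHQ-9 scores
bias, (lo, hi) = bland_altman(severity, phq9)
```

Points falling outside the limits of agreement are the outliers that suggest the two instruments are not interchangeable, which is the pattern the abstract reports.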
%M 34757319 %R 10.2196/30313 %U https://formative.jmir.org/2021/11/e30313 %U https://doi.org/10.2196/30313 %U http://www.ncbi.nlm.nih.gov/pubmed/34757319 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e28946 %T Using Artificial Intelligence With Natural Language Processing to Combine Electronic Health Record’s Structured and Free Text Data to Identify Nonvalvular Atrial Fibrillation to Decrease Strokes and Death: Evaluation and Case-Control Study %A Elkin,Peter L %A Mullin,Sarah %A Mardekian,Jack %A Crowner,Christopher %A Sakilay,Sylvester %A Sinha,Shyamashree %A Brady,Gary %A Wright,Marcia %A Nolen,Kimberly %A Trainer,JoAnn %A Koppel,Ross %A Schlegel,Daniel %A Kaushik,Sashank %A Zhao,Jane %A Song,Buer %A Anand,Edwin %+ Department of Biomedical Informatics, University at Buffalo, 77 Goodell St, Suite 5t40, Buffalo, NY, 14203, United States, 1 5073581341, elkinp@buffalo.edu %K afib %K atrial fibrillation %K artificial intelligence %K NVAF %K natural language processing %K stroke risk %K bleed risk %K CHA2DS2-VASc %K HAS-BLED %K bio-surveillance %D 2021 %7 9.11.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Nonvalvular atrial fibrillation (NVAF) affects almost 6 million Americans and is a major contributor to stroke but is significantly undiagnosed and undertreated despite explicit guidelines for oral anticoagulation. Objective: The aim of this study is to investigate whether the use of semisupervised natural language processing (NLP) of electronic health record’s (EHR) free-text information combined with structured EHR data improves NVAF discovery and treatment and perhaps offers a method to prevent thousands of deaths and save billions of dollars. Methods: We abstracted 96,681 participants from the University of Buffalo faculty practice’s EHR. 
NLP was used to index the notes and compare the ability to identify NVAF, congestive heart failure, hypertension, age ≥75 years, diabetes mellitus, stroke or transient ischemic attack, vascular disease, age 65 to 74 years, sex category (CHA2DS2-VASc), and Hypertension, Abnormal liver/renal function, Stroke history, Bleeding history or predisposition, Labile INR, Elderly, Drug/alcohol usage (HAS-BLED) scores using unstructured data (International Classification of Diseases codes) versus structured and unstructured data from clinical notes. In addition, we analyzed data from 63,296,120 participants in the Optum and Truven databases to determine the NVAF frequency, rates of CHA2DS2‑VASc ≥2, and no contraindications to oral anticoagulants, rates of stroke and death in the untreated population, and first year’s costs after stroke. Results: The structured-plus-unstructured method would have identified 3,976,056 additional true NVAF cases (P<.001) and improved sensitivity for CHA2DS2-VASc and HAS-BLED scores compared with the structured data alone (P=.002 and P<.001, respectively), causing a 32.1% improvement. For the United States, this method would prevent an estimated 176,537 strokes, save 10,575 lives, and save >US $13.5 billion. Conclusions: Artificial intelligence–informed bio-surveillance combining NLP of free-text information with structured EHR data improves data completeness, prevents thousands of strokes, and saves lives and funds. This method is applicable to many disorders with profound public health consequences. 
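The CHA2DS2-VASc score referenced above follows a standard clinical definition (1 point each for congestive heart failure, hypertension, diabetes, vascular disease, female sex, and age 65-74; 2 points each for prior stroke/TIA and age 75 or over); a sketch computing it from structured fields, with hypothetical patient values:

```python
def cha2ds2_vasc(chf, htn, age, diabetes, stroke_tia, vascular, female):
    """CHA2DS2-VASc stroke-risk score from structured fields.
    Binary flags are 0/1; age is in years. Maximum score is 9."""
    score = chf + htn + diabetes + vascular + female
    score += 2 if stroke_tia else 0
    if age >= 75:
        score += 2
    elif age >= 65:
        score += 1
    return score

# 72-year-old woman with hypertension and no other risk factors:
# 1 (htn) + 1 (age 65-74) + 1 (female sex) = 3
print(cha2ds2_vasc(0, 1, 72, 0, 0, 0, 1))
```

The study's point is that the flags feeding such a score are often buried in free text, which is why combining NLP-extracted findings with structured codes raises the number of correctly scored patients.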
%M 34751659 %R 10.2196/28946 %U https://www.jmir.org/2021/11/e28946 %U https://doi.org/10.2196/28946 %U http://www.ncbi.nlm.nih.gov/pubmed/34751659 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 11 %P e33335 %T A Virtual Community for Disability Advocacy: Development of a Searchable Artificial Intelligence–Supported Platform %A El Morr,Christo %A Maret,Pierre %A Muhlenbach,Fabrice %A Dharmalingam,Dhayananth %A Tadesse,Rediet %A Creighton,Alexandra %A Kundi,Bushra %A Buettgen,Alexis %A Mgwigwi,Thumeka %A Dinca-Panaitescu,Serban %A Dua,Enakshi %A Gorman,Rachel %+ School of Health Policy and Management, Faculty of Health, York University, Stong College Room 306, 4700 Keele St, Toronto, ON, M3J 1P3, Canada, 1 4167362100, elmorr@yorku.ca %K virtual community %K machine learning %K Semantic Web %K natural language processing %K web intelligence %K health informatics %K Wikibase %K disability rights %K human rights %K CRPD %K equity %K community %K disability %K ethics %K rights %K pilot %K platform %D 2021 %7 5.11.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: The lack of availability of disability data has been identified as a major challenge hindering continuous disability equity monitoring. It is important to develop a platform that enables searching for disability data to expose systemic discrimination and social exclusion, which increase vulnerability to inequitable social conditions. Objective: Our project aims to create an accessible and multilingual pilot disability website that structures and integrates data about people with disabilities and provides data for national and international disability advocacy communities. The platform will be endowed with a document upload function with hybrid (automated and manual) paragraph tagging, while the querying function will involve an intelligent natural language search in the supported languages. 
Methods: We have designed and implemented a virtual community platform using Wikibase, Semantic Web, machine learning, and web programming tools to enable disability communities to upload and search for disability documents. The platform data model is based on an ontology we have designed following the United Nations Convention on the Rights of Persons with Disabilities (CRPD). The virtual community facilitates the uploading and sharing of validated information, and supports disability rights advocacy by enabling dissemination of knowledge. Results: Using health informatics and artificial intelligence techniques (namely Semantic Web, machine learning, and natural language processing techniques), we were able to develop a pilot virtual community that supports disability rights advocacy by facilitating uploading, sharing, and accessing disability data. The system consists of a website on top of a Wikibase (a Semantic Web–based datastore). The virtual community accepts 4 types of users: information producers, information consumers, validators, and administrators. The virtual community enables the uploading of documents, semiautomatic tagging of their paragraphs with meaningful keywords, and validation of the process before uploading the data to the disability Wikibase. Once uploaded, public users (information consumers) can perform a semantic search using an intelligent and multilingual search engine (QAnswer). Further enhancements of the platform are planned. Conclusions: The platform ontology is flexible and can accommodate advocacy reports and disability policy and legislation from specific jurisdictions, which can be accessed in relation to the CRPD articles. The platform ontology can be expanded to fit international contexts. The virtual community supports information upload and search. 
Semiautomatic tagging and intelligent multilingual semantic search in natural language are enabled by artificial intelligence techniques, namely the Semantic Web, machine learning, and natural language processing. %M 34738910 %R 10.2196/33335 %U https://formative.jmir.org/2021/11/e33335 %U https://doi.org/10.2196/33335 %U http://www.ncbi.nlm.nih.gov/pubmed/34738910 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e25745 %T Artificial Intelligence in Rehabilitation Targeting the Participation of Children and Youth With Disabilities: Scoping Review %A Kaelin,Vera C %A Valizadeh,Mina %A Salgado,Zurisadai %A Parde,Natalie %A Khetani,Mary A %+ Occupational Therapy, College of Applied Health Sciences, University of Illinois at Chicago, 1919 West Taylor Street, Room 316A, Chicago, IL, 60612-7250, United States, 1 312 996 0942, mkhetani@uic.edu %K health care %K pediatric rehabilitation %K technology %K young persons %K robotics %K human-machine interaction %K personalization %K customization %K goal-setting %K natural language processing %K machine learning %D 2021 %7 4.11.2021 %9 Review %J J Med Internet Res %G English %X Background: In the last decade, there has been a rapid increase in research on the use of artificial intelligence (AI) to improve child and youth participation in daily life activities, which is a key rehabilitation outcome. However, existing reviews place variable focus on participation, are narrow in scope, and are restricted to select diagnoses, hindering interpretability regarding the existing scope of AI applications that target the participation of children and youth in a pediatric rehabilitation setting. Objective: The aim of this scoping review is to examine how AI is integrated into pediatric rehabilitation interventions targeting the participation of children and youth with disabilities or other diagnosed health conditions in valued activities. 
Methods: We conducted a comprehensive literature search using established Applied Health Sciences and Computer Science databases. Two independent researchers screened and selected the studies based on a systematic procedure. Inclusion criteria were as follows: participation was an explicit study aim or outcome or the targeted focus of the AI application; AI was applied as part of the provided and tested intervention; children or youth with a disability or other diagnosed health conditions were the focus of either the study or AI application or both; and the study was published in English. Data were mapped according to the types of AI, the mode of delivery, the type of personalization, and whether the intervention addressed individual goal-setting. Results: The literature search identified 3029 documents, of which 94 met the inclusion criteria. Most of the included studies used multiple applications of AI with the highest prevalence of robotics (72/94, 77%) and human-machine interaction (51/94, 54%). Regarding mode of delivery, most of the included studies described an intervention delivered in-person (84/94, 89%), and only 11% (10/94) were delivered remotely. Most interventions were tailored to groups of individuals (93/94, 99%). Only 1% (1/94) of interventions was tailored to patients’ individually reported participation needs, and only one intervention (1/94, 1%) described individual goal-setting as part of their therapy process or intervention planning. Conclusions: There is an increasing amount of research on interventions using AI to target the participation of children and youth with disabilities or other diagnosed health conditions, supporting the potential of using AI in pediatric rehabilitation. 
On the basis of our results, 3 major gaps for further research and development were identified: a lack of remotely delivered participation-focused interventions using AI; a lack of individual goal-setting integrated in interventions; and a lack of interventions tailored to individually reported participation needs of children, youth, or families. %M 34734833 %R 10.2196/25745 %U https://www.jmir.org/2021/11/e25745 %U https://doi.org/10.2196/25745 %U http://www.ncbi.nlm.nih.gov/pubmed/34734833 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e29386 %T The Impact of Explanations on Layperson Trust in Artificial Intelligence–Driven Symptom Checker Apps: Experimental Study %A Woodcock,Claire %A Mittelstadt,Brent %A Busbridge,Dan %A Blank,Grant %+ Oxford Internet Institute, University of Oxford, 1 St Giles, Oxford, OX1 3JS, United Kingdom, 44 1865 287210, cwoodcock.academic@gmail.com %K symptom checker %K chatbot %K artificial intelligence %K explanations %K trust %K knowledge %K clinical communication %K mHealth %K digital health %K eHealth %K conversational agent %K virtual health care %K symptoms %K diagnostics %K mobile phone %D 2021 %7 3.11.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI)–driven symptom checkers are available to millions of users globally and are advocated as a tool to deliver health care more efficiently. To achieve the promoted benefits of a symptom checker, laypeople must trust and subsequently follow its instructions. In AI, explanations are seen as a tool to communicate the rationale behind black-box decisions to encourage trust and adoption. However, the effectiveness of the types of explanations used in AI-driven symptom checkers has not yet been studied. Explanations can follow many forms, including why-explanations and how-explanations. Social theories suggest that why-explanations are better at communicating knowledge and cultivating trust among laypeople. 
Objective: The aim of this study is to ascertain whether explanations provided by a symptom checker affect explanatory trust among laypeople and whether this trust is impacted by their existing knowledge of disease. Methods: A cross-sectional survey of 750 healthy participants was conducted. The participants were shown a video of a chatbot simulation that resulted in the diagnosis of either a migraine or temporal arteritis, chosen for their differing levels of epidemiological prevalence. These diagnoses were accompanied by one of four types of explanations. Each explanation type was selected either because of its current use in symptom checkers or because it was informed by theories of contrastive explanation. Exploratory factor analysis of participants’ responses followed by comparison-of-means tests were used to evaluate group differences in trust. Results: Depending on the treatment group, two or three variables were generated, reflecting the prior knowledge and subsequent mental model that the participants held. When varying explanation type by disease, migraine was found to be nonsignificant (P=.65) and temporal arteritis, marginally significant (P=.09). Varying disease by explanation type resulted in statistical significance for input influence (P=.001), social proof (P=.049), and no explanation (P=.006), with counterfactual explanation (P=.053). The results suggest that trust in explanations is significantly affected by the disease being explained. When laypeople have existing knowledge of a disease, explanations have little impact on trust. Where the need for information is greater, different explanation types engender significantly different levels of trust. These results indicate that to be successful, symptom checkers need to tailor explanations to each user’s specific question and discount the diseases that they may also be aware of. 
Conclusions: System builders developing explanations for symptom-checking apps should consider the recipient’s knowledge of a disease and tailor explanations to each user’s specific need. Effort should be placed on generating explanations that are personalized to each user of a symptom checker to fully discount the diseases that they may be aware of and to close their information gap. %M 34730544 %R 10.2196/29386 %U https://www.jmir.org/2021/11/e29386 %U https://doi.org/10.2196/29386 %U http://www.ncbi.nlm.nih.gov/pubmed/34730544 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e26777 %T Natural Language Processing and Machine Learning Methods to Characterize Unstructured Patient-Reported Outcomes: Validation Study %A Lu,Zhaohua %A Sim,Jin-ah %A Wang,Jade X %A Forrest,Christopher B %A Krull,Kevin R %A Srivastava,Deokumar %A Hudson,Melissa M %A Robison,Leslie L %A Baker,Justin N %A Huang,I-Chan %+ Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, MS 735, 262 Danny Thomas Pl, Memphis, TN, 38105, United States, 1 9015958369, I-Chan.Huang@STJUDE.ORG %K natural language processing %K machine learning %K PROs %K pediatric oncology %D 2021 %7 3.11.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Assessing patient-reported outcomes (PROs) through interviews or conversations during clinical encounters provides insightful information about survivorship. Objective: This study aims to test the validity of natural language processing (NLP) and machine learning (ML) algorithms in identifying different attributes of pain interference and fatigue symptoms experienced by child and adolescent survivors of cancer versus the judgment by PRO content experts as the gold standard to validate NLP/ML algorithms. 
Methods: This cross-sectional study focused on child and adolescent survivors of cancer, aged 8 to 17 years, and caregivers, from whom 391 meaning units in the pain interference domain and 423 in the fatigue domain were generated for analyses. Data were collected from the After Completion of Therapy Clinic at St. Jude Children’s Research Hospital. Experienced pain interference and fatigue symptoms were reported through in-depth interviews. After verbatim transcription, analyzable sentences (ie, meaning units) were semantically labeled by 2 content experts for each attribute (physical, cognitive, social, or unclassified). Two NLP/ML methods were used to extract and validate the semantic features: bidirectional encoder representations from transformers (BERT) and Word2vec plus one of the ML methods, the support vector machine or extreme gradient boosting. Receiver operating characteristic and precision-recall curves were used to evaluate the accuracy and validity of the NLP/ML methods. Results: Compared with Word2vec/support vector machine and Word2vec/extreme gradient boosting, BERT demonstrated higher accuracy in both symptom domains, with 0.931 (95% CI 0.905-0.957) and 0.916 (95% CI 0.887-0.941) for problems with cognitive and social attributes on pain interference, respectively, and 0.929 (95% CI 0.903-0.953) and 0.917 (95% CI 0.891-0.943) for problems with cognitive and social attributes on fatigue, respectively. In addition, BERT yielded superior areas under the receiver operating characteristic curve for cognitive attributes on pain interference and fatigue domains (0.923, 95% CI 0.879-0.997; 0.948, 95% CI 0.922-0.979) and superior areas under the precision-recall curve for cognitive attributes on pain interference and fatigue domains (0.818, 95% CI 0.735-0.917; 0.855, 95% CI 0.791-0.930). Conclusions: The BERT method performed better than the other methods. 
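The accuracy and area-under-curve figures just reported rest on standard confusion-matrix quantities; a minimal sketch of precision, recall, and F1 for one binary attribute label (toy labels, not the study's data):

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for a binary label, e.g., whether
    a meaning unit describes a cognitive attribute of a symptom."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# expert labels vs classifier output for 6 hypothetical meaning units
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
```

Sweeping the classifier's decision threshold and recomputing these quantities traces the ROC and precision-recall curves whose areas the abstract compares across BERT and the Word2vec-based models.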
As an alternative to using standard PRO surveys, collecting unstructured PROs via interviews or conversations during clinical encounters and applying NLP/ML methods can facilitate PRO assessment in child and adolescent cancer survivors. %M 34730546 %R 10.2196/26777 %U https://www.jmir.org/2021/11/e26777 %U https://doi.org/10.2196/26777 %U http://www.ncbi.nlm.nih.gov/pubmed/34730546 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e28999 %T Exploratory Data Mining Techniques (Decision Tree Models) for Examining the Impact of Internet-Based Cognitive Behavioral Therapy for Tinnitus: Machine Learning Approach %A Rodrigo,Hansapani %A Beukes,Eldré W %A Andersson,Gerhard %A Manchaiah,Vinaya %+ School of Mathematical and Statistical Sciences, University of Texas Rio Grande Valley, 1201 W University Drive, Edinburgh, TX, 78539, United States, 1 9566652313, hansapani.rodrigo@utrgv.edu %K tinnitus %K internet interventions %K digital therapeutics %K cognitive behavioral therapy %K artificial intelligence %K machine learning %K data mining %K decision tree %K random forest %D 2021 %7 2.11.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: There is huge variability in the way that individuals with tinnitus respond to interventions. These experiential variations, together with a range of associated etiologies, contribute to tinnitus being a highly heterogeneous condition. Despite this heterogeneity, a “one size fits all” approach is taken when making management recommendations. Although there are various management approaches, not all are equally effective. Psychological approaches such as cognitive behavioral therapy have the most evidence base. Managing tinnitus is challenging due to the significant variations in tinnitus experiences and treatment successes. Tailored interventions based on individual tinnitus profiles may improve outcomes. Predictive models of treatment success are, however, lacking. 
Objective: This study aimed to use exploratory data mining techniques (ie, decision tree models) to identify the variables associated with the treatment success of internet-based cognitive behavioral therapy (ICBT) for tinnitus. Methods: Individuals (N=228) who underwent ICBT in 3 separate clinical trials were included in this analysis. The primary outcome variable was a reduction of 13 points in tinnitus severity, which was measured by using the Tinnitus Functional Index following the intervention. The predictor variables included demographic characteristics, tinnitus and hearing-related variables, and clinical factors (ie, anxiety, depression, insomnia, hyperacusis, hearing disability, cognitive function, and life satisfaction). Analyses were undertaken by using various exploratory machine learning algorithms to identify the most influential variables. In total, 6 decision tree models were implemented, namely the classification and regression tree (CART), C5.0, gradient boosting (GB), extreme gradient boosting (XGBoost), AdaBoost, and random forest models. The Shapley additive explanations framework was applied to the 2 optimal decision tree models to determine relative predictor importance. Results: Among the 6 decision tree models, the CART (accuracy: mean 70.7%, SD 2.4%; sensitivity: mean 74%, SD 5.5%; specificity: mean 64%, SD 3.7%; area under the receiver operating characteristic curve [AUC]: mean 0.69, SD 0.001) and gradient boosting (accuracy: mean 71.8%, SD 1.5%; sensitivity: mean 78.3%, SD 2.8%; specificity: mean 58.7%, SD 4.2%; AUC: mean 0.68, SD 0.02) models were found to be the best predictive models. Although the other models had acceptable accuracy (range 56.3%-66.7%) and sensitivity (range 68.6%-77.9%), they all had relatively weak specificity (range 31.1%-50%) and AUCs (range 0.52-0.62). A higher education level was the most influential factor for ICBT outcomes. 
The CART decision tree model identified 3 participant groups who had at least an 85% probability of success following ICBT. Conclusions: Decision tree models, especially the CART and gradient boosting models, appeared to be promising in predicting ICBT outcomes. Their predictive power may be improved by using larger sample sizes and including a wider range of predictive factors in future studies. %M 34726612 %R 10.2196/28999 %U https://www.jmir.org/2021/11/e28999 %U https://doi.org/10.2196/28999 %U http://www.ncbi.nlm.nih.gov/pubmed/34726612 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e19846 %T Blockchain Integration With Digital Technology and the Future of Health Care Ecosystems: Systematic Review %A Fatoum,Hanaa %A Hanna,Sam %A Halamka,John D %A Sicker,Douglas C %A Spangenberg,Peter %A Hashmi,Shahrukh K %+ College of Medicine, Alfaisal University, Takhassusi, Riyadh, 11533, Saudi Arabia, 966 000000, Hanaa.Fatoum@outlook.com %K blockchain %K Internet of Things %K digital %K artificial intelligence %K machine learning %K eHealth %K ledger %K distributed ledger technology %D 2021 %7 2.11.2021 %9 Review %J J Med Internet Res %G English %X Background: In the era of big data, artificial intelligence (AI), and the Internet of Things (IoT), digital data have become essential for our everyday functioning and in health care services. The sensitive nature of health care data presents several crucial issues such as privacy, security, interoperability, and reliability that must be addressed in any health care data management system. However, most of the current health care systems are still facing major obstacles and are lacking in some of these areas. This is where decentralized, secure, and scalable databases, most notably blockchains, play critical roles in addressing these requirements without compromising security, thereby attracting considerable interest within the health care community. 
A blockchain can be maintained and widely distributed using a large network of nodes, mostly computers, each of which stores a full replica of the data. A blockchain protocol is a set of predefined rules or procedures that govern how the nodes interact with the network, view, verify, and add data to the ledger. Objective: In this article, we aim to explore blockchain technology, its framework, current applications, and integration with other innovations, as well as opportunities in diverse areas of health care and clinical research, in addition to clarifying its future impact on the health care ecosystem. We also present 2 case studies to illustrate the potential role of blockchains in health care. Methods: To identify related existing work, terms based on Medical Subject Headings were used. We included studies focusing mainly on health care and clinical research and developed a functional framework for implementation and testing with data. The literature sources for this systematic review were PubMed, MEDLINE, and the Cochrane Library, in addition to a preliminary search of IEEE Xplore. Results: The included studies demonstrated multiple framework designs and various implementations in health care, including chronic disease diagnosis, management, monitoring, and evaluation. We found that blockchains exhibit many promising applications in clinical trial management, such as smart-contract application, participant-controlled data access, trustless protocols, and data validity. Electronic health records (EHRs), patient-centered interoperability, remote patient monitoring, and clinical trial data management were found to be major areas for blockchain usage, which can become a key catalyst for health care innovations. Conclusions: The potential benefits of blockchains are limitless; however, concrete data on long-term clinical outcomes based on blockchains powered and supplemented by AI and IoT are yet to be obtained. 
Nonetheless, implementing blockchains as a novel way to integrate EHRs nationwide and manage common clinical problems in an algorithmic fashion has the potential to improve patient outcomes and health care experiences, as well as the overall health and well-being of individuals. %M 34726603 %R 10.2196/19846 %U https://www.jmir.org/2021/11/e19846 %U https://doi.org/10.2196/19846 %U http://www.ncbi.nlm.nih.gov/pubmed/34726603 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 11 %P e32726 %T Optimal Triage for COVID-19 Patients Under Limited Health Care Resources With a Parsimonious Machine Learning Prediction Model and Threshold Optimization Using Discrete-Event Simulation: Development Study %A Kim,Jeongmin %A Lim,Hakyung %A Ahn,Jae-Hyeon %A Lee,Kyoung Hwa %A Lee,Kwang Suk %A Koo,Kyo Chul %+ Department of Urology, Yonsei University College of Medicine, 211 Eonju-ro, Gangnam-gu, Seoul, 135-720, Republic of Korea, 82 01099480342, gckoo@yuhs.ac %K COVID-19 %K decision support techniques %K machine learning %K prediction %K triage %D 2021 %7 2.11.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: The COVID-19 pandemic has placed an unprecedented burden on health care systems. Objective: We aimed to effectively triage COVID-19 patients within situations of limited data availability and explore optimal thresholds to minimize mortality rates while maintaining health care system capacity. Methods: A nationwide sample of 5601 patients confirmed with COVID-19 until April 2020 was retrospectively reviewed. Extreme gradient boosting (XGBoost) and logistic regression analysis were used to develop prediction models for the maximum clinical severity during hospitalization, classified according to the World Health Organization Ordinal Scale for Clinical Improvement (OSCI). The recursive feature elimination technique was used to evaluate the maintenance of model performance when clinical and laboratory variables were eliminated. 
Using populations based on hypothetical patient influx scenarios, discrete-event simulation was performed to find an optimal threshold within limited resource environments that minimizes mortality rates. Results: The cross-validated area under the receiver operating characteristic curve (AUROC) of the baseline XGBoost model that utilized all 37 variables was 0.965 for OSCI ≥6. Compared to the baseline model’s performance, the AUROC of the feature-eliminated model that utilized 17 variables was maintained at 0.963, with no statistically significant difference. Optimal thresholds were found to minimize mortality rates in a hypothetical patient influx scenario. The benefit of utilizing an optimal triage threshold was clear, reducing mortality by up to 18.1%, compared with the conventional Youden index. Conclusions: Our adaptive triage model and its threshold optimization capability revealed that COVID-19 management can be achieved via the cooperation of both the medical and health care management sectors for maximum treatment efficacy. The model is available online for clinical implementation. 
%M 34609319 %R 10.2196/32726 %U https://medinform.jmir.org/2021/11/e32726 %U https://doi.org/10.2196/32726 %U http://www.ncbi.nlm.nih.gov/pubmed/34609319 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e27875 %T Prediction of Smoking Risk From Repeated Sampling of Environmental Images: Model Validation %A Engelhard,Matthew M %A D'Arcy,Joshua %A Oliver,Jason A %A Kozink,Rachel %A McClernon,F Joseph %+ Department of Biostatistics & Bioinformatics, Duke University School of Medicine, 2608 Erwin Rd, Durham, NC, 27705, United States, 1 919 613 3665, m.engelhard@duke.edu %K smoking %K smoking cessation %K machine learning %K computer vision %K digital health %K eHealth %K behavior %K CNN %K neural network %K artificial intelligence %K AI %K images %K environment %K ecological momentary assessment %K mobile health %K mHealth %K mobile phone %D 2021 %7 1.11.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Viewing their habitual smoking environments increases smokers’ craving and smoking behaviors in laboratory settings. A deep learning approach can differentiate between habitual smoking and nonsmoking environments, suggesting that it may be possible to predict environment-associated smoking risk from continuously acquired images of smokers’ daily environments. Objective: In this study, we aim to predict environment-associated risk from continuously acquired images of smokers’ daily environments. We also aim to understand how model performance varies by location type, as reported by participants. Methods: Smokers from Durham, North Carolina, and surrounding areas completed ecological momentary assessments both immediately after smoking and at randomly selected times throughout the day for 2 weeks. At each assessment, participants took a picture of their current environment and completed a questionnaire on smoking, craving, and the environmental setting. 
A convolutional neural network–based model was trained to predict smoking, craving, whether smoking was permitted in the current environment, and whether the participant was outside, based on images of participants’ daily environments, the time since their last cigarette, and baseline data on daily smoking habits. Prediction performance, quantified using the area under the receiver operating characteristic curve (AUC) and average precision (AP), was assessed for out-of-sample prediction as well as personalized models trained on images from days 1 to 10. The models were optimized for mobile devices and implemented as a smartphone app. Results: A total of 48 participants completed the study, and 8008 images were acquired. The personalized models were highly effective in predicting smoking risk (AUC=0.827; AP=0.882), craving (AUC=0.837; AP=0.798), whether smoking was permitted in the current environment (AUC=0.932; AP=0.981), and whether the participant was outside (AUC=0.977; AP=0.956). The out-of-sample models were also effective in predicting smoking risk (AUC=0.723; AP=0.785), whether smoking was permitted in the current environment (AUC=0.815; AP=0.937), and whether the participant was outside (AUC=0.949; AP=0.922); however, they were not effective in predicting craving (AUC=0.522; AP=0.427). Omitting image features reduced AUC by over 0.1 when predicting all outcomes except craving. Prediction of smoking was more effective for participants whose self-reported location type was more variable (Spearman ρ=0.48; P=.001). Conclusions: Images of daily environments can be used to effectively predict smoking risk. Model personalization, achieved by incorporating information about daily smoking habits and training on participant-specific images, further improves prediction performance. Environment-associated smoking risk can be assessed in real time on a mobile device and can be incorporated into device-based smoking cessation interventions. 
%M 34723819 %R 10.2196/27875 %U https://www.jmir.org/2021/11/e27875 %U https://doi.org/10.2196/27875 %U http://www.ncbi.nlm.nih.gov/pubmed/34723819 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 10 %P e25460 %T Improved Environment-Aware–Based Noise Reduction System for Cochlear Implant Users Based on a Knowledge Transfer Approach: Development and Usability Study %A Li,Lieber Po-Hung %A Han,Ji-Yan %A Zheng,Wei-Zhong %A Huang,Ren-Jie %A Lai,Ying-Hui %+ Department of Biomedical Engineering, National Yang Ming Chiao Tung University, No 155, Sec 2, Linong Street, Taipei, 112, Taiwan, 886 228267021, yh.lai@nycu.edu.tw %K cochlear implants %K noise reduction %K deep learning %K noise classification %K hearing %K deaf %K sound %K audio %K cochlear %D 2021 %7 28.10.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Cochlear implant technology is a well-known approach to help deaf individuals hear speech again and can improve speech intelligibility in quiet conditions; however, it still has room for improvement in noisy conditions. More recently, it has been proven that deep learning–based noise reduction, such as noise classification and deep denoising autoencoder (NC+DDAE), can benefit the intelligibility performance of patients with cochlear implants compared to classical noise reduction algorithms. Objective: Following the successful implementation of the NC+DDAE model in our previous study, this study aimed to propose an advanced noise reduction system using knowledge transfer technology, called NC+DDAE_T; examine the proposed NC+DDAE_T noise reduction system using objective evaluations and subjective listening tests; and investigate which layer substitution of the knowledge transfer technology in the NC+DDAE_T noise reduction system provides the best outcome. Methods: The knowledge transfer technology was adopted to reduce the number of parameters of the NC+DDAE_T compared with the NC+DDAE. 
We investigated which layer should be substituted using short-time objective intelligibility and perceptual evaluation of speech quality scores as well as t-distributed stochastic neighbor embedding to visualize the features in each model layer. Moreover, we enrolled 10 cochlear implant users for listening tests to evaluate the benefits of the newly developed NC+DDAE_T. Results: The experimental results showed that substituting the middle layer (ie, the second layer in this study) of the noise-independent DDAE (NI-DDAE) model achieved the best performance gain regarding short-time objective intelligibility and perceptual evaluation of speech quality scores. Therefore, the parameters of layer 3 in the NI-DDAE were chosen to be replaced, thereby establishing the NC+DDAE_T. Both objective and listening test results showed that the proposed NC+DDAE_T noise reduction system achieved similar performances compared with the previous NC+DDAE in several noisy test conditions. However, the proposed NC+DDAE_T only required a quarter of the number of parameters compared to the NC+DDAE. Conclusions: This study demonstrated that knowledge transfer technology can help reduce the number of parameters in an NC+DDAE while keeping similar performance rates. This suggests that the proposed NC+DDAE_T model may reduce the implementation costs of this noise reduction system and provide more benefits for cochlear implant users. %M 34709193 %R 10.2196/25460 %U https://www.jmir.org/2021/10/e25460 %U https://doi.org/10.2196/25460 %U http://www.ncbi.nlm.nih.gov/pubmed/34709193 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 7 %N 4 %P e19812 %T Predicting Hepatocellular Carcinoma With Minimal Features From Electronic Health Records: Development of a Deep Learning Model %A Liang,Chia-Wei %A Yang,Hsuan-Chia %A Islam,Md Mohaimenul %A Nguyen,Phung Anh Alex %A Feng,Yi-Ting %A Hou,Ze Yu %A Huang,Chih-Wei %A Poly,Tahmina Nasrin %A Li,Yu-Chuan Jack %+ Taipei Medical University, 15 F, No. 
172-1, Sec 2, Keelung Rd, Da'an district, Taipei, 106, Taiwan, 886 096 654 6813, jaak88@gmail.com %K hepatocellular carcinoma %K deep learning %K risk prediction %K convolution neural network %K deep learning model %K hepatoma %D 2021 %7 28.10.2021 %9 Original Paper %J JMIR Cancer %G English %X Background: Hepatocellular carcinoma (HCC), usually known as hepatoma, is the third leading cause of cancer mortality globally. Early detection of HCC helps in its treatment and increases survival rates. Objective: The aim of this study is to develop a deep learning model, using the trend and severity of each medical event from the electronic health record to accurately predict the patients who will be diagnosed with HCC in 1 year. Methods: Patients with HCC were screened out from the National Health Insurance Research Database of Taiwan between 1999 and 2013. To be included, the patients with HCC had to register as patients with cancer in the catastrophic illness file and had to be diagnosed as a patient with HCC in an inpatient admission. The control cases (non-HCC patients) were randomly sampled from the same database. We used age, gender, diagnosis code, drug code, and time information as the input variables of a convolutional neural network model to predict those patients with HCC. We also inspected the highly weighted variables in the model and compared them with their odds ratios for HCC to understand how the predictive model works. Results: We included 47,945 individuals, 9553 of whom were patients with HCC. The area under the receiver operating characteristic curve (AUROC) of the model for predicting HCC risk 1 year in advance was 0.94 (95% CI 0.937-0.943), with a sensitivity of 0.869 and a specificity of 0.865. The AUROCs for predicting patients with HCC 7 days, 6 months, 1 year, 2 years, and 3 years in advance were 0.96, 0.94, 0.94, 0.91, and 0.91, respectively. 
Conclusions: The findings of this study show that the convolutional neural network model has immense potential to predict the risk of HCC 1 year in advance with minimal features available in the electronic health records. %M 34709180 %R 10.2196/19812 %U https://cancer.jmir.org/2021/4/e19812 %U https://doi.org/10.2196/19812 %U http://www.ncbi.nlm.nih.gov/pubmed/34709180 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 7 %N 4 %P e31616 %T Computer-Based Decision Tools for Shared Therapeutic Decision-making in Oncology: Systematic Review %A Yung,Alan %A Kay,Judy %A Beale,Philip %A Gibson,Kathryn A %A Shaw,Tim %+ Research in Implementation Science and eHealth, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia, 61 433697881, ayun4081@uni.sydney.edu.au %K oncology %K cancer %K computer-based %K decision support %K decision-making %K system %K tool %K machine learning %K artificial intelligence %K uncertainty %K shared decision-making %D 2021 %7 26.10.2021 %9 Review %J JMIR Cancer %G English %X Background: Therapeutic decision-making in oncology is a complex process because physicians must consider many forms of medical data and protocols. Another challenge for physicians is to clearly communicate their decision-making process to patients to ensure informed consent. Computer-based decision tools have the potential to play a valuable role in supporting this process. Objective: This systematic review aims to investigate the extent to which computer-based decision tools have been successfully adopted in oncology consultations to improve patient-physician joint therapeutic decision-making. Methods: This review was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 checklist and guidelines. 
A literature search was conducted on February 4, 2021, across the Cochrane Database of Systematic Reviews (from 2005 to January 28, 2021), the Cochrane Central Register of Controlled Trials (December 2020), MEDLINE (from 1946 to February 4, 2021), Embase (from 1947 to February 4, 2021), Web of Science (from 1900 to 2021), Scopus (from 1969 to 2021), and PubMed (from 1991 to 2021). We used a snowball approach to identify additional studies by searching the reference lists of the studies included for full-text review. Additional supplementary searches of relevant journals and gray literature websites were conducted. The reviewers screened the articles eligible for review for quality and inclusion before data extraction. Results: There are relatively few studies looking at the use of computer-based decision tools in oncology consultations. Of the 4431 unique articles obtained from the searches, only 10 (0.22%) satisfied the selection criteria. From the 10 selected studies, 8 computer-based decision tools were identified. Of the 10 studies, 6 (60%) were conducted in the United States. Communication and information-sharing were improved between physicians and patients. However, physicians did not change their habits to take advantage of computer-assisted decision-making tools or the information they provide. On average, the use of these computer-based decision tools added approximately 5 minutes to the total length of consultations. In addition, some physicians felt that the technology increased patients’ anxiety. Conclusions: Of the 10 selected studies, 6 (60%) demonstrated positive outcomes, 1 (10%) showed negative results, and 3 (30%) were neutral. Adoption of computer-based decision tools during oncology consultations continues to be low. This review shows that information-sharing and communication between physicians and patients can be improved with the assistance of technology. However, the lack of integration with electronic health records is a barrier. 
This review provides key requirements for enhancing the chance of success of future computer-based decision tools. However, it does not show the effects of health care policies, regulations, or business administration on physicians’ propensity to adopt the technology. Nevertheless, it is important that future research address the influence of these higher-level factors as well. Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42021226087; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42021226087 %M 34544680 %R 10.2196/31616 %U https://cancer.jmir.org/2021/4/e31616 %U https://doi.org/10.2196/31616 %U http://www.ncbi.nlm.nih.gov/pubmed/34544680 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 10 %P e30545 %T Algorithm Change Protocols in the Regulation of Adaptive Machine Learning–Based Medical Devices %A Gilbert,Stephen %A Fenech,Matthew %A Hirsch,Martin %A Upadhyay,Shubhanan %A Biasiucci,Andrea %A Starlinger,Johannes %+ Else Kröner-Fresenius Center for Digital Health, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Postfach 151, Fetscherstraße 74, Dresden, 01307, Germany, 49 17680396015, ra.ekfz@tu-dresden.de %K artificial intelligence %K machine learning %K regulation %K algorithm change protocol %K healthcare %K regulatory framework %K health care %D 2021 %7 26.10.2021 %9 Viewpoint %J J Med Internet Res %G English %X One of the greatest strengths of artificial intelligence (AI) and machine learning (ML) approaches in health care is that their performance can be continually improved based on updates from automated learning from data. However, health care ML models are currently essentially regulated under provisions that were developed for an earlier age of slowly updated medical devices—requiring major documentation reshape and revalidation with every major update of the model generated by the ML algorithm. 
This creates minor problems for models that will be retrained and updated only occasionally, but major problems for models that will learn from data in real time or near real time. Regulators have announced action plans for fundamental changes in regulatory approaches. In this Viewpoint, we examine the current regulatory frameworks and developments in this domain. The status quo and recent developments are reviewed, and we argue that these innovative approaches to health care need matching innovative approaches to regulation and that these approaches will bring benefits for patients. International perspectives from the World Health Organization, and the Food and Drug Administration’s proposed approach, based around oversight of tool developers’ quality management systems and defined algorithm change protocols, offer a much-needed paradigm shift, and strive for a balanced approach to enabling rapid improvements in health care through AI innovation while simultaneously ensuring patient safety. The draft European Union (EU) regulatory framework indicates similar approaches, but no detail has yet been provided on how algorithm change protocols will be implemented in the EU. We argue that detail must be provided, and we describe how this could be done in a manner that would allow the full benefits of AI/ML-based innovation for EU patients and health care systems to be realized. 
%M 34697010 %R 10.2196/30545 %U https://www.jmir.org/2021/10/e30545 %U https://doi.org/10.2196/30545 %U http://www.ncbi.nlm.nih.gov/pubmed/34697010 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 8 %N 4 %P e27706 %T Computer-Aided Screening of Autism Spectrum Disorder: Eye-Tracking Study Using Data Visualization and Deep Learning %A Cilia,Federica %A Carette,Romuald %A Elbattah,Mahmoud %A Dequen,Gilles %A Guérin,Jean-Luc %A Bosche,Jérôme %A Vandromme,Luc %A Le Driant,Barbara %+ UR-UPJV 7273, Centre de Recherche en Psychologie - Cognition, Psychisme, Organisations, Université de Picardie Jules Verne, 10 rue des Français Libres, Amiens, 80000, France, 33 322 827 397, federica.cilia@u-picardie.fr %K autism spectrum disorder %K screening %K eye tracking %K data visualization %K machine learning %K deep learning %K AI %K ASD %K artificial intelligence %K ML %K screening %K adolescent %K diagnosis %D 2021 %7 25.10.2021 %9 Original Paper %J JMIR Hum Factors %G English %X Background: The early diagnosis of autism spectrum disorder (ASD) is highly desirable but remains a challenging task, which requires a set of cognitive tests and hours of clinical examinations. In addition, variations of such symptoms exist, which can make the identification of ASD even more difficult. Although diagnostic tests are largely developed by experts, they are still subject to human bias. In this respect, computer-assisted technologies can play a key role in supporting the screening process. Objective: This paper follows on the path of using eye tracking as an integrated part of screening assessment in ASD based on the characteristic elements of the eye gaze. This study adds to the mounting efforts in using eye tracking technology to support the process of ASD screening. Methods: The proposed approach basically aims to integrate eye tracking with visualization and machine learning. A group of 59 school-aged participants took part in the study. 
The participants were invited to watch a set of age-appropriate photographs and videos related to social cognition. Initially, eye-tracking scanpaths were transformed into a visual representation as a set of images. Subsequently, a convolutional neural network was trained to perform the image classification task. Results: The experimental results demonstrated that the visual representation could simplify the diagnostic task and attain high accuracy. Specifically, the convolutional neural network model could achieve a promising classification accuracy. This largely suggests that visualizations could successfully encode the information of gaze motion and its underlying dynamics. Further, we explored possible correlations between autism severity and the dynamics of eye movement based on the maximal information coefficient. The findings primarily show that the combination of eye tracking, visualization, and machine learning has strong potential in developing an objective tool to assist in the screening of ASD. Conclusions: Broadly speaking, the approach we propose could be transferable to screening for other disorders, particularly neurodevelopmental disorders. 
%M 34694238 %R 10.2196/27706 %U https://humanfactors.jmir.org/2021/4/e27706 %U https://doi.org/10.2196/27706 %U http://www.ncbi.nlm.nih.gov/pubmed/34694238 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 10 %P e31862 %T Evaluating the Clinical Feasibility of an Artificial Intelligence–Powered, Web-Based Clinical Decision Support System for the Treatment of Depression in Adults: Longitudinal Feasibility Study %A Popescu,Christina %A Golden,Grace %A Benrimoh,David %A Tanguay-Sela,Myriam %A Slowey,Dominique %A Lundrigan,Eryn %A Williams,Jérôme %A Desormeau,Bennet %A Kardani,Divyesh %A Perez,Tamara %A Rollins,Colleen %A Israel,Sonia %A Perlman,Kelly %A Armstrong,Caitrin %A Baxter,Jacob %A Whitmore,Kate %A Fradette,Marie-Jeanne %A Felcarek-Hope,Kaelan %A Soufi,Ghassen %A Fratila,Robert %A Mehltretter,Joseph %A Looper,Karl %A Steiner,Warren %A Rej,Soham %A Karp,Jordan F %A Heller,Katherine %A Parikh,Sagar V %A McGuire-Snieckus,Rebecca %A Ferrari,Manuela %A Margolese,Howard %A Turecki,Gustavo %+ Aifred Health Inc., 1250 Rue Guy Suite #600, Montreal, QC, H3H 2T4, Canada, 1 5144637813, david.benrimoh@mail.mcgill.com %K clinical decision support system %K major depressive disorder %K artificial intelligence %K feasibility %K usability %K mobile phone %D 2021 %7 25.10.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: Approximately two-thirds of patients with major depressive disorder do not achieve remission during their first treatment. There has been increasing interest in the use of digital, artificial intelligence–powered clinical decision support systems (CDSSs) to assist physicians in their treatment selection and management, improving the personalization and use of best practices such as measurement-based care. Previous literature shows that for digital mental health tools to be successful, the tool must be easy for patients and physicians to use and feasible within existing clinical workflows. 
Objective: This study aims to examine the feasibility of an artificial intelligence–powered CDSS, which combines the operationalized 2016 Canadian Network for Mood and Anxiety Treatments guidelines with a neural network–based individualized treatment remission prediction. Methods: Owing to the COVID-19 pandemic, the study was adapted to be completed entirely remotely. A total of 7 physicians recruited outpatients diagnosed with major depressive disorder according to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition criteria. Patients completed a minimum of one visit without the CDSS (baseline) and 2 subsequent visits where the CDSS was used by the physician (visits 1 and 2). The primary outcome of interest was change in appointment length after the introduction of the CDSS as a proxy for feasibility. Feasibility and acceptability data were collected through self-report questionnaires and semistructured interviews. Results: Data were collected between January and November 2020. A total of 17 patients were enrolled in the study; of the 17 patients, 14 (82%) completed the study. There was no significant difference in appointment length between visits (introduction of the tool did not increase appointment length; F2,24=0.805; mean squared error 58.08; P=.46). In total, 92% (12/13) of patients and 71% (5/7) of physicians felt that the tool was easy to use; 62% (8/13) of patients and 71% (5/7) of physicians rated that they trusted the CDSS. Of the 13 patients, 6 (46%) felt that the patient-clinician relationship significantly or somewhat improved, whereas 7 (54%) felt that it did not change. Conclusions: Our findings confirm that the integration of the tool does not significantly increase appointment length and suggest that the CDSS is easy to use and may have positive effects on the patient-physician relationship for some patients. The CDSS is feasible and ready for effectiveness studies. 
Trial Registration: ClinicalTrials.gov NCT04061642; http://clinicaltrials.gov/ct2/show/NCT04061642 %M 34694234 %R 10.2196/31862 %U https://formative.jmir.org/2021/10/e31862 %U https://doi.org/10.2196/31862 %U http://www.ncbi.nlm.nih.gov/pubmed/34694234 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 8 %N 10 %P e22651 %T Use of Automated Thematic Annotations for Small Data Sets in a Psychotherapeutic Context: Systematic Review of Machine Learning Algorithms %A Hudon,Alexandre %A Beaudoin,Mélissa %A Phraxayavong,Kingsada %A Dellazizzo,Laura %A Potvin,Stéphane %A Dumais,Alexandre %+ Centre de recherche de l'Institut Universitaire en Santé Mentale de Montréal, 7331, rue Hochelaga, Montréal, QC, Canada, 1 (514) 251 4000, alexandre.dumais@umontreal.ca %K psychotherapy %K artificial intelligence %K automated text classification %K machine learning %K systematic review %D 2021 %7 22.10.2021 %9 Review %J JMIR Ment Health %G English %X Background: A growing body of literature has detailed the use of qualitative analyses to measure the therapeutic processes and intrinsic effectiveness of psychotherapies, which yield small databases. Nonetheless, these approaches have several limitations and machine learning algorithms are needed. Objective: The objective of this study is to conduct a systematic review of the use of machine learning for automated text classification for small data sets in the fields of psychiatry, psychology, and social sciences. This review will identify available algorithms and assess if automated classification of textual entities is comparable to the classification done by human evaluators. Methods: A systematic search was performed in the electronic databases of Medline, Web of Science, PsycNet (PsycINFO), and Google Scholar from their inception dates to 2021. The fields of psychiatry, psychology, and social sciences were selected as they include a vast array of textual entities in the domain of mental health that can be reviewed. 
Additional records identified through cross-referencing were used to find other studies. Results: This literature search identified 5442 articles that were eligible for our study after the removal of duplicates. Following abstract screening, 114 full articles were assessed in their entirety, of which 107 were excluded. The remaining 7 studies were analyzed. Classification algorithms such as naive Bayes, decision tree, and support vector machine classifiers were identified. Support vector machine was the most frequently used and best-performing algorithm in the identified articles. Prediction classification scores for the identified algorithms ranged from 53% to 91% for the classification of textual entities in 4-7 categories. In addition, 3 of the 7 studies reported an interjudge agreement statistic; these were consistent with agreement statistics for text classification done by human evaluators. Conclusions: A systematic review of available machine learning algorithms for automated text classification for small data sets in several fields (psychiatry, psychology, and social sciences) was conducted. We compared automated classification with classification done by human evaluators. Our results show that it is possible to automatically classify textual entities of a transcript based solely on small databases. Future studies are nevertheless needed to assess whether such algorithms can be implemented in the context of psychotherapies. 
%M 34677133 %R 10.2196/22651 %U https://mental.jmir.org/2021/10/e22651 %U https://doi.org/10.2196/22651 %U http://www.ncbi.nlm.nih.gov/pubmed/34677133 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 10 %P e32921 %T Assessing a Smartphone App (AICaries) That Uses Artificial Intelligence to Detect Dental Caries in Children and Provides Interactive Oral Health Education: Protocol for a Design and Usability Testing Study %A Xiao,Jin %A Luo,Jiebo %A Ly-Mapes,Oriana %A Wu,Tong Tong %A Dye,Timothy %A Al Jallad,Nisreen %A Hao,Peirong %A Ruan,Jinlong %A Bullock,Sherita %A Fiscella,Kevin %+ Department of Family Medicine, University of Rochester Medical Center, 1381 South Avenue, Rochester, NY, 14620, United States, 1 585 506 9484, Kevin_Fiscella@URMC.Rochester.edu %K artificial intelligence %K smartphone app %K mDentistry %K dental caries %K underserved population %K mobile dentistry %D 2021 %7 22.10.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: Early childhood caries (ECC) is the most common chronic childhood disease, with nearly 1.8 billion new cases per year worldwide. ECC afflicts approximately 55% of low-income and minority US preschool children, resulting in harmful short- and long-term effects on health and quality of life. Clinical evidence shows that caries is reversible if detected and addressed in its early stages. However, many low-income US children often have poor access to pediatric dental services. In this underserved group, dental caries is often diagnosed at a late stage when extensive restorative treatment is needed. With more than 85% of lower-income Americans owning a smartphone, mobile health tools such as smartphone apps hold promise in achieving patient-driven early detection and risk control of ECC. 
Objective: This study aims to use a community-based participatory research strategy to refine and test the usability of an artificial intelligence–powered smartphone app, AICaries, to be used by children’s parents/caregivers for dental caries detection in their children. Methods: Our previous work has led to the prototype of AICaries, which offers artificial intelligence–powered caries detection using photos of children’s teeth taken by the parents’ smartphones, interactive caries risk assessment, and personalized education on reducing children’s ECC risk. This AICaries study will use a two-step qualitative study design to assess the feedback and usability of the app component and app flow, and whether parents can take photos of children’s teeth on their own. Specifically, in step 1, we will conduct individual usability tests among 10 pairs of end users (parents with young children) to facilitate app module modification and fine-tuning using think aloud and instant data analysis strategies. In step 2, we will conduct unmoderated field testing for app feasibility and acceptability among 32 pairs of parents with their young children to assess the usability and acceptability of AICaries, including assessing the number/quality of teeth images taken by the parents for their children and parents’ satisfaction. Results: The study is funded by the National Institute of Dental and Craniofacial Research, United States. This study received institutional review board approval and launched in August 2021. Data collection and analysis are expected to conclude by March 2022 and June 2022, respectively. Conclusions: Using AICaries, parents can use their regular smartphones to take photos of their children’s teeth and detect ECC aided by AICaries so that they can actively seek treatment for their children at an early and reversible stage of ECC. Using AICaries, parents can also obtain essential knowledge on reducing their children’s caries risk. 
Data from this study will support a future clinical trial that evaluates the real-world impact of using this smartphone app on early detection and prevention of ECC among low-income children. International Registered Report Identifier (IRRID): PRR1-10.2196/32921 %M 34529582 %R 10.2196/32921 %U https://www.researchprotocols.org/2021/10/e32921 %U https://doi.org/10.2196/32921 %U http://www.ncbi.nlm.nih.gov/pubmed/34529582 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 10 %P e32328 %T Ethics, Integrity, and Retributions of Digital Detection Surveillance Systems for Infectious Diseases: Systematic Literature Review %A Zhao,Ivy Y %A Ma,Ye Xuan %A Yu,Man Wai Cecilia %A Liu,Jia %A Dong,Wei Nan %A Pang,Qin %A Lu,Xiao Qin %A Molassiotis,Alex %A Holroyd,Eleanor %A Wong,Chi Wai William %+ Department of Family Medicine and Primary Care, Li Ka Shing Faculty of Medicine, The University of Hong Kong, 3/F, Ap Lei Chau Clinic, 161 Main Street, Ap Lei Chau, Hong Kong SAR, China, 86 2518 5657, wongwcw@hku.hk %K artificial intelligence %K electronic medical records %K ethics %K infectious diseases %K machine learning %D 2021 %7 20.10.2021 %9 Review %J J Med Internet Res %G English %X Background: The COVID-19 pandemic has increased the importance of the deployment of digital detection surveillance systems to support early warning and monitoring of infectious diseases. These opportunities create a “double-edge sword,” as the ethical governance of such approaches often lags behind technological achievements. Objective: The aim was to investigate ethical issues identified from utilizing artificial intelligence–augmented surveillance or early warning systems to monitor and detect common or novel infectious disease outbreaks. 
Methods: Following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, we searched a number of databases for relevant articles that addressed ethical issues of using artificial intelligence, digital surveillance systems, early warning systems, and/or big data analytics technology for detecting, monitoring, or tracing infectious diseases, and we further identified and analyzed these articles with a theoretical framework. Results: This systematic review identified 29 articles presenting 6 major themes clustered under individual, organizational, and societal levels: awareness of implementing digital surveillance, digital integrity, trust, privacy and confidentiality, civil rights, and governance. While these surveillance measures were understandable during a pandemic, the public had concerns about receiving inadequate information; unclear governance frameworks; and lack of privacy protection, data integrity, and autonomy when utilizing infectious disease digital surveillance. The barriers to engagement could widen existing health care disparities or digital divides by underrepresenting vulnerable and at-risk populations, and patients’ highly sensitive data, such as their movements and contacts, could be exposed to outside sources, impinging significantly upon basic human and civil rights. Conclusions: Our findings inform ethical considerations for service delivery models for medical practitioners and policymakers involved in the use of digital surveillance for infectious disease spread, and they provide a basis for a global governance structure. 
Trial Registration: PROSPERO CRD42021259180; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=259180 %M 34543228 %R 10.2196/32328 %U https://www.jmir.org/2021/10/e32328 %U https://doi.org/10.2196/32328 %U http://www.ncbi.nlm.nih.gov/pubmed/34543228 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 10 %P e26486 %T Prediction of Readmission in Geriatric Patients From Clinical Notes: Retrospective Text Mining Study %A Goh,Kim Huat %A Wang,Le %A Yeow,Adrian Yong Kwang %A Ding,Yew Yoong %A Au,Lydia Shu Yi %A Poh,Hermione Mei Niang %A Li,Ke %A Yeow,Joannas Jie Lin %A Tan,Gamaliel Yu Heng %+ Nanyang Business School, Nanyang Technological University, S3-B2A-34, 50 Nanyang Avenue, Singapore, 639798, Singapore, 65 67904808, akhgoh@ntu.edu.sg %K geriatrics %K readmission risk %K artificial intelligence %K text mining %K psychosocial factors %D 2021 %7 19.10.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Prior literature suggests that psychosocial factors adversely impact health and health care utilization outcomes. However, psychosocial factors are typically not captured by the structured data in electronic medical records (EMRs) but are rather recorded as free text in different types of clinical notes. Objective: We here propose a text-mining approach to analyze EMRs to identify older adults with key psychosocial factors that predict adverse health care utilization outcomes, measured by 30-day readmission. The psychological factors were appended to the LACE (Length of stay, Acuity of the admission, Comorbidity of the patient, and Emergency department use) Index for Readmission to improve the prediction of readmission risk. Methods: We performed a retrospective analysis using EMR notes of 43,216 hospitalization encounters in a hospital from January 1, 2017 to February 28, 2019. 
The mean age of the cohort was 67.51 years (SD 15.87), the mean length of stay was 5.57 days (SD 10.41), and the mean intensive care unit stay was 5% (SD 22%). We employed text-mining techniques to extract psychosocial topics that are representative of these patients and tested the utility of these topics in predicting 30-day hospital readmission beyond the predictive value of the LACE Index for Readmission. Results: The added text-mined factors improved the area under the receiver operating characteristic curve of the readmission prediction by 8.46% for geriatric patients, 6.99% for the general hospital population, and 6.64% for frequent admitters. Medical social workers and case managers captured more of the psychosocial text topics than physicians. Conclusions: The results of this study demonstrate the feasibility of extracting psychosocial factors from EMR clinical notes and the value of these notes in improving readmission risk prediction. Psychosocial profiles of patients can be curated and quantified from text mining clinical notes and these profiles can be successfully applied to artificial intelligence models to improve readmission risk prediction. 
%M 34665149 %R 10.2196/26486 %U https://www.jmir.org/2021/10/e26486 %U https://doi.org/10.2196/26486 %U http://www.ncbi.nlm.nih.gov/pubmed/34665149 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 10 %P e26305 %T Detecting Parkinson Disease Using a Web-Based Speech Task: Observational Study %A Rahman,Wasifur %A Lee,Sangwu %A Islam,Md Saiful %A Antony,Victor Nikhil %A Ratnu,Harshil %A Ali,Mohammad Rafayet %A Mamun,Abdullah Al %A Wagner,Ellen %A Jensen-Roberts,Stella %A Waddell,Emma %A Myers,Taylor %A Pawlik,Meghan %A Soto,Julia %A Coffey,Madeleine %A Sarkar,Aayush %A Schneider,Ruth %A Tarolli,Christopher %A Lizarraga,Karlo %A Adams,Jamie %A Little,Max A %A Dorsey,E Ray %A Hoque,Ehsan %+ Department of Computer Science, University of Rochester, 250 Hutchinson Rd, Rochester, NY, 14620, United States, 1 5857487677, echowdh2@ur.rochester.edu %K Parkinson’s disease %K speech analysis %K improving access and equity in health care %K mobile phone %D 2021 %7 19.10.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Access to neurological care for Parkinson disease (PD) is a rare privilege for millions of people worldwide, especially in resource-limited countries. In 2013, there were just 1200 neurologists in India for a population of 1.3 billion people; in Africa, the average population per neurologist exceeds 3.3 million people. In contrast, 60,000 people receive a diagnosis of PD every year in the United States alone, and similar patterns of rising PD cases—fueled mostly by environmental pollution and an aging population—can be seen worldwide. The current projection of more than 12 million patients with PD worldwide by 2040 is only part of the picture given that more than 20% of patients with PD remain undiagnosed. Timely diagnosis and frequent assessment are key to ensure timely and appropriate medical intervention, thus improving the quality of life of patients with PD. 
Objective: In this paper, we propose a web-based framework that can help anyone anywhere around the world record a short speech task and analyze the recorded data to screen for PD. Methods: We collected data from 726 unique participants (PD: 262/726, 36.1% were women; non-PD: 464/726, 63.9% were women; average age 61 years) from all over the United States and beyond. A small portion of the data (approximately 54/726, 7.4%) was collected in a laboratory setting to compare the performance of the models trained with noisy home environment data against high-quality laboratory-environment data. The participants were instructed to utter a popular pangram containing all the letters in the English alphabet, “the quick brown fox jumps over the lazy dog.” We extracted both standard acoustic features (mel-frequency cepstral coefficients and jitter and shimmer variants) and deep learning–based embedding features from the speech data. Using these features, we trained several machine learning algorithms. We also applied model interpretation techniques such as Shapley additive explanations to ascertain the importance of each feature in determining the model’s output. Results: We achieved an area under the curve of 0.753 for determining the presence of self-reported PD by modeling the standard acoustic features through the XGBoost—a gradient-boosted decision tree model. Further analysis revealed that the widely used mel-frequency cepstral coefficient features and a subset of previously validated dysphonia features designed for detecting PD from a verbal phonation task (pronouncing “ahh”) influence the model’s decision the most. Conclusions: Our model performed equally well on data collected in a controlled laboratory environment and in the wild across different gender and age groups. Using this tool, we can collect data from almost anyone anywhere with an audio-enabled device and help the participants screen for PD remotely, contributing to equity and access in neurological care. 
%M 34665148 %R 10.2196/26305 %U https://www.jmir.org/2021/10/e26305 %U https://doi.org/10.2196/26305 %U http://www.ncbi.nlm.nih.gov/pubmed/34665148 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 10 %P e25497 %T Harnessing Machine Learning to Personalize Web-Based Health Care Content %A Guni,Ahmad %A Normahani,Pasha %A Davies,Alun %A Jaffer,Usman %+ Department of Surgery and Cancer, Imperial College London, Exhibition Road, London, SW7 2AZ, United Kingdom, 44 7803434969, ahmad.guni@nhs.net %K internet %K online health information %K personalized content %K patient education %K machine learning %D 2021 %7 19.10.2021 %9 Viewpoint %J J Med Internet Res %G English %X Web-based health care content has emerged as a primary source for patients to access health information without direct guidance from health care providers. The benefit of this approach is dependent on the ability of patients to access engaging high-quality information, but significant variability in the quality of web-based information often forces patients to navigate large quantities of inaccurate, incomplete, irrelevant, or inaccessible content. Personalization positions the patient at the center of health care models by considering their needs, preferences, goals, and values. However, the traditional methods used thus far in health care to determine the factors of high-quality content for a particular user are insufficient. Machine learning (ML) uses algorithms to process and uncover patterns within large volumes of data to develop predictive models that automatically improve over time. The health care sector has lagged behind other industries in implementing ML to analyze user and content features, which can automate personalized content recommendations on a mass scale. 
With the advent of big data in health care, which builds comprehensive patient profiles drawn from several disparate sources, ML can be used to integrate structured and unstructured data from users and content to deliver content that is predicted to be effective and engaging for patients. This enables patients to engage in their health and support education, self-management, and positive behavior change as well as to enhance clinical outcomes. %M 34665146 %R 10.2196/25497 %U https://www.jmir.org/2021/10/e25497 %U https://doi.org/10.2196/25497 %U http://www.ncbi.nlm.nih.gov/pubmed/34665146 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 10 %P e29301 %T Adoption of Machine Learning Systems for Medical Diagnostics in Clinics: Qualitative Interview Study %A Pumplun,Luisa %A Fecho,Mariska %A Wahl,Nihal %A Peters,Felix %A Buxmann,Peter %+ Software & Digital Business Group, Technical University of Darmstadt, Hochschulstraße 1, Darmstadt, 64289, Germany, 49 6151 16 24221, luisa.pumplun@tu-darmstadt.de %K machine learning %K clinics %K diagnostics %K adoption %K maturity model %D 2021 %7 15.10.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Recently, machine learning (ML) has been transforming our daily lives by enabling intelligent voice assistants, personalized support for purchase decisions, and efficient credit card fraud detection. In addition to its everyday applications, ML holds the potential to improve medicine as well, especially with regard to diagnostics in clinics. In a world characterized by population growth, demographic change, and the global COVID-19 pandemic, ML systems offer the opportunity to make diagnostics more effective and efficient, leading to a high interest of clinics in such systems. 
However, despite the high potential of ML, only a few ML systems have been deployed in clinics yet, as their adoption process differs significantly from the integration of prior health information technologies given the specific characteristics of ML. Objective: This study aims to explore the factors that influence the adoption process of ML systems for medical diagnostics in clinics to foster the adoption of these systems in clinics. Furthermore, this study provides insight into how these factors can be used to determine the ML maturity score of clinics, which can be applied by practitioners to measure the clinic status quo in the adoption process of ML systems. Methods: To gain more insight into the adoption process of ML systems for medical diagnostics in clinics, we conducted a qualitative study by interviewing 22 selected medical experts from clinics and their suppliers with profound knowledge in the field of ML. We used a semistructured interview guideline, asked open-ended questions, and transcribed the interviews verbatim. To analyze the transcripts, we first used a content analysis approach based on the health care–specific framework of nonadoption, abandonment, scale-up, spread, and sustainability. Then, we drew on the results of the content analysis to create a maturity model for ML adoption in clinics according to an established development process. Results: With the help of the interviews, we were able to identify 13 ML-specific factors that influence the adoption process of ML systems in clinics. We categorized these factors according to 7 domains that form a holistic ML adoption framework for clinics. In addition, we created an applicable maturity model that could help practitioners assess their current state in the ML adoption process. Conclusions: Many clinics still face major problems in adopting ML systems for medical diagnostics; thus, they do not benefit from the potential of these systems. 
Therefore, both the ML adoption framework and the maturity model for ML systems in clinics can not only guide future research that seeks to explore the promises and challenges associated with ML systems in a medical setting but also be a practical reference point for clinicians. %M 34652275 %R 10.2196/29301 %U https://www.jmir.org/2021/10/e29301 %U https://doi.org/10.2196/29301 %U http://www.ncbi.nlm.nih.gov/pubmed/34652275 %0 Journal Article %@ 2564-1891 %I JMIR Publications %V 1 %N 1 %P e31983 %T Change in Threads on Twitter Regarding Influenza, Vaccines, and Vaccination During the COVID-19 Pandemic: Artificial Intelligence–Based Infodemiology Study %A Benis,Arriel %A Chatsubi,Anat %A Levner,Eugene %A Ashkenazi,Shai %+ Faculty of Industrial Engineering and Technology Management, Holon Institute of Technology, Golomb St. 52, Holon, 5810201, Israel, 972 35026892, arrielb@hit.ac.il %K influenza %K vaccines %K vaccination %K social media %K social networks %K health communication %K artificial intelligence %K machine learning %K text mining %K infodemiology %K COVID-19 %K SARS-CoV-2 %D 2021 %7 14.10.2021 %9 Original Paper %J JMIR Infodemiology %G English %X Background: Discussions of health issues on social media are a crucial information source reflecting real-world responses regarding events and opinions. They are often important in public health care, since these are influencing pathways that affect vaccination decision-making by hesitant individuals. Artificial intelligence methodologies based on internet search engine queries have been suggested to detect disease outbreaks and population behavior. Among social media, Twitter is a common platform of choice to search and share opinions and (mis)information about health care issues, including vaccination and vaccines. 
Objective: Our primary objective was to support the design and implementation of future eHealth strategies and interventions on social media to increase the quality of targeted communication campaigns and therefore increase influenza vaccination rates. Our goal was to define an artificial intelligence–based approach to elucidate how threads in Twitter on influenza vaccination changed during the COVID-19 pandemic. Such findings may support adapted vaccination campaigns and could be generalized to other health-related mass communications. Methods: The study comprised the following 5 stages: (1) collecting tweets from Twitter related to influenza, vaccines, and vaccination in the United States; (2) data cleansing and storage using machine learning techniques; (3) identifying terms, hashtags, and topics related to influenza, vaccines, and vaccination; (4) building a dynamic folksonomy of the previously defined vocabulary (terms and topics) to support the understanding of its trends; and (5) labeling and evaluating the folksonomy. Results: We collected and analyzed 2,782,720 tweets of 420,617 unique users between December 30, 2019, and April 30, 2021. These tweets were in English, were from the United States, and included at least one of the following terms: “flu,” “influenza,” “vaccination,” “vaccine,” and “vaxx.” We noticed that the prevalence of the terms vaccine and vaccination increased over 2020, and that “flu” and “covid” occurrences were inversely correlated as “flu” disappeared over time from the tweets. By combining word embedding and clustering, we then identified a folksonomy built around the following 3 topics dominating the content of the collected tweets: “health and medicine (biological and clinical aspects),” “protection and responsibility,” and “politics.” By analyzing terms frequently appearing together, we noticed that the tweets were related mainly to COVID-19 pandemic events. 
Conclusions: This study focused initially on vaccination against influenza and moved to vaccination against COVID-19. Infoveillance supported by machine learning on Twitter and other social media about topics related to vaccines and vaccination against communicable diseases and their trends can lead to the design of personalized messages encouraging targeted subpopulations’ engagement in vaccination. A greater likelihood that a targeted population receives a personalized message is associated with higher response, engagement, and proactiveness of the target population for the vaccination process. %M 34693212 %R 10.2196/31983 %U https://infodemiology.jmir.org/2021/1/e31983 %U https://doi.org/10.2196/31983 %U http://www.ncbi.nlm.nih.gov/pubmed/34693212 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 10 %P e32771 %T Predictability of Mortality in Patients With Myocardial Injury After Noncardiac Surgery Based on Perioperative Factors via Machine Learning: Retrospective Study %A Shin,Seo Jeong %A Park,Jungchan %A Lee,Seung-Hwa %A Yang,Kwangmo %A Park,Rae Woong %+ Department of Biomedical Sciences, Ajou University Graduate School of Medicine, 206, World cup-ro, Yeongtong-gu, Suwon, 16499, Republic of Korea, 82 0312194471, veritas@ajou.ac.kr %K myocardial injury after noncardiac surgery %K high-sensitivity cardiac troponin %K machine learning %K extreme gradient boosting %D 2021 %7 14.10.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Myocardial injury after noncardiac surgery (MINS) is associated with increased postoperative mortality, but the relevant perioperative factors that contribute to the mortality of patients with MINS have not been fully evaluated. Objective: To establish a comprehensive body of knowledge relating to patients with MINS, we researched the best performing predictive model based on machine learning algorithms. 
Methods: Using clinical data from 7629 patients with MINS from the clinical data warehouse, we evaluated 8 machine learning algorithms for accuracy, precision, recall, F1 score, area under the receiver operating characteristic (AUROC) curve, and area under the precision-recall curve to investigate the best model for predicting mortality. Feature importance and Shapley Additive Explanations values were analyzed to explain the role of each clinical factor in patients with MINS. Results: Extreme gradient boosting outperformed the other models. The model showed an AUROC of 0.923 (95% CI 0.916-0.930). The AUROC of the model did not decrease in the test data set (0.894, 95% CI 0.86-0.922; P=.06). Antiplatelet drug prescription, elevated C-reactive protein level, and beta blocker prescription were associated with reduced 30-day mortality. Conclusions: Predicting the mortality of patients with MINS was shown to be feasible using machine learning. By analyzing the impact of predictors, markers that should be cautiously monitored by clinicians may be identified. 
%M 34647900 %R 10.2196/32771 %U https://medinform.jmir.org/2021/10/e32771 %U https://doi.org/10.2196/32771 %U http://www.ncbi.nlm.nih.gov/pubmed/34647900 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 7 %N 10 %P e30824 %T Self–Training With Quantile Errors for Multivariate Missing Data Imputation for Regression Problems in Electronic Medical Records: Algorithm Development Study %A Gwon,Hansle %A Ahn,Imjin %A Kim,Yunha %A Kang,Hee Jun %A Seo,Hyeram %A Cho,Ha Na %A Choi,Heejung %A Jun,Tae Joon %A Kim,Young-Hak %+ Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympic-ro 43-gil, Songpa-gu, Seoul, 05505, Republic of Korea, 82 2 3010 0955, mdyhkim@amc.seoul.kr %K self-training %K artificial intelligence %K electronic medical records %K imputation %D 2021 %7 13.10.2021 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: When using machine learning in the real world, the missing value problem is the first problem encountered. Methods to impute these missing values include statistical methods such as mean imputation, expectation-maximization, and multiple imputation by chained equations (MICE) as well as machine learning methods such as multilayer perceptron, k-nearest neighbor, and decision tree. Objective: The objective of this study was to impute numeric medical data such as physical data and laboratory data. We aimed to effectively impute data using a progressive method called self-training in the medical field, where training data are scarce. Methods: In this paper, we propose a self-training method that gradually increases the available data. Models trained with complete data predict the missing values in incomplete data. Among the incomplete data, the data in which the missing value is validly predicted are incorporated into the complete data. Using the predicted value as the actual value is called pseudolabeling. This process is repeated until a stopping condition is satisfied. 
The most important part of this process is how to evaluate the accuracy of pseudolabels. They can be evaluated by observing the effect of the pseudolabeled data on the performance of the model. Results: In self-training using random forest (RF), mean squared error was up to 12% lower than pure RF, and the Pearson correlation coefficient was 0.1% higher. This difference was confirmed statistically. In the Friedman test performed on MICE and RF, self-training showed a P value between .003 and .02. A Wilcoxon signed-rank test performed on the mean imputation showed the lowest possible P value, 3.05e-5, in all situations. Conclusions: Self-training showed significant results in comparing the predicted values and actual values, but it needs to be verified in an actual machine learning system. And self-training has the potential to improve performance according to the pseudolabel evaluation method, which will be the main subject of our future research. %M 34643539 %R 10.2196/30824 %U https://publichealth.jmir.org/2021/10/e30824 %U https://doi.org/10.2196/30824 %U http://www.ncbi.nlm.nih.gov/pubmed/34643539 %0 Journal Article %@ 2562-0959 %I JMIR Publications %V 4 %N 2 %P e31697 %T Performance of Artificial Intelligence Imaging Models in Detecting Dermatological Manifestations in Higher Fitzpatrick Skin Color Classifications %A Aggarwal,Pushkar %+ College of Medicine, University of Cincinnati, 3230 Eden Ave, Cincinnati, OH, 45267, United States, 1 2402000896, aggarwpr@mail.uc.edu %K deep learning %K melanoma %K basal cell carcinoma %K skin of color %K image recognition %K dermatology %K disease %K convolutional neural network %K specificity %K prediction %K artificial intelligence %K skin color %K skin tone %D 2021 %7 12.10.2021 %9 Short Paper %J JMIR Dermatol %G English %X Background: The performance of deep-learning image recognition models is below par when applied to images with Fitzpatrick classification skin types 4 and 5. 
Objective: The objective of this research was to assess whether image recognition models perform differently when differentiating between dermatological diseases in individuals with darker skin color (Fitzpatrick skin types 4 and 5) than when differentiating between the same dermatological diseases in Caucasians (Fitzpatrick skin types 1, 2, and 3) when both models are trained on the same number of images. Methods: Two image recognition models were trained, validated, and tested. The goal of each model was to differentiate between melanoma and basal cell carcinoma. Open-source images of melanoma and basal cell carcinoma were acquired from the Hellenic Dermatological Atlas, the Dermatology Atlas, the Interactive Dermatology Atlas, and DermNet NZ. Results: The image recognition models trained and validated on images with light skin color had higher sensitivity, specificity, positive predictive value, negative predictive value, and F1 score than the image recognition models trained and validated on images of skin of color for differentiation between melanoma and basal cell carcinoma. Conclusions: A higher number of images of dermatological diseases in individuals with darker skin color than images of dermatological diseases in individuals with light skin color would need to be gathered for artificial intelligence models to perform equally well. 
%M 37632853 %R 10.2196/31697 %U https://derma.jmir.org/2021/2/e31697 %U https://doi.org/10.2196/31697 %U http://www.ncbi.nlm.nih.gov/pubmed/37632853 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 9 %N 10 %P e32444 %T Smartphone-Based Artificial Intelligence–Assisted Prediction for Eyelid Measurements: Algorithm Development and Observational Validation Study %A Chen,Hung-Chang %A Tzeng,Shin-Shi %A Hsiao,Yen-Chang %A Chen,Ruei-Feng %A Hung,Erh-Chien %A Lee,Oscar K %+ Institute of Clinical Medicine, National Yang Ming Chiao Tung University, No 155, Section 2, Li-Nong Street, Beitou District, Taipei, 112, Taiwan, 886 2 28757391, oscarlee9203@gmail.com %K artificial intelligence %K AI %K deep learning %K margin reflex distance 1 %K margin reflex distance 2 %K levator muscle function %K smartphone %K measurement %K eye %K prediction %K processing %K limit %K image %K algorithm %K observational %D 2021 %7 8.10.2021 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: Margin reflex distance 1 (MRD1), margin reflex distance 2 (MRD2), and levator muscle function (LF) are crucial metrics for ptosis evaluation and management. However, manual measurements of MRD1, MRD2, and LF are time-consuming, subjective, and prone to human error. Smartphone-based artificial intelligence (AI) image processing is a potential solution to overcome these limitations. Objective: We propose the first smartphone-based AI-assisted image processing algorithm for MRD1, MRD2, and LF measurements. Methods: This observational study included 822 eyes of 411 volunteers aged over 18 years from August 1, 2020, to April 30, 2021. Six orbital photographs (bilateral primary gaze, up-gaze, and down-gaze) were taken using a smartphone (iPhone 11 Pro Max). The gold-standard measurements and normalized eye photographs were obtained from these orbital photographs and compiled using AI-assisted software to create MRD1, MRD2, and LF models. 
Results: The Pearson correlation coefficients between the gold-standard measurements and the predicted values obtained with the MRD1 and MRD2 models were excellent (r=0.91 and 0.88, respectively) and that obtained with the LF model was good (r=0.73). The intraclass correlation coefficient demonstrated excellent agreement between the gold-standard measurements and the values predicted by the MRD1 and MRD2 models (0.90 and 0.84, respectively), and substantial agreement with the LF model (0.69). The mean absolute errors were 0.35 mm, 0.37 mm, and 1.06 mm for the MRD1, MRD2, and LF models, respectively. The 95% limits of agreement were –0.94 to 0.94 mm for the MRD1 model, –0.92 to 1.03 mm for the MRD2 model, and –0.63 to 2.53 mm for the LF model. Conclusions: We developed the first smartphone-based AI-assisted image processing algorithm for eyelid measurements. MRD1, MRD2, and LF measures can be taken in a quick, objective, and convenient manner. Furthermore, by using a smartphone, the examiner can check these measurements anywhere and at any time, which facilitates data collection. 
%M 34538776 %R 10.2196/32444 %U https://mhealth.jmir.org/2021/10/e32444 %U https://doi.org/10.2196/32444 %U http://www.ncbi.nlm.nih.gov/pubmed/34538776 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 10 %P e30940 %T Accelerating the Appropriate Adoption of Artificial Intelligence in Health Care: Protocol for a Multistepped Approach %A Wiljer,David %A Salhia,Mohammad %A Dolatabadi,Elham %A Dhalla,Azra %A Gillan,Caitlin %A Al-Mouaswas,Dalia %A Jackson,Ethan %A Waldorf,Jacqueline %A Mattson,Jane %A Clare,Megan %A Lalani,Nadim %A Charow,Rebecca %A Balakumar,Sarmini %A Younus,Sarah %A Jeyakumar,Tharshini %A Peteanu,Wanda %A Tavares,Walter %+ University Health Network, 190 Elizabeth Street, R Fraser Elliot Building RFE 3S-441, Toronto, ON, M5G 2C4, Canada, 1 416 340 4800 ext 6322, David.wiljer@uhn.ca %K artificial intelligence %K health care providers %K education %K learning %K patient care %K adoption %K mHealth %D 2021 %7 6.10.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: Significant investments and advances in health care technologies and practices have created a need for digital and data-literate health care providers. Artificial intelligence (AI) algorithms transform the analysis, diagnosis, and treatment of medical conditions. Complex and massive data sets are informing significant health care decisions and clinical practices. The ability to read, manage, and interpret large data sets to provide data-driven care and to protect patient privacy are increasingly critical skills for today’s health care providers. Objective: The aim of this study is to accelerate the appropriate adoption of data-driven and AI-enhanced care by focusing on the mindsets, skillsets, and toolsets of point-of-care health providers and their leaders in the health system. 
Methods: To accelerate the adoption of AI and address the need for organizational change at a national level, our multistepped approach includes creating awareness and capacity building, learning through innovation and adoption, developing appropriate and strategic partnerships, and building effective knowledge exchange initiatives. Education interventions designed to adapt knowledge to the local context and address any challenges to knowledge use include engagement activities to increase awareness, educational curricula for health care providers and leaders, and the development of a coaching and practice-based innovation hub. Framed by the Knowledge-to-Action framework, we are currently in the knowledge creation stage to inform the curricula for each deliverable. An environmental scan and scoping review were conducted to understand the current state of AI education programs as reported in the academic literature. Results: The environmental scan identified 24 AI-accredited programs specific to health providers, of which 11 were from the United States, 6 from Canada, 4 from the United Kingdom, and 3 from Asian countries. The most common curriculum topics across the environmental scan and scoping review included AI fundamentals, applications of AI, applied machine learning in health care, ethics, data science, and challenges to and opportunities for using AI. Conclusions: Technologies are advancing more rapidly than organizations and professionals can adopt and adapt to them. To help shape AI practices, health care providers must have the skills and abilities to initiate change and shape the future of their discipline and practices for advancing high-quality care within the digital ecosystem. 
International Registered Report Identifier (IRRID): PRR1-10.2196/30940 %M 34612839 %R 10.2196/30940 %U https://www.researchprotocols.org/2021/10/e30940 %U https://doi.org/10.2196/30940 %U http://www.ncbi.nlm.nih.gov/pubmed/34612839 %0 Journal Article %@ 2561-9128 %I JMIR Publications %V 4 %N 2 %P e29200 %T Predicting Prolonged Apnea During Nurse-Administered Procedural Sedation: Machine Learning Study %A Conway,Aaron %A Jungquist,Carla R %A Chang,Kristina %A Kamboj,Navpreet %A Sutherland,Joanna %A Mafeld,Sebastian %A Parotto,Matteo %+ Lawrence S. Bloomberg Faculty of Nursing, University of Toronto, Suite 130, 155 College Street, Toronto, ON, M5T 1P8, Canada, 1 (416) 340 4654, aaron.conway@utoronto.ca %K procedural sedation and analgesia %K conscious sedation %K nursing %K informatics %K patient safety %K machine learning %K capnography %K anesthesia %K anaesthesia %K medical informatics %K sleep apnea %K apnea %K apnoea %K sedation %D 2021 %7 5.10.2021 %9 Original Paper %J JMIR Perioper Med %G English %X Background: Capnography is commonly used for nurse-administered procedural sedation. Distinguishing between capnography waveform abnormalities that signal the need for clinical intervention for an event and those that do not indicate the need for intervention is essential for the successful implementation of this technology into practice. It is possible that capnography alarm management may be improved by using machine learning to create a “smart alarm” that can alert clinicians to apneic events that are predicted to be prolonged. Objective: To determine the accuracy of machine learning models for predicting at the 15-second time point if apnea will be prolonged (ie, apnea that persists for >30 seconds). Methods: A secondary analysis of an observational study was conducted. 
We selected several candidate models to evaluate, including a random forest model, generalized linear model (logistic regression), least absolute shrinkage and selection operator regression, ridge regression, and the XGBoost model. Out-of-sample accuracy of the models was calculated using 10-fold cross-validation. The net benefit decision analytic measure was used to assist with deciding whether using the models in practice would lead to better outcomes on average than using the current default capnography alarm management strategies. The default strategies are the aggressive approach, in which an alarm is triggered after brief periods of apnea (typically 15 seconds), and the conservative approach, in which an alarm is triggered only for prolonged periods of apnea (typically >30 seconds). Results: A total of 384 apneic events longer than 15 seconds were observed in 61 of the 102 patients (59.8%) who participated in the observational study. Nearly half of the apneic events (180/384, 46.9%) were prolonged. The random forest model performed the best in terms of discrimination (area under the receiver operating characteristic curve 0.66) and calibration. The net benefit associated with the random forest model exceeded that associated with the aggressive strategy but was lower than that associated with the conservative strategy. Conclusions: Decision curve analysis indicated that using a random forest model would lead to a better outcome for capnography alarm management than using an aggressive strategy in which alarms are triggered after 15 seconds of apnea. The model would not be superior to the conservative strategy in which alarms are only triggered after 30 seconds. 
%M 34609322 %R 10.2196/29200 %U https://periop.jmir.org/2021/2/e29200 %U https://doi.org/10.2196/29200 %U http://www.ncbi.nlm.nih.gov/pubmed/34609322 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 10 %P e27177 %T Use of Deep Learning to Predict Acute Kidney Injury After Intravenous Contrast Media Administration: Prediction Model Development Study %A Yun,Donghwan %A Cho,Semin %A Kim,Yong Chul %A Kim,Dong Ki %A Oh,Kook-Hwan %A Joo,Kwon Wook %A Kim,Yon Su %A Han,Seung Seok %+ Department of Biomedical Sciences, Seoul National University College of Medicine, 103 Daehakro, Jongno-gu, Seoul, 03080, Republic of Korea, 82 2 2072 4785 ext 8095, hansway80@gmail.com %K acute kidney injury %K artificial intelligence %K contrast media %K deep learning %K machine learning %K kidney injury %K computed tomography %D 2021 %7 1.10.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Precise prediction of contrast media–induced acute kidney injury (CIAKI) is an important issue because of its relationship with poor outcomes. Objective: Herein, we examined whether a deep learning algorithm could predict the risk of intravenous CIAKI better than other machine learning and logistic regression models in patients undergoing computed tomography (CT). Methods: A total of 14,185 patients who were administered intravenous contrast media for CT at the preventive and monitoring facility in Seoul National University Hospital were reviewed. CIAKI was defined as an increase in serum creatinine of ≥0.3 mg/dL within 2 days or ≥50% within 7 days. 
Using both time-varying and time-invariant features, machine learning models, such as the recurrent neural network (RNN), light gradient boosting machine (LGM), extreme gradient boosting machine (XGB), random forest (RF), decision tree (DT), support vector machine (SVM), k-nearest neighbors, and logistic regression, were developed using a training set, and their performance was compared using the area under the receiver operating characteristic curve (AUROC) in a test set. Results: CIAKI developed in 261 cases (1.8%). The RNN model had the highest AUROC of 0.755 (0.708-0.802) for predicting CIAKI, which was superior to that obtained from other machine learning models. When CIAKI was instead defined as an increase in serum creatinine of ≥0.5 mg/dL or ≥25% within 3 days, the highest performance was again achieved by the RNN model, with an AUROC of 0.716 (95% confidence interval [CI] 0.664-0.768). In feature ranking analysis, the albumin level was the most highly contributing factor to RNN performance, followed by time-varying kidney function. Conclusions: Application of a deep learning algorithm improves the predictability of intravenous CIAKI after CT, representing a basis for future clinical alarming and preventive systems. 
%M 34596574 %R 10.2196/27177 %U https://medinform.jmir.org/2021/10/e27177 %U https://doi.org/10.2196/27177 %U http://www.ncbi.nlm.nih.gov/pubmed/34596574 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 7 %N 9 %P e29544 %T Uncovering Clinical Risk Factors and Predicting Severe COVID-19 Cases Using UK Biobank Data: Machine Learning Approach %A Wong,Kenneth Chi-Yin %A Xiang,Yong %A Yin,Liangying %A So,Hon-Cheong %+ School of Biomedical Sciences, The Chinese University of Hong Kong, RM 520A, Lo Kwee Seong Biomedical Sciences Buildiing, Chinese University of Hong Kong, Hong Kong, China, 86 39439255, hcso@cuhk.edu.hk %K prediction %K COVID-19 %K risk factors %K machine learning %K pandemic %K biobank %K public health %K prediction models %K medical informatics %D 2021 %7 30.9.2021 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: COVID-19 is a major public health concern. Given the extent of the pandemic, it is urgent to identify risk factors associated with disease severity. More accurate prediction of those at risk of developing severe infections is of high clinical importance. Objective: Based on the UK Biobank (UKBB), we aimed to build machine learning models to predict the risk of developing severe or fatal infections, and uncover major risk factors involved. Methods: We first restricted the analysis to infected individuals (n=7846), then performed analysis at a population level, considering those with no known infection as controls (ncontrols=465,728). Hospitalization was used as a proxy for severity. A total of 97 clinical variables (collected prior to the COVID-19 outbreak) covering demographic variables, comorbidities, blood measurements (eg, hematological/liver/renal function/metabolic parameters), anthropometric measures, and other risk factors (eg, smoking/drinking) were included as predictors. 
We also constructed a simplified (lite) prediction model using 27 covariates that can be more easily obtained (demographic and comorbidity data). XGboost (gradient-boosted trees) was used for prediction, and predictive performance was assessed by cross-validation. Variable importance was quantified by Shapley values (ShapVal), permutation importance (PermImp), and accuracy gain. Shapley dependency and interaction plots were used to evaluate the pattern of relationships between risk factors and outcomes. Results: A total of 2386 severe and 477 fatal cases were identified. For analyses within infected individuals (n=7846), our prediction model achieved an area under the receiver operating characteristic curve (AUC–ROC) of 0.723 (95% CI 0.711-0.736) and 0.814 (95% CI 0.791-0.838) for severe and fatal infections, respectively. The top 5 contributing factors (sorted by ShapVal) for severity were age, number of drugs taken (cnt_tx), cystatin C (reflecting renal function), waist-to-hip ratio (WHR), and Townsend deprivation index (TDI). For mortality, the top features were age, testosterone, cnt_tx, waist circumference (WC), and red cell distribution width. For analyses involving the whole UKBB population, AUCs for severity and fatality were 0.696 (95% CI 0.684-0.708) and 0.825 (95% CI 0.802-0.848), respectively. The same top 5 risk factors were identified for both outcomes, namely, age, cnt_tx, WC, WHR, and TDI. Apart from the above, age, cystatin C, TDI, and cnt_tx were among the top 10 across all 4 analyses. Other diseases top ranked by ShapVal or PermImp were type 2 diabetes mellitus (T2DM), coronary artery disease, atrial fibrillation, and dementia, among others. For the “lite” models, predictive performances were broadly similar, with estimated AUCs of 0.716, 0.818, 0.696, and 0.830, respectively. The top ranked variables were similar to above, including age, cnt_tx, WC, sex (male), and T2DM. 
Conclusions: We identified numerous baseline clinical risk factors for severe/fatal infection by XGboost. For example, age, central obesity, impaired renal function, multiple comorbidities, and cardiometabolic abnormalities may predispose to poorer outcomes. The prediction models may be useful at a population level to identify those susceptible to developing severe/fatal infections, facilitating targeted prevention strategies. A risk-prediction tool is also available online. Further replications in independent cohorts are required to verify our findings. %M 34591027 %R 10.2196/29544 %U https://publichealth.jmir.org/2021/9/e29544 %U https://doi.org/10.2196/29544 %U http://www.ncbi.nlm.nih.gov/pubmed/34591027 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e27122 %T Radiation Oncologists’ Perceptions of Adopting an Artificial Intelligence–Assisted Contouring Technology: Model Development and Questionnaire Study %A Zhai,Huiwen %A Yang,Xin %A Xue,Jiaolong %A Lavender,Christopher %A Ye,Tiantian %A Li,Ji-Bin %A Xu,Lanyang %A Lin,Li %A Cao,Weiwei %A Sun,Ying %+ Department of Radiation Oncology, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, 651 Dongfeng Road, Guangzhou, 510060, China, 86 02087343066, sunying@sysucc.org.cn %K artificial intelligence %K technology acceptance model %K intention %K resistance %D 2021 %7 30.9.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: An artificial intelligence (AI)–assisted contouring system benefits radiation oncologists by saving time and improving treatment accuracy. Yet, there is much hope and fear surrounding such technologies, and this fear can manifest as resistance from health care professionals, which can lead to the failure of AI projects. 
Objective: The objective of this study was to develop and test a model for investigating the factors that drive radiation oncologists’ acceptance of AI contouring technology in a Chinese context. Methods: A model of AI-assisted contouring technology acceptance was developed based on the Unified Theory of Acceptance and Use of Technology (UTAUT) model by adding the variables of perceived risk and resistance that were proposed in this study. The model included 8 constructs with 29 questionnaire items. A total of 307 respondents completed the questionnaires. Structural equation modeling was conducted to evaluate the model’s path effects, significance, and fitness. Results: The overall fitness indices for the model were evaluated and showed that the model was a good fit to the data. Behavioral intention was significantly affected by performance expectancy (β=.155; P=.01), social influence (β=.365; P<.001), and facilitating conditions (β=.459; P<.001). Effort expectancy (β=.055; P=.45), perceived risk (β=−.048; P=.35), and resistance bias (β=−.020; P=.63) did not significantly affect behavioral intention. Conclusions: The physicians’ overall perceptions of an AI-assisted technology for radiation contouring were high. Technology resistance among Chinese radiation oncologists was low and not related to behavioral intention. Not all of the factors in the Venkatesh UTAUT model applied to AI technology adoption among physicians in a Chinese context. 
%M 34591029 %R 10.2196/27122 %U https://www.jmir.org/2021/9/e27122 %U https://doi.org/10.2196/27122 %U http://www.ncbi.nlm.nih.gov/pubmed/34591029 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e24560 %T Prevention of Suicidal Relapses in Adolescents With a Smartphone Application: Bayesian Network Analysis of a Preclinical Trial Using In Silico Patient Simulations %A Mouchabac,Stephane %A Leray,Philippe %A Adrien,Vladimir %A Gollier-Briant,Fanny %A Bonnot,Olivier %+ Department of Child and Adolescent Psychiatry, Centre hospitalier universitaire de Nantes, 30 boulevard Jean Monnet, Nantes, 44000, France, 33 4323232, olivier.bonnot@chu-nantes.fr %K suicide %K bayesian network %K smartphone application %K digital psychiatry %K artificial intelligence %D 2021 %7 30.9.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Recently, artificial intelligence technologies and machine learning methods have offered attractive prospects to design and manage crisis response processes, especially in suicide crisis management. In other domains, most algorithms are based on big data to help diagnose and suggest rational treatment options in medicine. But data in psychiatry are related to behavior and clinical evaluation. They are more heterogeneous, less objective, and incomplete compared to other fields of medicine. Consequently, the use of psychiatric clinical data may lead to less accurate and sometimes impossible-to-build algorithms and provide inefficient digital tools. In this case, the Bayesian network (BN) might be helpful and accurate when constructed from expert knowledge. Medical Companion is a government-funded smartphone application based on repeated questions posed to the subject and algorithm-matched advice to prevent relapse of suicide attempts within several months. 
Objective: Our paper aims to present our development of a BN algorithm as a medical device in accordance with the American Psychiatric Association digital healthcare guidelines and to provide results from a preclinical phase. Methods: The experts are psychiatrists working in university hospitals who are experienced and trained in managing suicidal crises. As recommended when building a BN, we divided the process into 2 tasks. Task 1 is structure determination, representing the qualitative part of the BN. The factors were chosen for their known and demonstrated link with suicidal risk in the literature (clinical, behavioral, and psychometrics) and therapeutic accuracy (advice). Task 2 is parameter elicitation, with the conditional probabilities corresponding to the quantitative part. The 4-step simulation (use case) process allowed us to ensure that the advice was adapted to the clinical states of patients and the context. Results: For task 1, in this formative part, we defined clinical questions related to the mental state of the patients, and we proposed specific factors related to the questions. Subsequently, we suggested specific advice related to the patient’s state. We obtained a structure for the BN with a graphical representation of causal relations between variables. For task 2, several runs of simulations confirmed the a priori model of experts regarding mental state, refining the precision of our model. Moreover, we noticed that the advice had the same distribution as the previous state and was clinically relevant. After 2 rounds of simulation, the experts found the exact match. Conclusions: BN is an efficient methodology to build an algorithm for a digital assistant dedicated to suicidal crisis management. Digital psychiatry is an emerging field, but it needs validation and testing before being used with patients. Similar to psychotropics, any medical device requires a phase II (preclinical) trial. 
With this method, we propose another step to respond to the American Psychiatric Association guidelines. Trial Registration: ClinicalTrials.gov NCT03975881; https://clinicaltrials.gov/ct2/show/NCT03975881 %M 34591030 %R 10.2196/24560 %U https://www.jmir.org/2021/9/e24560 %U https://doi.org/10.2196/24560 %U http://www.ncbi.nlm.nih.gov/pubmed/34591030 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 9 %P e31311 %T Short-Term Event Prediction in the Operating Room (STEP-OP) of Five-Minute Intraoperative Hypotension Using Hybrid Deep Learning: Retrospective Observational Study and Model Development %A Choe,Sooho %A Park,Eunjeong %A Shin,Wooseok %A Koo,Bonah %A Shin,Dongjin %A Jung,Chulwoo %A Lee,Hyungchul %A Kim,Jeongmin %+ Department of Anesthesiology and Pain Medicine, Yonsei University College of Medicine, 50 Yonsei-ro, Seodaemungu, Seoul, 03722, Republic of Korea, 82 2 2228 6456, Anesjeongmin@yuhs.ac %K arterial pressure %K artificial intelligence %K biosignals %K deep learning %K hypotension %K machine learning %D 2021 %7 30.9.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Intraoperative hypotension has an adverse impact on postoperative outcomes. However, it is difficult to predict and treat intraoperative hypotension in advance according to individual clinical parameters. Objective: The aim of this study was to develop a prediction model to forecast 5-minute intraoperative hypotension based on the weighted average ensemble of individual neural networks, utilizing the biosignals recorded during noncardiac surgery. Methods: In this retrospective observational study, arterial waveforms were recorded during noncardiac operations performed between August 2016 and December 2019, at Seoul National University Hospital, Seoul, South Korea. We analyzed the arterial waveforms from the big data in the VitalDB repository of electronic health records. 
We defined 2s hypotension as the moving average of arterial pressure under 65 mmHg for 2 seconds, and intraoperative hypotensive events were defined when the 2s hypotension lasted for at least 60 seconds. We developed an artificial intelligence–enabled process, named short-term event prediction in the operating room (STEP-OP), for predicting short-term intraoperative hypotension. Results: The study was performed on 18,813 subjects undergoing noncardiac surgeries. Deep-learning algorithms (convolutional neural network [CNN] and recurrent neural network [RNN]) using raw waveforms as input showed greater area under the precision-recall curve (AUPRC) scores (0.698, 95% CI 0.690-0.705 and 0.706, 95% CI 0.698-0.715, respectively) than that of the logistic regression algorithm (0.673, 95% CI 0.665-0.682). STEP-OP performed better and had greater AUPRC values than those of the RNN and CNN algorithms (0.716, 95% CI 0.708-0.723). Conclusions: We developed STEP-OP as a weighted average of deep-learning models. STEP-OP predicts intraoperative hypotension more accurately than the CNN, RNN, and logistic regression models. Trial Registration: ClinicalTrials.gov NCT02914444; https://clinicaltrials.gov/ct2/show/NCT02914444. 
%M 34591024 %R 10.2196/31311 %U https://medinform.jmir.org/2021/9/e31311 %U https://doi.org/10.2196/31311 %U http://www.ncbi.nlm.nih.gov/pubmed/34591024 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e30157 %T COVID-19 Mortality Prediction From Deep Learning in a Large Multistate Electronic Health Record and Laboratory Information System Data Set: Algorithm Development and Validation %A Sankaranarayanan,Saranya %A Balan,Jagadheshwar %A Walsh,Jesse R %A Wu,Yanhong %A Minnich,Sara %A Piazza,Amy %A Osborne,Collin %A Oliver,Gavin R %A Lesko,Jessica %A Bates,Kathy L %A Khezeli,Kia %A Block,Darci R %A DiGuardo,Margaret %A Kreuter,Justin %A O’Horo,John C %A Kalantari,John %A Klee,Eric W %A Salama,Mohamed E %A Kipp,Benjamin %A Morice,William G %A Jenkinson,Garrett %+ Mayo Clinic, 200 1st St SW, Rochester, MN, 55905, United States, 1 507 293 9457, Jenkinson.William@mayo.edu %K COVID-19 %K mortality %K prediction %K recurrent neural networks %K missing data %K time series %K deep learning %K machine learning %K neural network %K electronic health record %K EHR %K algorithm %K development %K validation %D 2021 %7 28.9.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: COVID-19 is caused by the SARS-CoV-2 virus and has strikingly heterogeneous clinical manifestations, with most individuals contracting mild disease but a substantial minority experiencing fulminant cardiopulmonary symptoms or death. The clinical covariates and the laboratory tests performed on a patient provide robust statistics to guide clinical treatment. Deep learning approaches on a data set of this nature enable patient stratification and provide methods to guide clinical treatment. Objective: Here, we report on the development and prospective validation of a state-of-the-art machine learning model to provide mortality prediction shortly after confirmation of SARS-CoV-2 infection in the Mayo Clinic patient population. 
Methods: We retrospectively constructed one of the largest reported and most geographically diverse laboratory information system and electronic health record COVID-19 data sets in the published literature, which included 11,807 patients residing in 41 states of the United States of America and treated at medical sites across 5 states in 3 time zones. Traditional machine learning models were evaluated independently as well as in a stacked learner approach by using AutoGluon, and various recurrent neural network architectures were considered. The traditional machine learning models were implemented using the AutoGluon-Tabular framework, whereas the recurrent neural networks utilized the TensorFlow Keras framework. We trained these models to operate solely using routine laboratory measurements and clinical covariates available within 72 hours of a patient’s first positive COVID-19 nucleic acid test result. Results: The GRU-D recurrent neural network achieved peak cross-validation performance with 0.938 (SE 0.004) as the area under the receiver operating characteristic (AUROC) curve. This model retained strong performance when the follow-up time was reduced to 12 hours (0.916 [SE 0.005] AUROC), and the leave-one-out feature importance analysis indicated that the most independently valuable features were age, Charlson comorbidity index, minimum oxygen saturation, fibrinogen level, and serum iron level. In the prospective testing cohort, this model provided an AUROC of 0.901 and a statistically significant difference in survival (P<.001, hazard ratio for those predicted to survive, 95% CI 0.043-0.106). Conclusions: Our deep learning approach using GRU-D provides an alert system to flag mortality for COVID-19–positive patients by using clinical covariates and laboratory values within a 72-hour window after the first positive nucleic acid test result. 
%M 34449401 %R 10.2196/30157 %U https://www.jmir.org/2021/9/e30157 %U https://doi.org/10.2196/30157 %U http://www.ncbi.nlm.nih.gov/pubmed/34449401 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 9 %P e27285 %T Assessment of the Quality Management System for Clinical Nutrition in Jiangsu: Survey Study %A Wang,Jin %A Pan,Chen %A Ma,Xianghua %+ First Affiliated Hospital of Nanjing Medical University, No. 300 Guangzhou Road, Nanjing, 210029, China, 86 17625989728, yixingpanchen@163.com %K quality management system %K human resource management %K artificial intelligence %K online health %K health science %K clinical nutrition %K online platform %K health platform %K nutrition %K patient education %K dietitian %D 2021 %7 27.9.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: An electronic system that automatically collects medical information can realize timely monitoring of patient health and improve the effectiveness and accuracy of medical treatment. To our knowledge, the application of artificial intelligence (AI) in medical service quality assessment has been minimally evaluated, especially for clinical nutrition departments in China. From the perspective of medical ethics, patient safety comes before any other factors within health science, and this responsibility belongs to the quality management system (QMS) within medical institutions. Objective: This study aims to evaluate the QMS for clinical nutrition in Jiangsu, monitor its performance in quality assessment and human resource management from a nutrition aspect, and investigate the application and development of AI in medical quality control. Methods: The participants for this study were the staff of 70 clinical nutrition departments of the tertiary hospitals in Jiangsu Province, China. These departments are all members of the Quality Management System of Clinical Nutrition in Jiangsu (QMSNJ). 
An online survey was conducted on all 341 employees within all clinical nutrition departments based on the staff information from the surveyed medical institutions. The questionnaire contains five sections, and the data analysis and AI evaluation were focused on human resource information. Results: A total of 330 questionnaires were collected, with a response rate of 96.77% (330/341). A QMS for clinical nutrition was built for clinical nutrition departments in Jiangsu and achieved its target of human resource improvements, especially among dietitians. The growing number of participating departments (an increase of 42.8% from 2018 to 2020) and the significant growth of dietitians (t93.4=–0.42; P=.02) both show the advancements of the QMSNJ. Conclusions: As the first innovation of an online platform for quality management in Jiangsu, the Jiangsu Province Clinical Nutrition Management Platform was successfully implemented as a QMS for this study. This multidimensional electronic system can help the QMSNJ and clinical nutrition departments achieve quality assessment from various aspects so as to realize the continuous improvement of clinical nutrition. The use of an online platform and AI technology for quality assessment is worth recommending and promoting in the future. 
%M 34569942 %R 10.2196/27285 %U https://formative.jmir.org/2021/9/e27285 %U https://doi.org/10.2196/27285 %U http://www.ncbi.nlm.nih.gov/pubmed/34569942 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 9 %N 9 %P e27535 %T Effects of an Artificial Intelligence–Assisted Health Program on Workers With Neck/Shoulder Pain/Stiffness and Low Back Pain: Randomized Controlled Trial %A Anan,Tomomi %A Kajiki,Shigeyuki %A Oka,Hiroyuki %A Fujii,Tomoko %A Kawamata,Kayo %A Mori,Koji %A Matsudaira,Ko %+ Department of Occupational Health Practice and Management, Institute of Industrial Ecological Sciences, University of Occupational and Environmental Health, Japan, 1-1 Iseigaoka, Yahatanishi-ku, Kitakyushu, 807-8555, Japan, 81 93 691 7523, skajiki@med.uoeh-u.ac.jp %K neck pain %K shoulder pain %K shoulder stiffness %K low back pain %K musculoskeletal symptoms %K digital intervention %K mobile app %K mHealth %K eHealth %K digital health %K mobile phone %D 2021 %7 24.9.2021 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: Musculoskeletal symptoms such as neck and shoulder pain/stiffness and low back pain are common health problems in the working population. They are the leading causes of presenteeism (employees being physically present at work but unable to be fully engaged). Recently, digital interventions have begun to be used to manage health but their effectiveness has not yet been fully verified, and adherence to such programs is always a problem. Objective: This study aimed to evaluate the improvements in musculoskeletal symptoms in workers with neck/shoulder stiffness/pain and low back pain after the use of an exercise-based artificial intelligence (AI)–assisted interactive health promotion system that operates through a mobile messaging app (the AI-assisted health program). We expected that this program would support participants’ adherence to exercises. 
Methods: We conducted a two-armed, randomized, controlled, and unblinded trial in workers with either neck/shoulder stiffness/pain or low back pain or both. We recruited participants with these symptoms through email notifications. The intervention group received the AI-assisted health program, in which the chatbot sent messages to users with the exercise instructions at a fixed time every day through the smartphone’s chatting app (LINE) for 12 weeks. The program was fully automated. The control group continued with their usual care routines. We assessed the subjective severity of the neck and shoulder pain/stiffness and low back pain of the participants by using a scoring scale of 1 to 5 for both the intervention group and the control group at baseline and after 12 weeks of intervention by using a web-based form. We used a logistic regression model to calculate the odds ratios (ORs) for reduced pain scores in the intervention group relative to the control group, and the ORs for subjective assessment of symptom improvement in the intervention group compared with the control group; analyses were performed using Stata software (version 16, StataCorp LLC). Results: We analyzed 48 participants in the intervention group and 46 participants in the control group. The adherence rate was 92% (44/48) during the intervention. The participants in the intervention group showed significant improvements in the severity of the neck/shoulder pain/stiffness and low back pain compared to those in the control group (OR 6.36, 95% CI 2.57-15.73; P<.001). Based on the subjective assessment of the improvement of the pain/stiffness at 12 weeks, 36 (75%) out of 48 participants in the intervention group and 3 (7%) out of 46 participants in the control group showed improvements (improved, slightly improved) (OR 43.00, 95% CI 11.25-164.28; P<.001).
Conclusions: This study shows that the short exercises provided by the AI-assisted health program improved both neck/shoulder pain/stiffness and low back pain in 12 weeks. Further studies are needed to identify the elements contributing to the successful outcome of the AI-assisted health program. Trial Registration: University hospital Medical Information Network-Clinical Trials Registry (UMIN-CTR) 000033894; https://upload.umin.ac.jp/cgi-open-bin/ctr_e/ctr_view.cgi?recptno=R000038307. %M 34559054 %R 10.2196/27535 %U https://mhealth.jmir.org/2021/9/e27535 %U https://doi.org/10.2196/27535 %U http://www.ncbi.nlm.nih.gov/pubmed/34559054 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e27283 %T Chloe for COVID-19: Evolution of an Intelligent Conversational Agent to Address Infodemic Management Needs During the COVID-19 Pandemic %A Siedlikowski,Sophia %A Noël,Louis-Philippe %A Moynihan,Stephanie Anne %A Robin,Marc %+ Dialogue Health Technologies Inc, 390 Rue Notre-Dame Ouest #200, Montreal, QC, H2Y 1T9, Canada, 1 613 806 0671, marc.robin@dialogue.co %K chatbot %K COVID-19 %K conversational agents %K public health %K artificial intelligence %K infodemic %K infodemiology %K misinformation %K digital health %K virtual care %D 2021 %7 21.9.2021 %9 Viewpoint %J J Med Internet Res %G English %X There is an unprecedented demand for infodemic management due to rapidly evolving information about the novel COVID-19 pandemic. This viewpoint paper details the evolution of a Canadian digital information tool, Chloe for COVID-19, based on incremental leveraging of artificial intelligence techniques. By providing an accessible summary of Chloe’s development, we show how proactive cooperation between health, technology, and corporate sectors can lead to a rapidly scalable, safe, and secure virtual chatbot to assist public health efforts in keeping Canadians informed. 
We then highlight Chloe’s strengths, the challenges we faced during the development process, and future directions for the role of chatbots in infodemic management. The information presented here may guide future collaborative efforts in health technology in order to enhance access to accurate and timely health information to the public. %M 34375299 %R 10.2196/27283 %U https://www.jmir.org/2021/9/e27283 %U https://doi.org/10.2196/27283 %U http://www.ncbi.nlm.nih.gov/pubmed/34375299 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e29678 %T A Fully Automated Analytic System for Measuring Endolymphatic Hydrops Ratios in Patients With Ménière Disease via Magnetic Resonance Imaging: Deep Learning Model Development Study %A Park,Chae Jung %A Cho,Young Sang %A Chung,Myung Jin %A Kim,Yi-Kyung %A Kim,Hyung-Jin %A Kim,Kyunga %A Ko,Jae-Wook %A Chung,Won-Ho %A Cho,Baek Hwan %+ Department of Medical Device Management and Research, Samsung Advanced Institute for Health Sciences & Technology, Sungkyunkwan University, 81 Irwon-ro, Gangnam-gu, Seoul, 06355, Republic of Korea, 82 234100885, baekhwan.cho@samsung.com %K deep learning %K magnetic resonance imaging %K medical image segmentation %K Ménière disease %K inner ear %K endolymphatic hydrops %K artificial intelligence %K machine learning %K multi-class segmentation %K convolutional neural network %K end-to-end system %K clinician support %K clinical decision support system %K image selection %K clinical usability %K automation %D 2021 %7 21.9.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Recently, the analysis of endolymphatic hydropses (EHs) via inner ear magnetic resonance imaging (MRI) for patients with Ménière disease has been attempted in various studies. In addition, artificial intelligence has rapidly been incorporated into the medical field. In our previous studies, an automated algorithm for EH analysis was developed by using a convolutional neural network. 
However, several limitations existed, and further studies were conducted to compensate for these limitations. Objective: The aim of this study is to develop a fully automated analytic system for measuring EH ratios that enhances EH analysis accuracy and clinical usability when studying Ménière disease via MRI. Methods: We proposed the 3into3Inception and 3intoUNet networks. Their network architectures were based on those of the Inception-v3 and U-Net networks, respectively. The developed networks were trained for inner ear segmentation by using the magnetic resonance images of 124 people and were embedded in a new, automated EH analysis system—inner-ear hydrops estimation via artificial intelligence (INHEARIT)-version 2 (INHEARIT-v2). After fivefold cross-validation, an additional test was performed by using 60 new, unseen magnetic resonance images to evaluate the performance of our system. The INHEARIT-v2 system has a new function that automatically selects representative images from a full MRI stack. Results: The average segmentation performance of the fivefold cross-validation was measured via the intersection of union method, resulting in performance values of 0.743 (SD 0.030) for the 3into3Inception network and 0.811 (SD 0.032) for the 3intoUNet network. The representative magnetic resonance slices (ie, from a data set of unseen magnetic resonance images) that were automatically selected by the INHEARIT-v2 system only differed from a maximum of 2 expert-selected slices. After comparing the ratios calculated by experienced physicians and those calculated by the INHEARIT-v2 system, we found that the average intraclass correlation coefficient for all cases was 0.941; the average intraclass correlation coefficient of the vestibules was 0.968, and that of the cochleae was 0.914. The time required for the fully automated system to accurately analyze EH ratios based on a patient's MRI stack was approximately 3.5 seconds. 
Conclusions: In this study, a fully automated full-stack magnetic resonance analysis system for measuring EH ratios was developed (named INHEARIT-v2), and the results showed that there was a high correlation between the expert-calculated EH ratio values and those calculated by the INHEARIT-v2 system. The system is an upgraded version of the INHEARIT system; it has higher segmentation performance and automatically selects representative images from an MRI stack. The new model can help clinicians by providing objective analysis results and reducing the workload for interpreting magnetic resonance images. %M 34546181 %R 10.2196/29678 %U https://www.jmir.org/2021/9/e29678 %U https://doi.org/10.2196/29678 %U http://www.ncbi.nlm.nih.gov/pubmed/34546181 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e27414 %T Accuracy of Using Generative Adversarial Networks for Glaucoma Detection: Systematic Review and Bibliometric Analysis %A Saeed,Ali Q %A Sheikh Abdullah,Siti Norul Huda %A Che-Hamzah,Jemaima %A Abdul Ghani,Ahmad Tarmizi %+ Center for Cyber Security, Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi Street, Bangi, Selangor, 43600, Malaysia, 60 7740870504, ali.qasim@ntu.edu.iq %K glaucoma %K generative adversarial network %K deep learning %K systematic literature review %K retinal disease %K blood vessels %K optic disc %D 2021 %7 21.9.2021 %9 Review %J J Med Internet Res %G English %X Background: Glaucoma leads to irreversible blindness. Globally, it is the second most common retinal disease that leads to blindness, slightly less common than cataracts. Therefore, there is a great need to avoid the silent growth of this disease using recently developed generative adversarial networks (GANs). Objective: This paper aims to introduce a GAN technology for the diagnosis of eye disorders, particularly glaucoma. 
This paper illustrates deep adversarial learning as a potential diagnostic tool and the challenges involved in its implementation. This study describes and analyzes many of the pitfalls and problems that researchers will need to overcome to implement this kind of technology. Methods: To organize this review comprehensively, articles and reviews were collected using the following keywords: (“Glaucoma,” “optic disc,” “blood vessels”) and (“receptive field,” “loss function,” “GAN,” “Generative Adversarial Network,” “Deep learning,” “CNN,” “convolutional neural network” OR encoder). The records were identified from 5 highly reputed databases: IEEE Xplore, Web of Science, Scopus, ScienceDirect, and PubMed. These libraries broadly cover the technical and medical literature. Publications within the last 5 years, specifically 2015-2020, were included because the target GAN technique was invented only in 2014 and the publishing date of the collected papers was not earlier than 2016. Duplicate records were removed, and irrelevant titles and abstracts were excluded. In addition, we excluded papers that used optical coherence tomography and visual field images, except for those with 2D images. A large-scale systematic analysis was performed, and then a summarized taxonomy was generated. Furthermore, the results of the collected articles were summarized and a visual representation of the results was presented on a T-shaped matrix diagram. This study was conducted between March 2020 and November 2020. Results: We found 59 articles after conducting a comprehensive survey of the literature. Among the 59 articles, 30 present actual attempts to synthesize images and provide accurate segmentation/classification using single/multiple landmarks or share certain experiences. The other 29 articles discuss the recent advances in GANs, do practical experiments, and contain analytical studies of retinal disease. 
Conclusions: Recent deep learning techniques, namely GANs, have shown encouraging performance in retinal disease detection. Although this methodology involves an extensive computing budget and optimization process, it saturates the greedy nature of deep learning techniques by synthesizing images and solves major medical issues. This paper contributes to this research field by offering a thorough analysis of existing works, highlighting current limitations, and suggesting alternatives to support other researchers and participants in further improving and strengthening future work. Finally, new directions for this research have been identified. %M 34236992 %R 10.2196/27414 %U https://www.jmir.org/2021/9/e27414 %U https://doi.org/10.2196/27414 %U http://www.ncbi.nlm.nih.gov/pubmed/34236992 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e26025 %T The Classification of Six Common Skin Diseases Based on Xiangya-Derm: Development of a Chinese Database for Artificial Intelligence %A Huang,Kai %A Jiang,Zixi %A Li,Yixin %A Wu,Zhe %A Wu,Xian %A Zhu,Wu %A Chen,Mingliang %A Zhang,Yu %A Zuo,Ke %A Li,Yi %A Yu,Nianzhou %A Liu,Siliang %A Huang,Xing %A Su,Juan %A Yin,Mingzhu %A Qian,Buyue %A Wang,Xianggui %A Chen,Xiang %A Zhao,Shuang %+ Department of Dermatology, Xiangya Hospital, Central South University, 87 Xiangya Road, Kaifu District, Changsha, China, 86 13808485224, shuangxy@csu.edu.cn %K artificial intelligence %K skin disease %K convolutional neural network %K medical image processing %K automatic auxiliary diagnoses %K dermatology %K skin %K classification %K China %D 2021 %7 21.9.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Skin and subcutaneous disease is the fourth-leading cause of the nonfatal disease burden worldwide and constitutes one of the most common burdens in primary care. However, there is a severe lack of dermatologists, particularly in rural Chinese areas. 
Furthermore, although artificial intelligence (AI) tools can assist in diagnosing skin disorders from images, the database for the Chinese population is limited. Objective: This study aims to establish a database for AI based on the Chinese population and presents an initial study on six common skin diseases. Methods: Each image was captured with either a digital camera or a smartphone, verified by at least three experienced dermatologists and corresponding pathology information, and finally added to the Xiangya-Derm database. Based on this database, we conducted AI-assisted classification research on six common skin diseases and then proposed a network called Xy-SkinNet. Xy-SkinNet applies a two-step strategy to identify skin diseases. First, given an input image, we segmented the regions of the skin lesion. Second, we introduced an information fusion block to combine the output of all segmented regions. We compared the performance with 31 dermatologists of varied experiences. Results: Xiangya-Derm, as a new database that consists of over 150,000 clinical images of 571 different skin diseases in the Chinese population, is the largest and most diverse dermatological data set of the Chinese population. The AI-based six-category classification achieved a top 3 accuracy of 84.77%, which exceeded the average accuracy of dermatologists (78.15%). Conclusions: Xiangya-Derm, the largest database for the Chinese population, was created. The classification of six common skin conditions was conducted based on Xiangya-Derm to lay a foundation for product research. 
%M 34546174 %R 10.2196/26025 %U https://www.jmir.org/2021/9/e26025 %U https://doi.org/10.2196/26025 %U http://www.ncbi.nlm.nih.gov/pubmed/34546174 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 9 %P e30223 %T Automatic Classification of Thyroid Findings Using Static and Contextualized Ensemble Natural Language Processing Systems: Development Study %A Shin,Dongyup %A Kam,Hye Jin %A Jeon,Min-Seok %A Kim,Ha Young %+ Graduate School of Information, Yonsei University, New millennium hall 420, Yonsei-ro 50, Seodaemun-gu, Seoul, 03722, Republic of Korea, 82 10 4094 2392, hayoung.kim@yonsei.ac.kr %K deep learning %K natural language processing %K word embedding %K convolution neural network %K long short-term memory %K transformer %K ensemble %K thyroid %K electronic medical records %D 2021 %7 21.9.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: In the case of Korean institutions and enterprises that collect nonstandardized and nonunified formats of electronic medical examination results from multiple medical institutions, a group of experienced nurses who can understand the results and related contexts initially classified the reports manually. The classification guidelines were established by years of workers’ clinical experiences and there were attempts to automate the classification work. However, there have been problems in which rule-based algorithms or human labor–intensive efforts can be time-consuming or limited owing to high potential errors. We investigated natural language processing (NLP) architectures and proposed ensemble models to create automated classifiers. Objective: This study aimed to develop practical deep learning models with electronic medical records from 284 health care institutions and open-source corpus data sets for automatically classifying 3 thyroid conditions: healthy, caution required, and critical. 
The primary goal is to increase the overall accuracy of the classification, yet there are practical and industrial needs to correctly predict healthy (negative) thyroid condition data, which are mostly medical examination results, and minimize false-negative rates under the prediction of healthy thyroid conditions. Methods: The data sets included thyroid and comprehensive medical examination reports. The textual data are not only documented in fully complete sentences but also written in lists of words or phrases. Therefore, we propose static and contextualized ensemble NLP network (SCENT) systems to successfully reflect static and contextual information and handle incomplete sentences. We prepared each convolution neural network (CNN)-, long short-term memory (LSTM)-, and efficiently learning an encoder that classifies token replacements accurately (ELECTRA)-based ensemble model by training or fine-tuning them multiple times. Through comprehensive experiments, we propose 2 versions of ensemble models, SCENT-v1 and SCENT-v2, with the single-architecture–based CNN, LSTM, and ELECTRA ensemble models for the best classification performance and practical use, respectively. SCENT-v1 is an ensemble of CNN and ELECTRA ensemble models, and SCENT-v2 is a hierarchical ensemble of CNN, LSTM, and ELECTRA ensemble models. SCENT-v2 first classifies the 3 labels using an ELECTRA ensemble model and then reclassifies them using an ensemble model of CNN and LSTM if the ELECTRA ensemble model predicted them as “healthy” labels. Results: SCENT-v1 outperformed all the suggested models, with the highest F1 score (92.56%). SCENT-v2 had the second-highest recall value (94.44%) and the fewest misclassifications for caution-required thyroid condition while maintaining 0 classification error for the critical thyroid condition under the prediction of the healthy thyroid condition. 
Conclusions: The proposed SCENT demonstrates good classification performance despite the unique characteristics of the Korean language and problems of data lack and imbalance, especially for the extremely low amount of critical condition data. The result of SCENT-v1 indicates that different perspectives of static and contextual input token representations can enhance classification performance. SCENT-v2 has a strong impact on the prediction of healthy thyroid conditions. %M 34546183 %R 10.2196/30223 %U https://medinform.jmir.org/2021/9/e30223 %U https://doi.org/10.2196/30223 %U http://www.ncbi.nlm.nih.gov/pubmed/34546183 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 9 %P e30770 %T Prediction of Critical Care Outcome for Adult Patients Presenting to Emergency Department Using Initial Triage Information: An XGBoost Algorithm Analysis %A Yun,Hyoungju %A Choi,Jinwook %A Park,Jeong Ho %+ Institute of Medical and Biological Engineering, Medical Research Center, Seoul National University, 103 Daehak-Ro, Jongno-Gu, Seoul, 03080, Republic of Korea, 82 2 2072 3421, jinchoi@snu.ac.kr %K triage %K critical care %K prediction %K XGBoost %K explainable machine learning %K interpretable artificial intelligence %K machine learning %K algorithm %K prediction %K outcome %K emergency %K triage %K classify %K prioritize %K risk %K model %D 2021 %7 20.9.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: The emergency department (ED) triage system to classify and prioritize patients from high risk to less urgent continues to be a challenge. Objective: This study, comprising 80,433 patients, aims to develop a machine learning algorithm prediction model of critical care outcomes for adult patients using information collected during ED triage and compare the performance with that of the baseline model using the Korean Triage and Acuity Scale (KTAS). 
Methods: To predict the need for critical care, we used 13 predictors from triage information: age, gender, mode of ED arrival, the time interval between onset and ED arrival, reason of ED visit, chief complaints, systolic blood pressure, diastolic blood pressure, pulse rate, respiratory rate, body temperature, oxygen saturation, and level of consciousness. The baseline model with KTAS was developed using logistic regression, and the machine learning model with 13 variables was generated using extreme gradient boosting (XGB) and deep neural network (DNN) algorithms. The discrimination was measured by the area under the receiver operating characteristic (AUROC) curve. The ability of calibration with Hosmer–Lemeshow test and reclassification with net reclassification index were evaluated. The calibration plot and partial dependence plot were used in the analysis. Results: The AUROC of the model with the full set of variables (0.833-0.861) was better than that of the baseline model (0.796). The XGB model of AUROC 0.861 (95% CI 0.848-0.874) showed a higher discriminative performance than the DNN model of 0.833 (95% CI 0.819-0.848). The XGB and DNN models proved better reclassification than the baseline model with a positive net reclassification index. The XGB models were well-calibrated (Hosmer-Lemeshow test; P>.05); however, the DNN showed poor calibration power (Hosmer-Lemeshow test; P<.001). We further interpreted the nonlinear association between variables and critical care prediction. Conclusions: Our study demonstrated that the performance of the XGB model using initial information at ED triage for predicting patients in need of critical care outperformed the conventional model with KTAS. 
%M 34346889 %R 10.2196/30770 %U https://medinform.jmir.org/2021/9/e30770 %U https://doi.org/10.2196/30770 %U http://www.ncbi.nlm.nih.gov/pubmed/34346889 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 9 %P e27799 %T Health Equity in Artificial Intelligence and Primary Care Research: Protocol for a Scoping Review %A Wang,Jonathan Xin %A Somani,Sulaiman %A Chen,Jonathan H %A Murray,Sara %A Sarkar,Urmimala %+ Center for Vulnerable Populations at San Francisco General Hospital, University of California San Francisco, 2789 25th St, Suite 350, San Francisco, CA, United States, 1 651 285 3335, jonxwang@alumni.stanford.edu %K artificial intelligence %K health information technology %K health informatics %K electronic health records %K big data %K data mining %K primary care %K family medicine %K decision support %K diagnosis %K treatment %K scoping review %K health equity %K health disparity %D 2021 %7 17.9.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: Though artificial intelligence (AI) has the potential to augment the patient-physician relationship in primary care, bias in intelligent health care systems has the potential to differentially impact vulnerable patient populations. Objective: The purpose of this scoping review is to summarize the extent to which AI systems in primary care examine the inherent bias toward or against vulnerable populations and appraise how these systems have mitigated the impact of such biases during their development. Methods: We will conduct a search update from an existing scoping review to identify studies on AI and primary care in the following databases: Medline-OVID, Embase, CINAHL, Cochrane Library, Web of Science, Scopus, IEEE Xplore, ACM Digital Library, MathSciNet, AAAI, and arXiv. Two screeners will independently review all abstracts, titles, and full-text articles. 
The team will extract data using a structured data extraction form and synthesize the results in accordance with PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. Results: This review will provide an assessment of the current state of health care equity within AI for primary care. Specifically, we will identify the degree to which vulnerable patients have been included, assess how bias is interpreted and documented, and understand the extent to which harmful biases are addressed. As of October 2020, the scoping review is in the title- and abstract-screening stage. The results are expected to be submitted for publication in fall 2021. Conclusions: AI applications in primary care are becoming an increasingly common tool in health care delivery and in preventative care efforts for underserved populations. This scoping review would potentially show the extent to which studies on AI in primary care employ a health equity lens and take steps to mitigate bias. 
International Registered Report Identifier (IRRID): PRR1-10.2196/27799 %M 34533458 %R 10.2196/27799 %U https://www.researchprotocols.org/2021/9/e27799 %U https://doi.org/10.2196/27799 %U http://www.ncbi.nlm.nih.gov/pubmed/34533458 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 9 %P e26680 %T Conversational Agents for Health and Well-being Across the Life Course: Protocol for an Evidence Map %A Guerreiro,Mara Pereira %A Angelini,Leonardo %A Rafael Henriques,Helga %A El Kamali,Mira %A Baixinho,Cristina %A Balsa,João %A Félix,Isa Brito %A Khaled,Omar Abou %A Carmo,Maria Beatriz %A Cláudio,Ana Paula %A Caon,Maurizio %A Daher,Karl %A Alexandre,Bruno %A Padinha,Mafalda %A Mugellini,Elena %+ Nursing Research, Innovation and Development Centre of Lisbon, Nursing School of Lisbon, Avenida Professor Egas Moniz, Lisbon, Portugal, 351 217913400 ext 23507, mara.guerreiro@esel.pt %K artificial intelligence %K conversational agent %K chatbot %K virtual assistant %K relational agent %K virtual humans %K e-coach %K intervention %K health %K well-being %D 2021 %7 17.9.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: Conversational agents, which we defined as computer programs that are designed to simulate two-way human conversation by using language and are potentially supplemented with nonlanguage modalities, offer promising avenues for health interventions for different populations across the life course. There is a lack of open-access and user-friendly resources for identifying research trends and gaps and pinpointing expertise across international centers. Objective: Our aim is to provide an overview of all relevant evidence on conversational agents for health and well-being across the life course. Specifically, our objectives are to identify, categorize, and synthesize—through visual formats and a searchable database—primary studies and reviews in this research field. 
Methods: An evidence map was selected as the type of literature review to be conducted, as it optimally corresponded to our aim. We systematically searched 8 databases (MEDLINE; CINAHL; Web of Science; Scopus; the Cochrane, ACM, IEEE, and Joanna Briggs Institute databases; and Google Scholar). We will perform backward citation searching on all included studies. The first stage of a double-stage screening procedure, which was based on abstracts and titles only, was conducted by using predetermined eligibility criteria for primary studies and reviews. An operational screening procedure was developed for streamlined and consistent screening across the team. Double data extraction will be performed with previously piloted data collection forms. We will appraise systematic reviews by using A Measurement Tool to Assess Systematic Reviews (AMSTAR) 2. Primary studies and reviews will be assessed separately in the analysis. Data will be synthesized through descriptive statistics, bivariate statistics, and subgroup analysis (if appropriate) and through high-level maps such as scatter and bubble charts. The development of the searchable database will be informed by the research questions and data extraction forms. Results: As of April 2021, the literature search in the eight databases was concluded, yielding a total of 16,351 records. The first stage of screening, which was based on abstracts and titles only, resulted in the selection of 1282 records of primary studies and 151 records of reviews. These will be subjected to second-stage screening. A glossary with operational definitions for supporting the study selection and data extraction stages was drafted. The anticipated completion date is October 2021. Conclusions: Our wider definition of a conversational agent and the broad scope of our evidence map will explicate trends and gaps in this field of research. 
Additionally, our evidence map and searchable database of studies will help researchers to avoid fragmented research efforts and wasteful redundancies. Finally, as part of the Harnessing the Power of Conversational e-Coaches for Health and Well-being Through Swiss-Portuguese Collaboration project, our work will also inform the development of an international taxonomy on conversational agents for health and well-being, thereby contributing to terminology standardization and categorization. International Registered Report Identifier (IRRID): DERR1-10.2196/26680 %M 34533460 %R 10.2196/26680 %U https://www.researchprotocols.org/2021/9/e26680 %U https://doi.org/10.2196/26680 %U http://www.ncbi.nlm.nih.gov/pubmed/34533460 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 9 %P e28028 %T Semisupervised Deep Learning Techniques for Predicting Acute Respiratory Distress Syndrome From Time-Series Clinical Data: Model Development and Validation Study %A Lam,Carson %A Tso,Chak Foon %A Green-Saxena,Abigail %A Pellegrini,Emily %A Iqbal,Zohora %A Evans,Daniel %A Hoffman,Jana %A Calvert,Jacob %A Mao,Qingqing %A Das,Ritankar %+ Dascena, Inc, Suite B, Private Mailbox 65148, 12333 Sowden Rd, Houston, TX, 77080, United States, 1 7149326188, clam@dascena.com %K acute respiratory distress syndrome %K COVID-19 %K semisupervised learning %K deep learning %K machine learning %K algorithm %K prediction %K decision support %D 2021 %7 14.9.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: A high number of patients who are hospitalized with COVID-19 develop acute respiratory distress syndrome (ARDS). Objective: In response to the need for clinical decision support tools to help manage the next pandemic during the early stages (ie, when limited labeled data are present), we developed machine learning algorithms that use semisupervised learning (SSL) techniques to predict ARDS development in general and COVID-19 populations based on limited labeled data. 
Methods: SSL techniques were applied to 29,127 encounters with patients who were admitted to 7 US hospitals from May 1, 2019, to May 1, 2021. A recurrent neural network that used a time series of electronic health record data was applied to data that were collected when a patient’s peripheral oxygen saturation level fell below the normal range (<97%) to predict the subsequent development of ARDS during the remaining duration of patients’ hospital stay. Model performance was assessed with the area under the receiver operating characteristic curve and area under the precision recall curve of an external hold-out test set. Results: For the whole data set, the median time between the first peripheral oxygen saturation measurement of <97% and subsequent respiratory failure was 21 hours. The area under the receiver operating characteristic curve for predicting subsequent ARDS development was 0.73 when the model was trained on a labeled data set of 6930 patients, 0.78 when the model was trained on the labeled data set that had been augmented with the unlabeled data set of 16,173 patients by using SSL techniques, and 0.84 when the model was trained on the entire training set of 23,103 labeled patients. Conclusions: In the context of using time-series inpatient data and a careful model training design, unlabeled data can be used to improve the performance of machine learning models when labeled data for predicting ARDS development are scarce or expensive. 
%M 34398784 %R 10.2196/28028 %U https://formative.jmir.org/2021/9/e28028 %U https://doi.org/10.2196/28028 %U http://www.ncbi.nlm.nih.gov/pubmed/34398784 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e27798 %T Predicting the Mortality and Readmission of In-Hospital Cardiac Arrest Patients With Electronic Health Records: A Machine Learning Approach %A Chi,Chien-Yu %A Ao,Shuang %A Winkler,Adrian %A Fu,Kuan-Chun %A Xu,Jie %A Ho,Yi-Lwun %A Huang,Chien-Hua %A Soltani,Rohollah %+ Department of Emergency Medicine, National Taiwan University Hospital, #7 Chung-Shan South Road, Taipei, 100, Taiwan, 886 0972651304, chhuang5940@ntu.edu.tw %K in-hospital cardiac arrest %K 30-day mortality %K 30-day readmission %K machine learning %K imbalanced dataset %D 2021 %7 13.9.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: In-hospital cardiac arrest (IHCA) is associated with high mortality and health care costs in the recovery phase. Predicting adverse outcome events, including readmission, improves the chance for appropriate interventions and reduces health care costs. However, studies related to the early prediction of adverse events of IHCA survivors are rare. Therefore, we used a deep learning model for prediction in this study. Objective: This study aimed to demonstrate that with the proper data set and learning strategies, we can predict the 30-day mortality and readmission of IHCA survivors based on their historical claims. Methods: National Health Insurance Research Database claims data, including 168,693 patients who had experienced IHCA at least once and 1,569,478 clinical records, were obtained to generate a data set for outcome prediction. We predicted the 30-day mortality/readmission after each current record (ALL-mortality/ALL-readmission) and 30-day mortality/readmission after IHCA (cardiac arrest [CA]-mortality/CA-readmission). 
We developed a hierarchical vectorizer (HVec) deep learning model to extract patients’ information and predict mortality and readmission. To embed the textual medical concepts of the clinical records into our deep learning model, we used Text2Node to compute the distributed representations of all medical concept codes as a 128-dimensional vector. Along with the patient’s demographic information, our novel HVec model generated embedding vectors to hierarchically describe the health status at the record-level and patient-level. Multitask learning involving two main tasks and auxiliary tasks was proposed. As CA-mortality and CA-readmission were rare, person upsampling of patients with CA and weighting of CA records were used to improve prediction performance. Results: With the multitask learning setting in the model learning process, we achieved an area under the receiver operating characteristic of 0.752 for CA-mortality, 0.711 for ALL-mortality, 0.852 for CA-readmission, and 0.889 for ALL-readmission. The area under the receiver operating characteristic was improved to 0.808 for CA-mortality and 0.862 for CA-readmission after solving the extremely imbalanced issue for CA-mortality/CA-readmission by upsampling and weighting. Conclusions: This study demonstrated the potential of predicting future outcomes for IHCA survivors by machine learning. The results showed that our proposed approach could effectively alleviate data imbalance problems and train a better model for outcome prediction. 
%M 34515639 %R 10.2196/27798 %U https://www.jmir.org/2021/9/e27798 %U https://doi.org/10.2196/27798 %U http://www.ncbi.nlm.nih.gov/pubmed/34515639 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e31129 %T Automated Detection of Acute Myocardial Infarction Using Asynchronous Electrocardiogram Signals—Preview of Implementing Artificial Intelligence With Multichannel Electrocardiographs Obtained From Smartwatches: Retrospective Study %A Han,Changho %A Song,Youngjae %A Lim,Hong-Seok %A Tae,Yunwon %A Jang,Jong-Hwan %A Lee,Byeong Tak %A Lee,Yeha %A Bae,Woong %A Yoon,Dukyong %+ Department of Biomedical Systems Informatics, Yonsei University College of Medicine, 363, Dongbaekjukjeon-daero, Giheung-gu, Yongin, 16995, Republic of Korea, 82 3151898450, dukyong.yoon@yonsei.ac.kr %K wearables %K smartwatches %K asynchronous electrocardiogram %K artificial intelligence %K deep learning %K automatic diagnosis %K myocardial infarction %K timely diagnosis %K machine learning %K digital health %K cardiac health %K cardiology %D 2021 %7 10.9.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: When using a smartwatch to obtain electrocardiogram (ECG) signals from multiple leads, the device has to be placed on different parts of the body sequentially. The ECG signals measured from different leads are asynchronous. Artificial intelligence (AI) models for asynchronous ECG signals have barely been explored. Objective: We aimed to develop an AI model for detecting acute myocardial infarction using asynchronous ECGs and compare its performance with that of the automatic ECG interpretations provided by a commercial ECG analysis software. We sought to evaluate the feasibility of implementing multiple lead–based AI-enabled ECG algorithms on smartwatches. Moreover, we aimed to determine the optimal number of leads for sufficient diagnostic power. 
Methods: We extracted ECGs recorded within 24 hours from each visit to the emergency room of Ajou University Medical Center between June 1994 and January 2018 from patients aged 20 years or older. The ECGs were labeled on the basis of whether a diagnostic code corresponding to acute myocardial infarction was entered. We derived asynchronous ECG lead sets from standard 12-lead ECG reports and simulated a situation similar to the sequential recording of ECG leads via smartwatches. We constructed an AI model based on residual networks and self-attention mechanisms by randomly masking each lead channel during the training phase and then testing the model using various targeting lead sets with the remaining lead channels masked. Results: The performance of lead sets with 3 or more leads compared favorably with that of the automatic ECG interpretations provided by a commercial ECG analysis software, with 8.1%-13.9% gain in sensitivity when the specificity was matched. Our results indicate that multiple lead-based AI-enabled ECG algorithms can be implemented on smartwatches. Model performance generally increased as the number of leads increased (12-lead sets: area under the receiver operating characteristic curve [AUROC] 0.880; 4-lead sets: AUROC 0.858, SD 0.008; 3-lead sets: AUROC 0.845, SD 0.011; 2-lead sets: AUROC 0.813, SD 0.018; single-lead sets: AUROC 0.768, SD 0.001). Considering the short amount of time needed to measure additional leads, measuring at least 3 leads—ideally more than 4 leads—is necessary for minimizing the risk of failing to detect acute myocardial infarction occurring in a certain spatial location or direction. Conclusions: By developing an AI model for detecting acute myocardial infarction with asynchronous ECG lead sets, we demonstrated the feasibility of multiple lead-based AI-enabled ECG algorithms on smartwatches for automated diagnosis of cardiac disorders. 
We also demonstrated the necessity of measuring at least 3 leads for accurate detection. Our results can be used as reference for the development of other AI models using sequentially measured asynchronous ECG leads via smartwatches for detecting various cardiac disorders. %M 34505839 %R 10.2196/31129 %U https://www.jmir.org/2021/9/e31129 %U https://doi.org/10.2196/31129 %U http://www.ncbi.nlm.nih.gov/pubmed/34505839 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e26231 %T Understanding Pediatric Surgery Cancellation: Geospatial Analysis %A Liu,Lei %A Ni,Yizhao %A Beck,Andrew F %A Brokamp,Cole %A Ramphul,Ryan C %A Highfield,Linda D %A Kanjia,Megha Karkera %A Pratap,J “Nick” %+ Department of Anesthesia, Cincinnati Children's Hospital Medical Center, MLC 2001, 3333 Burnet Avenue, Cincinnati, OH, 45229-3039, United States, 1 513 636 4408, jnpratap@pratap.co.uk %K surgery cancellation %K socioeconomic factors %K spatial regression models %K machine learning %D 2021 %7 10.9.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Day-of-surgery cancellation (DoSC) represents a substantial wastage of hospital resources and can cause significant inconvenience to patients and families. Cancellation is reported to impact between 2% and 20% of the 50 million procedures performed annually in American hospitals. Up to 85% of cancellations may be amenable to the modification of patients’ and families’ behaviors. However, the factors underlying DoSC and the barriers experienced by families are not well understood. Objective: This study aims to conduct a geospatial analysis of patient-specific variables from electronic health records (EHRs) of Cincinnati Children’s Hospital Medical Center (CCHMC) and of Texas Children’s Hospital (TCH), as well as linked socioeconomic factors measured at the census tract level, to understand potential underlying contributors to disparities in DoSC rates across neighborhoods. 
Methods: The study population included pediatric patients who underwent scheduled surgeries at CCHMC and TCH. A 5-year data set was extracted from the CCHMC EHR, and addresses were geocoded. An equivalent set of data >5.7 years was extracted from the TCH EHR. Case-based data related to patients’ health care use were aggregated at the census tract level. Community-level variables were extracted from the American Community Survey as surrogates for patients’ socioeconomic and minority status as well as markers of the surrounding context. Leveraging the selected variables, we built spatial models to understand the variation in DoSC rates across census tracts. The findings were compared to those of the nonspatial regression and deep learning models. Model performance was evaluated from the root mean squared error (RMSE) using nested 10-fold cross-validation. Feature importance was evaluated by computing the increment of the RMSE when a single variable was shuffled within the data set. Results: Data collection yielded sets of 463 census tracts at CCHMC (DoSC rates 1.2%-12.5%) and 1024 census tracts at TCH (DoSC rates 3%-12.2%). For CCHMC, an L2-normalized generalized linear regression model achieved the best performance in predicting all-cause DoSC rate (RMSE 1.299%, 95% CI 1.21%-1.387%); however, its improvement over others was marginal. For TCH, an L2-normalized generalized linear regression model also performed best (RMSE 1.305%, 95% CI 1.257%-1.352%). All-cause DoSC rate at CCHMC was predicted most strongly by previous no show. As for community-level data, the proportion of African American inhabitants per census tract was consistently an important predictor. In the Texas area, the proportion of overcrowded households was salient to DoSC rate. Conclusions: Our findings suggest that geospatial analysis offers potential for use in targeting interventions for census tracts at a higher risk of cancellation. 
Our study also demonstrates the importance of home location, socioeconomic disadvantage, and racial minority status on the DoSC of children’s surgery. The success of future efforts to reduce cancellation may benefit from taking social, economic, and cultural issues into account. %M 34505837 %R 10.2196/26231 %U https://www.jmir.org/2021/9/e26231 %U https://doi.org/10.2196/26231 %U http://www.ncbi.nlm.nih.gov/pubmed/34505837 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e24295 %T Data Empowerment of Decision-Makers in an Era of a Pandemic: Intersection of “Classic” and Artificial Intelligence in the Service of Medicine %A Geva,Gil A %A Ketko,Itay %A Nitecki,Maya %A Simon,Shoham %A Inbar,Barr %A Toledo,Itay %A Shapiro,Michael %A Vaturi,Barak %A Votta,Yoni %A Filler,Daniel %A Yosef,Roey %A Shpitzer,Sagi A %A Hir,Nabil %A Peri Markovich,Michal %A Shapira,Shachar %A Fink,Noam %A Glasberg,Elon %A Furer,Ariel %+ Medical Corps, Israel Defense Force, IDF Medical Corps Headquarters, Tel HaShomer, Ramat Gan, 02149, Israel, 972 529277372, furera@gmail.com %K COVID-19 %K medical informatics %K decision-making %K pandemic %K data %K policy %K validation %K accuracy %K data analysis %D 2021 %7 10.9.2021 %9 Viewpoint %J J Med Internet Res %G English %X Background: The COVID-19 outbreak required prompt action by health authorities around the world in response to a novel threat. With enormous amounts of information originating in sources with uncertain degree of validation and accuracy, it is essential to provide executive-level decision-makers with the most actionable, pertinent, and updated data analysis to enable them to adapt their strategy swiftly and competently. Objective: We report here the origination of a COVID-19 dedicated response in the Israel Defense Forces with the assembly of an operational Data Center for the Campaign against Coronavirus. 
Methods: Spearheaded by directors with clinical, operational, and data analytics orientation, a multidisciplinary team utilized existing and newly developed platforms to collect and analyze large amounts of information on an individual level in the context of SARS-CoV-2 contraction and infection. Results: Nearly 300,000 responses to daily questionnaires were recorded and were merged with other data sets to form a unified data lake. By using basic as well as advanced analytic tools ranging from simple aggregation and display of trends to data science application, we provided commanders and clinicians with access to trusted, accurate, and personalized information and tools that were designed to foster operational changes and mitigate the propagation of the pandemic. The developed tools aided in the identification of high-risk individuals for severe disease and resulted in a 30% decline in their attendance to their units. Moreover, the queue for laboratory examination for COVID-19 was optimized using a predictive model and resulted in a high true-positive rate of 20%, which is more than twice as high as the baseline rate (2.28%, 95% CI 1.63%-3.19%). Conclusions: In times of ambiguity and uncertainty, along with an unprecedented flux of information, health organizations may find multidisciplinary teams working to provide intelligence from diverse and rich data a key factor in providing executives relevant and actionable support for decision-making. 
%M 34313589 %R 10.2196/24295 %U https://www.jmir.org/2021/9/e24295 %U https://doi.org/10.2196/24295 %U http://www.ncbi.nlm.nih.gov/pubmed/34313589 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 9 %P e31695 %T Telehealth Behavioral Intervention for Diabetes Management in Adults With Physical Disabilities: Intervention Fidelity Protocol for a Randomized Controlled Trial %A Zengul,Ayse %A Evans,Eric %A Hall,Allyson %A Qu,Haiyan %A Willig,Amanda %A Cherrington,Andrea %A Thirumalai,Mohanraj %+ Department of Health Services Administration, University of Alabama at Birmingham, 1716 9th Avenue South, Birmingham, AL, 35294, United States, 1 205 934 7189, mohanraj@uab.edu %K telehealth %K health coaching %K artificial intelligence %K diabetes mellitus %K intervention fidelity %K mobile phone %D 2021 %7 10.9.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: Diabetes mellitus is a major health problem among people with physical disabilities. Health coaching has been proven to be an effective approach in terms of behavioral changes, patient self-efficacy, adherence to treatment, health service use, and health outcomes. Telehealth systems combined with health coaching have the potential to improve the quality of health care by increasing access to services. Treatment fidelity is particularly important for behavior change studies; however, fidelity protocols are inadequately administered and reported in the literature. Objective: The aim of this study is to outline all the intervention fidelity strategies and procedures of a telecoaching intervention—artificial intelligence for diabetes management (AI4DM)—which is a randomized controlled trial to evaluate the feasibility, acceptability, and preliminary efficacy of a telehealth platform in adults with type 2 diabetes and permanent impaired mobility. AI4DM aims to create a web-based disability-inclusive diabetes self-management program. 
We selected the National Institutes of Health Behavior Change Consortium (NIH BCC) fidelity framework to describe strategies to ensure intervention fidelity in our research. Methods: We have developed fidelity strategies based on the five fidelity domains outlined by the NIH BCC—focusing on study design, provider training, treatment delivery, treatment receipt, and enactment of treatment skills. The design of the study is grounded in the social cognitive theory and is intended to ensure that both arms would receive the same amount of attention from the intervention. All providers will receive standardized training to deliver consistent health coaching to the participants. The intervention will be delivered through various controlling and monitoring strategies to reduce differences within and between treatment groups. The content and structure of the study are delivered to ensure comprehension and participation among individuals with low health literacy. By constantly reviewing and monitoring participant progress and protocol adherence, we intend to ensure that participants use cognitive and behavioral skills in real-world settings to engage in health behavior. Results: Enrollment for AI4DM will begin in October 2021 and end in October 2022. The results of this study will be reported in late 2022. Conclusions: Developing and using fidelity protocols in behavior change studies is essential to ensure the internal and external validity of interventions. This study incorporates NIH BCC recommendations into an artificial intelligence embedded telecoaching platform for diabetes management designed for people with physical disabilities. The developed fidelity protocol can provide guidance for other researchers conducting telehealth interventions within behavioral health settings to present more consistent and reproducible research. Trial Registration: ClinicalTrials.gov NCT04927377; http://clinicaltrials.gov/ct2/show/NCT04927377. 
International Registered Report Identifier (IRRID): PRR1-10.2196/31695 %M 34505835 %R 10.2196/31695 %U https://www.researchprotocols.org/2021/9/e31695 %U https://doi.org/10.2196/31695 %U http://www.ncbi.nlm.nih.gov/pubmed/34505835 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 9 %P e31689 %T Disability-Inclusive Diabetes Self-management Telehealth Program: Protocol for a Pilot and Feasibility Study %A Evans,Eric %A Zengul,Ayse %A Hall,Allyson %A Qu,Haiyan %A Willig,Amanda %A Cherrington,Andrea %A Thirumalai,Mohanraj %+ Department of Health Services Administration, University of Alabama at Birmingham, 1716 9th Avenue South, Birmingham, AL, 35294, United States, 1 205 934 7189, mohanraj@uab.edu %K telehealth %K health coaching %K artificial intelligence %K diabetes mellitus %K mobile phone %D 2021 %7 10.9.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: Individuals with disabilities and type 2 diabetes require self-management programs that are accessible, sustainable, inclusive, and adaptable. Health coaching has been shown to be an effective approach for improving behavioral changes in self-management. Health coaching combined with telehealth technology has the potential to improve the overall quality of and access to health services. Objective: This protocol outlines the study design for implementing the Artificial Intelligence for Diabetes Management (AI4DM) intervention. The protocol will assess the feasibility, acceptability, and preliminary efficacy of the AI4DM telehealth platform for people with disabilities. Methods: The AI4DM study is a 2-arm randomized controlled trial for evaluating the delivery of a 12-month intervention, which will involve telecoaching, diabetes educational content, and technology access, to 90 individuals with diabetes and physical disabilities. The hypothesis is that this pilot project is feasible and acceptable for adults with permanently impaired mobility and type 2 diabetes. 
We also hypothesize that adults in the AI4DM intervention groups will have significantly better glycemic control (glycated hemoglobin) and psychosocial and psychological measures than the attention control group at the 3-, 6-, and 12-month follow-ups. Results: The AI4DM study was approved by the university’s institutional review board, and recruitment and enrollment will begin in October 2021. Conclusions: The AI4DM study will improve our understanding of the feasibility and efficacy of a web-based diabetes self-management program for people with disabilities. The AI4DM intervention has the potential to become a scalable and novel method for successfully managing type 2 diabetes in people with disabilities. Trial Registration: ClinicalTrials.gov NCT04927377; https://clinicaltrials.gov/ct2/show/NCT04927377 International Registered Report Identifier (IRRID): PRR1-10.2196/31689 %M 34505831 %R 10.2196/31689 %U https://www.researchprotocols.org/2021/9/e31689 %U https://doi.org/10.2196/31689 %U http://www.ncbi.nlm.nih.gov/pubmed/34505831 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 9 %P e30401 %T Machine Learning Approaches to Retrieve High-Quality, Clinically Relevant Evidence From the Biomedical Literature: Systematic Review %A Abdelkader,Wael %A Navarro,Tamara %A Parrish,Rick %A Cotoi,Chris %A Germini,Federico %A Iorio,Alfonso %A Haynes,R Brian %A Lokker,Cynthia %+ Health Information Research Unit, Department of Health Research Methods, Evidence, and Impact, McMaster University, 1280 Main St W, CRL Building, First Floor, Hamilton, ON, L8S 4K1, Canada, 1 647 563 5732, Abdelkaw@mcmaster.ca %K machine learning %K bioinformatics %K information retrieval %K evidence-based medicine %K literature databases %K systematic review %K accuracy %K medical literature %K clinical support %K clinical care %D 2021 %7 9.9.2021 %9 Review %J JMIR Med Inform %G English %X Background: The rapid growth of the biomedical literature makes identifying strong evidence a 
time-consuming task. Applying machine learning to the process could be a viable solution that limits effort while maintaining accuracy. Objective: The goal of the research was to summarize the nature and comparative performance of machine learning approaches that have been applied to retrieve high-quality evidence for clinical consideration from the biomedical literature. Methods: We conducted a systematic review of studies that applied machine learning techniques to identify high-quality clinical articles in the biomedical literature. Multiple databases were searched to July 2020. Extracted data focused on the applied machine learning model, steps in the development of the models, and model performance. Results: From 3918 retrieved studies, 10 met our inclusion criteria. All followed a supervised machine learning approach and applied, from a limited range of options, a high-quality standard for the training of their model. The results show that machine learning can achieve a sensitivity of 95% while maintaining a high precision of 86%. Conclusions: Machine learning approaches perform well in retrieving high-quality clinical studies. Performance may improve by applying more sophisticated approaches such as active learning and unsupervised machine learning approaches. 
%M 34499041 %R 10.2196/30401 %U https://medinform.jmir.org/2021/9/e30401 %U https://doi.org/10.2196/30401 %U http://www.ncbi.nlm.nih.gov/pubmed/34499041 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e27098 %T Machine Learning Analysis of Time-Dependent Features for Predicting Adverse Events During Hemodialysis Therapy: Model Development and Validation Study %A Liu,Yi-Shiuan %A Yang,Chih-Yu %A Chiu,Ping-Fang %A Lin,Hui-Chu %A Lo,Chung-Chuan %A Lai,Alan Szu-Han %A Chang,Chia-Chu %A Lee,Oscar Kuang-Sheng %+ Institute of Clinical Medicine, National Yang Ming Chiao Tung University School of Medicine, 2F, Shou-Ren Bldg, 155, Sec 2, Li-Nong St, Beitou Dist, Taipei, 11221, Taiwan, 886 228712121 ext 7391, oscarlee9203@gmail.com %K hemodialysis %K intradialytic adverse events %K prediction algorithm %K machine learning %D 2021 %7 7.9.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Hemodialysis (HD) therapy is an indispensable tool used in critical care management. Patients undergoing HD are at risk for intradialytic adverse events, ranging from muscle cramps to cardiac arrest. So far, there is no effective HD device–integrated algorithm to assist medical staff in response to these adverse events a step earlier during HD. Objective: We aimed to develop machine learning algorithms to predict intradialytic adverse events in an unbiased manner. Methods: Three-month dialysis and physiological time-series data were collected from all patients who underwent maintenance HD therapy at a tertiary care referral center. Dialysis data were collected automatically by HD devices, and physiological data were recorded by medical staff. Intradialytic adverse events were documented by medical staff according to patient complaints. Features extracted from the time series data sets by linear and differential analyses were used for machine learning to predict adverse events during HD. 
Results: Time series dialysis data were collected during the 4-hour HD session in 108 patients who underwent maintenance HD therapy. There were a total of 4221 HD sessions, 406 of which involved at least one intradialytic adverse event. Models were built by classification algorithms and evaluated by four-fold cross-validation. The developed algorithm predicted overall intradialytic adverse events, with an area under the curve (AUC) of 0.83, sensitivity of 0.53, and specificity of 0.96. The algorithm also predicted muscle cramps, with an AUC of 0.85, and blood pressure elevation, with an AUC of 0.93. In addition, the model built based on ultrafiltration-unrelated features predicted all types of adverse events, with an AUC of 0.81, indicating that ultrafiltration-unrelated factors also contribute to the onset of adverse events. Conclusions: Our results demonstrated that algorithms combining linear and differential analyses with two-class classification machine learning can predict intradialytic adverse events in quasi-real time with high AUCs. Such a methodology implemented with local cloud computation and real-time optimization by personalized HD data could warn clinicians to take timely actions in advance. 
%M 34491204 %R 10.2196/27098 %U https://www.jmir.org/2021/9/e27098 %U https://doi.org/10.2196/27098 %U http://www.ncbi.nlm.nih.gov/pubmed/34491204 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e29839 %T Application of Artificial Intelligence in Community-Based Primary Health Care: Systematic Scoping Review and Critical Appraisal %A Abbasgholizadeh Rahimi,Samira %A Légaré,France %A Sharma,Gauri %A Archambault,Patrick %A Zomahoun,Herve Tchala Vignon %A Chandavong,Sam %A Rheault,Nathalie %A T Wong,Sabrina %A Langlois,Lyse %A Couturier,Yves %A Salmeron,Jose L %A Gagnon,Marie-Pierre %A Légaré,Jean %+ Department of Family Medicine, Faculty of Medicine and Health Sciences, McGill University, 5858 Côte-des-Neiges Road, Suite 300, Montreal, QC, Canada, 1 514 399 9218, samira.rahimi@mcgill.ca %K artificial intelligence %K machine learning %K community-based primary health care %K systematic scoping review %D 2021 %7 3.9.2021 %9 Review %J J Med Internet Res %G English %X Background: Research on the integration of artificial intelligence (AI) into community-based primary health care (CBPHC) has highlighted several advantages and disadvantages in practice regarding, for example, facilitating diagnosis and disease management, as well as doubts concerning the unintended harmful effects of this integration. However, there is a lack of a comprehensive knowledge synthesis that could shed light on AI systems tested or implemented in CBPHC. Objective: We intended to identify and evaluate published studies that have tested or implemented AI in CBPHC settings. Methods: We conducted a systematic scoping review informed by an earlier study and the Joanna Briggs Institute (JBI) scoping review framework and reported the findings according to PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) reporting guidelines. 
An information specialist performed a comprehensive search from the date of inception until February 2020, in seven bibliographic databases: Cochrane Library, MEDLINE, EMBASE, Web of Science, Cumulative Index to Nursing and Allied Health Literature (CINAHL), ScienceDirect, and IEEE Xplore. The selected studies considered all populations who provide and receive care in CBPHC settings, AI interventions that had been implemented, tested, or both, and assessed outcomes related to patients, health care providers, or CBPHC systems. Risk of bias was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Two authors independently screened the titles and abstracts of the identified records, read the selected full texts, and extracted data from the included studies using a validated extraction form. Disagreements were resolved by consensus, and if this was not possible, the opinion of a third reviewer was sought. A third reviewer also validated all the extracted data. Results: We retrieved 22,113 documents. After the removal of duplicates, 16,870 documents were screened, and 90 peer-reviewed publications met our inclusion criteria. Machine learning (ML) (41/90, 45%), natural language processing (NLP) (24/90, 27%), and expert systems (17/90, 19%) were the most commonly studied AI interventions. These were primarily implemented for diagnosis, detection, or surveillance purposes. Neural networks (ie, convolutional neural networks and abductive networks) demonstrated the highest accuracy, considering the given database for the given clinical task. The risk of bias in diagnosis or prognosis studies was the lowest in the participant category (4/49, 8%) and the highest in the outcome category (22/49, 45%). Conclusions: We observed variabilities in reporting the participants, types of AI methods, analyses, and outcomes, and highlighted the large gap in the effective development and implementation of AI in CBPHC. 
Further studies are needed to efficiently guide the development and implementation of AI interventions in CBPHC settings. %M 34477556 %R 10.2196/29839 %U https://www.jmir.org/2021/9/e29839 %U https://doi.org/10.2196/29839 %U http://www.ncbi.nlm.nih.gov/pubmed/34477556 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 8 %P e27235 %T Real-Time Respiratory Tumor Motion Prediction Based on a Temporal Convolutional Neural Network: Prediction Model Development Study %A Chang,Panchun %A Dang,Jun %A Dai,Jianrong %A Sun,Wenzheng %+ Department of Radiation Oncology, School of Medicine, The Second Affiliated Hospital, Zhejiang University, 88 Jiefang Road, Hangzhou, 310009, China, 86 057187783538, sunwenzheng@zju.edu.cn %K radiation therapy %K temporal convolutional neural network %K respiratory signal prediction %K neural network %K deep learning model %K dynamic tracking %D 2021 %7 27.8.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The dynamic tracking of tumors with radiation beams in radiation therapy requires the prediction of real-time target locations prior to beam delivery, as treatment involving radiation beams and gating tracking results in time latency. Objective: In this study, a deep learning model that was based on a temporal convolutional neural network was developed to predict internal target locations by using multiple external markers. Methods: Respiratory signals from 69 treatment fractions of 21 patients with cancer who were treated with the CyberKnife Synchrony device (Accuray Incorporated) were used to train and test the model. The reported model’s performance was evaluated by comparing the model to a long short-term memory model in terms of the root mean square errors (RMSEs) of real and predicted respiratory signals. The effect of the number of external markers was also investigated. 
Results: The average RMSEs of predicted (ahead time=400 ms) respiratory motion in the superior-inferior, anterior-posterior, and left-right directions and in 3D space were 0.49 mm, 0.28 mm, 0.25 mm, and 0.67 mm, respectively. Conclusions: The experiment results demonstrated that the temporal convolutional neural network–based respiratory prediction model could predict respiratory signals with submillimeter accuracy. %M 34236336 %R 10.2196/27235 %U https://www.jmir.org/2021/8/e27235 %U https://doi.org/10.2196/27235 %U http://www.ncbi.nlm.nih.gov/pubmed/34236336 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 8 %P e26843 %T Predicting Kidney Graft Survival Using Machine Learning Methods: Prediction Model Development and Feature Significance Analysis Study %A Naqvi,Syed Asil Ali %A Tennankore,Karthik %A Vinson,Amanda %A Roy,Patrice C %A Abidi,Syed Sibte Raza %+ Department of Computer Science, Dalhousie University, Goldberg Computer Science Bldg, 6050 University Ave, Halifax, NS, B3H 1W5, Canada, 1 9023290504, a.naqvi@dal.ca %K kidney transplantation %K machine learning %K predictive modeling %K survival prediction %K dimensionality reduction %K feature sensitivity analysis %D 2021 %7 27.8.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Kidney transplantation is the optimal treatment for patients with end-stage renal disease. Short- and long-term kidney graft survival is influenced by a number of donor and recipient factors. Predicting the success of kidney transplantation is important for optimizing kidney allocation. Objective: The aim of this study was to predict the risk of kidney graft failure across three temporal cohorts (within 1 year, within 5 years, and after 5 years following a transplant) based on donor and recipient characteristics. We analyzed a large data set comprising over 50,000 kidney transplants covering an approximate 20-year period. 
Methods: We applied machine learning–based classification algorithms to develop prediction models for the risk of graft failure for three different temporal cohorts. Deep learning–based autoencoders were applied for data dimensionality reduction, which improved the prediction performance. The influence of features on graft survival for each cohort was studied by investigating a new nonoverlapping patient stratification approach. Results: Our models predicted graft survival with area under the curve scores of 82% within 1 year, 69% within 5 years, and 81% within 17 years. The feature importance analysis elucidated the varying influence of clinical features on graft survival across the three different temporal cohorts. Conclusions: In this study, we applied machine learning to develop risk prediction models for graft failure that demonstrated a high level of prediction performance. Acknowledging that these models performed better than those reported in the literature for existing risk prediction tools, future studies will focus on how best to incorporate these prediction models into clinical care algorithms to optimize the long-term health of kidney recipients. 
%M 34448704 %R 10.2196/26843 %U https://www.jmir.org/2021/8/e26843 %U https://doi.org/10.2196/26843 %U http://www.ncbi.nlm.nih.gov/pubmed/34448704 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 8 %P e24408 %T Image Processing for Public Health Surveillance of Tobacco Point-of-Sale Advertising: Machine Learning–Based Methodology %A English,Ned %A Anesetti-Rothermel,Andrew %A Zhao,Chang %A Latterner,Andrew %A Benson,Adam F %A Herman,Peter %A Emery,Sherry %A Schneider,Jordan %A Rose,Shyanika W %A Patel,Minal %A Schillo,Barbara A %+ NORC at the University of Chicago, 55 E Monroe St, Ste 3100, Chicago, IL, 60603, United States, 1 3127594010, english-ned@norc.org %K machine learning %K image classification %K convolutional neural network %K object detection %K crowdsourcing %K tobacco point of sale %K public health surveillance %D 2021 %7 27.8.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: With a rapidly evolving tobacco retail environment, it is increasingly necessary to understand the point-of-sale (POS) advertising environment as part of tobacco surveillance and control. Advances in machine learning and image processing suggest the ability for more efficient and nuanced data capture than previously available. Objective: The study aims to use machine learning algorithms to discover the presence of tobacco advertising in photographs of tobacco POS advertising and their location in the photograph. Methods: We first collected images of the interiors of tobacco retailers in West Virginia and the District of Columbia during 2016 and 2018. The clearest photographs were selected and used to create a training and test data set. We then used a pretrained image classification network model, Inception V3, to discover the presence of tobacco logos and a unified object detection system, You Only Look Once V3, to identify logo locations. 
Results: Our model was successful in identifying the presence of advertising within images, with a classification accuracy of over 75% for 8 of the 42 brands. Discovering the location of logos within a given photograph was more challenging because of the relatively small training data set, resulting in a mean average precision score of 0.72 and an intersection over union score of 0.62. Conclusions: Our research provides preliminary evidence for a novel methodological approach that tobacco researchers and other public health practitioners can apply in the collection and processing of data for tobacco or other POS surveillance efforts. The resulting surveillance information can inform policy adoption, implementation, and enforcement. Limitations notwithstanding, our analysis shows the promise of using machine learning as part of a suite of tools to understand the tobacco retail environment, make policy recommendations, and design public health interventions at the municipal or other jurisdictional scale. 
%M 34448700 %R 10.2196/24408 %U https://www.jmir.org/2021/8/e24408 %U https://doi.org/10.2196/24408 %U http://www.ncbi.nlm.nih.gov/pubmed/34448700 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 8 %P e27709 %T A Machine Learning Approach to Passively Informed Prediction of Mental Health Risk in People with Diabetes: Retrospective Case-Control Analysis %A Yu,Jessica %A Chiu,Carter %A Wang,Yajuan %A Dzubur,Eldin %A Lu,Wei %A Hoffman,Julia %+ Livongo Health, Inc, 150 W Evelyn Ave, Ste 150, Mountain View, CA, 94041, United States, 1 6508048434, jessica.yu@livongo.com %K diabetes mellitus %K mental health %K risk detection %K passive sensing %K ecological momentary assessment %K machine learning %D 2021 %7 27.8.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Proactive detection of mental health needs among people with diabetes mellitus could facilitate early intervention, improve overall health and quality of life, and reduce individual and societal health and economic burdens. Passive sensing and ecological momentary assessment are relatively newer methods that may be leveraged for such proactive detection. Objective: The primary aim of this study was to conceptualize, develop, and evaluate a novel machine learning approach for predicting mental health risk in people with diabetes mellitus. Methods: A retrospective study was designed to develop and evaluate a machine learning model, utilizing data collected from 142,432 individuals with diabetes enrolled in the Livongo for Diabetes program. First, participants’ mental health statuses were verified using prescription and medical and pharmacy claims data. Next, four categories of passive sensing signals were extracted from the participants’ behavior in the program, including demographics and glucometer, coaching, and event data. 
Data sets were then assembled to create participant-period instances, and descriptive analyses were conducted to understand the correlation between mental health status and passive sensing signals. Passive sensing signals were then entered into the model to train and test its performance. The model was evaluated based on seven measures: sensitivity, specificity, precision, area under the curve, F1 score, accuracy, and confusion matrix. SHapley Additive exPlanations (SHAP) values were computed to determine the importance of individual signals. Results: In the training (and validation) and three subsequent test sets, the model achieved a confidence score greater than 0.5 for sensitivity, specificity, area under the curve, and accuracy. Signals identified as important by SHAP values included demographics such as race and gender, participant’s emotional state during blood glucose checks, time of day of blood glucose checks, blood glucose values, and interaction with the Livongo mobile app and web platform. Conclusions: Results of this study demonstrate the utility of a passively informed mental health risk algorithm and invite further exploration to identify additional signals and determine when and where such algorithms should be deployed. 
%M 34448707 %R 10.2196/27709 %U https://www.jmir.org/2021/8/e27709 %U https://doi.org/10.2196/27709 %U http://www.ncbi.nlm.nih.gov/pubmed/34448707 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 8 %P e29328 %T Classification of Children With Autism and Typical Development Using Eye-Tracking Data From Face-to-Face Conversations: Machine Learning Model Development and Performance Evaluation %A Zhao,Zhong %A Tang,Haiming %A Zhang,Xiaobin %A Qu,Xingda %A Hu,Xinyao %A Lu,Jianping %+ Institute of Human Factors and Ergonomics, College of Mechatronics and Control Engineering, Shenzhen University, 3688 Nanhai Avenue, Shenzhen, 518000, China, 86 86965716, quxd@szu.edu.cn %K autism spectrum disorder %K eye tracking %K face-to-face interaction %K machine learning %K visual fixation %D 2021 %7 26.8.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Previous studies have shown promising results in identifying individuals with autism spectrum disorder (ASD) by applying machine learning (ML) to eye-tracking data collected while participants viewed varying images (ie, pictures, videos, and web pages). Although gaze behavior is known to differ between face-to-face interaction and image-viewing tasks, no study has investigated whether eye-tracking data from face-to-face conversations can also accurately identify individuals with ASD. Objective: The objective of this study was to examine whether eye-tracking data from face-to-face conversations could classify children with ASD and typical development (TD). We further investigated whether combining features on visual fixation and length of conversation would achieve better classification performance. Methods: Eye tracking was performed on children with ASD and TD while they were engaged in face-to-face conversations (including 4 conversational sessions) with an interviewer. 
By implementing forward feature selection, four ML classifiers were used to determine the maximum classification accuracy and the corresponding features: support vector machine (SVM), linear discriminant analysis, decision tree, and random forest. Results: A maximum classification accuracy of 92.31% was achieved with the SVM classifier by combining features on both visual fixation and session length. The classification accuracy of combined features was higher than that obtained using visual fixation features (maximum classification accuracy 84.62%) or session length (maximum classification accuracy 84.62%) alone. Conclusions: Eye-tracking data from face-to-face conversations could accurately classify children with ASD and TD, suggesting that ASD might be objectively screened in everyday social interactions. However, these results will need to be validated with a larger sample of individuals with ASD (varying in severity and balanced sex ratio) using data collected from different modalities (eg, eye tracking, kinematic, electroencephalogram, and neuroimaging). In addition, individuals with other clinical conditions (eg, developmental delay and attention deficit hyperactivity disorder) should be included in similar ML studies for detecting ASD. 
%M 34435957 %R 10.2196/29328 %U https://www.jmir.org/2021/8/e29328 %U https://doi.org/10.2196/29328 %U http://www.ncbi.nlm.nih.gov/pubmed/34435957 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 8 %P e26162 %T Patient Perceptions on Data Sharing and Applying Artificial Intelligence to Health Care Data: Cross-sectional Survey %A Aggarwal,Ravi %A Farag,Soma %A Martin,Guy %A Ashrafian,Hutan %A Darzi,Ara %+ Institute of Global Health Innovation, Imperial College London, 10th Floor, QEQM Building,, St Marys Hospital, Praed St, London, W2 1NY, United Kingdom, 44 07799871597, h.ashrafian@imperial.ac.uk %K artificial intelligence %K patient perception %K data sharing %K health data %K privacy %D 2021 %7 26.8.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Considerable research is being conducted as to how artificial intelligence (AI) can be effectively applied to health care. However, for the successful implementation of AI, large amounts of health data are required for training and testing algorithms. As such, there is a need to understand the perspectives and viewpoints of patients regarding the use of their health data in AI research. Objective: We surveyed a large sample of patients for identifying current awareness regarding health data research, and for obtaining their opinions and views on data sharing for AI research purposes, and on the use of AI technology on health care data. Methods: A cross-sectional survey with patients was conducted at a large multisite teaching hospital in the United Kingdom. Data were collected on patient and public views about sharing health data for research and the use of AI on health data. Results: A total of 408 participants completed the survey. The respondents had generally low levels of prior knowledge about AI. 
Most were comfortable with sharing health data with the National Health Service (NHS) (318/408, 77.9%) or universities (268/408, 65.7%), but far fewer with commercial organizations such as technology companies (108/408, 26.4%). The majority endorsed AI research on health care data (357/408, 87.4%) and health care imaging (353/408, 86.4%) in a university setting, provided that concerns about privacy, reidentification of anonymized health care data, and consent processes were addressed. Conclusions: There were significant variations in the patient perceptions, levels of support, and understanding of health data research and AI. Greater public engagement levels and debates are necessary to ensure the acceptability of AI research and its successful integration into clinical practice in future. %M 34236994 %R 10.2196/26162 %U https://www.jmir.org/2021/8/e26162 %U https://doi.org/10.2196/26162 %U http://www.ncbi.nlm.nih.gov/pubmed/34236994 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 7 %N 8 %P e26604 %T Identifying Communities at Risk for COVID-19–Related Burden Across 500 US Cities and Within New York City: Unsupervised Learning of the Coprevalence of Health Indicators %A Deonarine,Andrew %A Lyons,Genevieve %A Lakhani,Chirag %A De Brouwer,Walter %+ XY.ai, 56 JFK Street, Cambridge, MA, 02138, United States, 1 8575000461, andrew@xy.ai %K COVID-19 %K satellite imagery %K built environment %K social determinants of health %K machine learning %K artificial intelligence %K community %K risk %K United States %K indicator %K comorbidity %K environment %K population %K determinant %K mortality %K prediction %D 2021 %7 26.8.2021 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Although it is well-known that older individuals with certain comorbidities are at the highest risk for complications related to COVID-19 including hospitalization and death, we lack tools to identify communities at the highest risk with fine-grained spatial resolution. 
Information collected at a county level obscures local risk and complex interactions between clinical comorbidities, the built environment, population factors, and other social determinants of health. Objective: This study aims to develop a COVID-19 community risk score that summarizes complex disease prevalence together with age and sex, and compares the score to different social determinants of health indicators and built environment measures derived from satellite images using deep learning. Methods: We developed a robust COVID-19 community risk score (COVID-19 risk score) that summarizes the complex disease co-occurrences (using data for 2019) for individual census tracts with unsupervised learning, selected on the basis of their association with risk for COVID-19 complications such as death. We mapped the COVID-19 risk score to corresponding zip codes in New York City and associated the score with COVID-19–related death. We further modeled the variance of the COVID-19 risk score using satellite imagery and social determinants of health. Results: Using 2019 chronic disease data, the COVID-19 risk score described 85% of the variation in the co-occurrence of 15 diseases and health behaviors that are risk factors for COVID-19 complications among ~28,000 census tract neighborhoods (median population size of tracts 4091). The COVID-19 risk score was associated with a 40% greater risk for COVID-19–related death across New York City (April and September 2020) for a 1 SD change in the score (risk ratio for 1 SD change in COVID-19 risk score 1.4; P<.001) at the zip code level. Satellite imagery coupled with social determinants of health explain nearly 90% of the variance in the COVID-19 risk score in the United States in census tracts (r2=0.87). Conclusions: The COVID-19 risk score localizes risk at the census tract level and was able to predict COVID-19–related mortality in New York City. 
The built environment explained significant variations in the score, suggesting risk models could be enhanced with satellite imagery. %M 34280122 %R 10.2196/26604 %U https://publichealth.jmir.org/2021/8/e26604 %U https://doi.org/10.2196/26604 %U http://www.ncbi.nlm.nih.gov/pubmed/34280122 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 8 %P e29682 %T Computer-Aided Diagnosis of Diminutive Colorectal Polyps in Endoscopic Images: Systematic Review and Meta-analysis of Diagnostic Test Accuracy %A Bang,Chang Seok %A Lee,Jae Jun %A Baik,Gwang Ho %+ Department of Internal Medicine, Hallym University College of Medicine, Sakju-ro 77, Chuncheon, 24253, Republic of Korea, 82 332405821, csbang@hallym.ac.kr %K artificial intelligence %K deep learning %K polyps %K colon %K colonoscopy %K diminutive %D 2021 %7 25.8.2021 %9 Review %J J Med Internet Res %G English %X Background: Most colorectal polyps are diminutive and benign, especially those in the rectosigmoid colon, and the resection of these polyps is not cost-effective. Advancements in image-enhanced endoscopy have improved the optical prediction of colorectal polyp histology. However, subjective interpretability and inter- and intraobserver variability prohibits widespread implementation. The number of studies on computer-aided diagnosis (CAD) is increasing; however, their small sample sizes limit statistical significance. Objective: This review aims to evaluate the diagnostic test accuracy of CAD models in predicting the histology of diminutive colorectal polyps by using endoscopic images. Methods: Core databases were searched for studies that were based on endoscopic imaging, used CAD models for the histologic diagnosis of diminutive colorectal polyps, and presented data on diagnostic performance. A systematic review and diagnostic test accuracy meta-analysis were performed. Results: Overall, 13 studies were included. 
The pooled area under the curve, sensitivity, specificity, and diagnostic odds ratio of CAD models for the diagnosis of diminutive colorectal polyps (adenomatous or neoplastic vs nonadenomatous or nonneoplastic) were 0.96 (95% CI 0.93-0.97), 0.93 (95% CI 0.91-0.95), 0.87 (95% CI 0.76-0.93), and 87 (95% CI 38-201), respectively. The meta-regression analysis showed no heterogeneity, and no publication bias was detected. Subgroup analyses showed robust results. The negative predictive value of CAD models for the diagnosis of adenomatous polyps in the rectosigmoid colon was 0.96 (95% CI 0.95-0.97), and this value exceeded the threshold of the diagnosis and leave strategy. Conclusions: CAD models show potential for the optical histological diagnosis of diminutive colorectal polyps via the use of endoscopic images. Trial Registration: PROSPERO CRD42021232189; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=232189 %M 34432643 %R 10.2196/29682 %U https://www.jmir.org/2021/8/e29682 %U https://doi.org/10.2196/29682 %U http://www.ncbi.nlm.nih.gov/pubmed/34432643 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 8 %P e25090 %T An Artificial Neural Network Prediction Model for Posttraumatic Epilepsy: Retrospective Cohort Study %A Wang,Xueping %A Zhong,Jie %A Lei,Ting %A Chen,Deng %A Wang,Haijiao %A Zhu,Lina %A Chu,Shanshan %A Liu,Ling %+ Department of Neurology, West China Hospital, Sichuan University, No. 37, Guo Xue Xiang, Chengdu, 610041, China, 86 151 1705 8487, zjllxx1968@163.com %K artificial neural network %K posttraumatic epilepsy %K traumatic brain injury %D 2021 %7 19.8.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Posttraumatic epilepsy (PTE) is a common sequela after traumatic brain injury (TBI), and identifying high-risk patients with PTE is necessary for their better treatment. 
Although artificial neural network (ANN) prediction models have been reported and are superior to traditional models, an ANN prediction model for PTE is lacking. Objective: We aim to train and validate an ANN model to anticipate the risks of PTE. Methods: The training cohort comprised TBI patients registered at West China Hospital. We used a 5-fold cross-validation approach to train and test the ANN model to avoid overfitting; 21 independent variables were used as input neurons in the ANN models, using a back-propagation algorithm to minimize the loss function. Finally, we obtained the sensitivity, specificity, and accuracy of each ANN model from the 5 rounds of cross-validation and compared the accuracy with that of a nomogram prediction model built in our previous work based on the same population. In addition, we evaluated the performance of the model using patients registered at Chengdu Shang Jin Nan Fu Hospital (testing cohort 1) and Sichuan Provincial People’s Hospital (testing cohort 2) between January 1, 2013, and March 1, 2015. Results: For the training cohort, we enrolled 1301 TBI patients from January 1, 2011, to December 31, 2017. The prevalence of PTE was 12.8% (166/1301, 95% CI 10.9%-14.6%). Of the TBI patients registered in testing cohort 1, PTE prevalence was 10.5% (44/421, 95% CI 7.5%-13.4%). Of the TBI patients registered in testing cohort 2, PTE prevalence was 6.1% (25/413, 95% CI 3.7%-8.4%). The results of the ANN model show that the area under the receiver operating characteristic curve in the training cohort was 0.907 (95% CI 0.889-0.924), in testing cohort 1 was 0.867 (95% CI 0.842-0.893), and in testing cohort 2 was 0.859 (95% CI 0.826-0.890). The average accuracy of the training cohort was 0.557 (95% CI 0.510-0.620), with 0.470 (95% CI 0.414-0.526) in testing cohort 1 and 0.344 (95% CI 0.287-0.401) in testing cohort 2. 
In addition, sensitivity, specificity, positive predictive values, and negative predictive values in the training cohort (testing cohort 1 and testing cohort 2) were 0.80 (0.83 and 0.80), 0.86 (0.80 and 0.84), 91% (85% and 78%), and 86% (80% and 83%), respectively. When calibrating this ANN model, the Brier score was 0.121 in testing cohort 1 and 0.127 in testing cohort 2. Compared with the nomogram model, the ANN prediction model had a higher accuracy (P=.01). Conclusions: This study shows that the ANN model can predict the risk of PTE and is superior to the risk estimated based on traditional statistical methods. However, the calibration of the model was relatively poor, and it needs to be recalibrated and further improved on a larger sample. %M 34420931 %R 10.2196/25090 %U https://www.jmir.org/2021/8/e25090 %U https://doi.org/10.2196/25090 %U http://www.ncbi.nlm.nih.gov/pubmed/34420931 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 8 %P e25165 %T Gender Prediction for a Multiethnic Population via Deep Learning Across Different Retinal Fundus Photograph Fields: Retrospective Cross-sectional Study %A Betzler,Bjorn Kaijun %A Yang,Henrik Hee Seung %A Thakur,Sahil %A Yu,Marco %A Quek,Ten Cheer %A Soh,Zhi Da %A Lee,Geunyoung %A Tham,Yih-Chung %A Wong,Tien Yin %A Rim,Tyler Hyungtaek %A Cheng,Ching-Yu %+ Ophthalmology and Visual Science Academic Clinical Program, Duke-NUS Medical School, 8 College Rd, Singapore, 169857, Singapore, 65 65767228, tyler.rim@snec.com.sg %K deep learning %K artificial intelligence %K retina %K gender %K ophthalmology %D 2021 %7 17.8.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Deep learning algorithms have been built for the detection of systemic and eye diseases based on fundus photographs. The retina possesses features that can be affected by gender differences, and the extent to which these features are captured via photography differs depending on the retinal image field. 
Objective: We aimed to compare deep learning algorithms’ performance in predicting gender based on different fields of fundus photographs (optic disc–centered, macula-centered, and peripheral fields). Methods: This retrospective cross-sectional study included 172,170 fundus photographs of 9956 adults aged ≥40 years from the Singapore Epidemiology of Eye Diseases Study. Optic disc–centered, macula-centered, and peripheral field fundus images were included in this study as input data for a deep learning model for gender prediction. Performance was estimated at the individual level and image level. Receiver operating characteristic curves for binary classification were calculated. Results: The deep learning algorithms predicted gender with an area under the receiver operating characteristic curve (AUC) of 0.94 at the individual level and an AUC of 0.87 at the image level. Across the three image field types, the best performance was seen when using optic disc–centered field images (younger subgroups: AUC=0.91; older subgroups: AUC=0.86), and algorithms that used peripheral field images had the lowest performance (younger subgroups: AUC=0.85; older subgroups: AUC=0.76). Across the three ethnic subgroups, algorithm performance was lowest in the Indian subgroup (AUC=0.88) compared to that in the Malay (AUC=0.91) and Chinese (AUC=0.91) subgroups when the algorithms were tested on optic disc–centered images. Algorithms’ performance in gender prediction at the image level was better in younger subgroups (aged <65 years; AUC=0.89) than in older subgroups (aged ≥65 years; AUC=0.82). Conclusions: We confirmed that gender among the Asian population can be predicted with fundus photographs by using deep learning, and our algorithms’ performance in terms of gender prediction differed according to the field of fundus photographs, age subgroups, and ethnic groups. Our work provides a further understanding of using deep learning models for the prediction of gender-related diseases. 
Further validation of our findings is still needed. %M 34402800 %R 10.2196/25165 %U https://medinform.jmir.org/2021/8/e25165 %U https://doi.org/10.2196/25165 %U http://www.ncbi.nlm.nih.gov/pubmed/34402800 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 8 %P e24762 %T Development and Validation of an Arterial Pressure-Based Cardiac Output Algorithm Using a Convolutional Neural Network: Retrospective Study Based on Prospective Registry Data %A Yang,Hyun-Lim %A Jung,Chul-Woo %A Yang,Seong Mi %A Kim,Min-Soo %A Shim,Sungho %A Lee,Kook Hyun %A Lee,Hyung-Chul %+ Department of Anesthesiology and Pain Medicine, Seoul National University College of Medicine, 101 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea, 82 2 2072 0640, vital@snu.ac.kr %K cardiac output %K deep learning %K arterial pressure %D 2021 %7 16.8.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Arterial pressure-based cardiac output (APCO) is a less invasive method for estimating cardiac output without concerns about complications from the pulmonary artery catheter (PAC). However, inaccuracies of currently available APCO devices have been reported. Improvements to the algorithm by researchers are impossible, as only a subset of the algorithm has been released. Objective: In this study, an open-source algorithm was developed and validated using a convolutional neural network and a transfer learning technique. Methods: A retrospective study was performed using data from a prospective cohort registry of intraoperative bio-signal data from a university hospital. The convolutional neural network model was trained using the arterial pressure waveform as input and the stroke volume (SV) value as the output. The model parameters were pretrained using the SV values from a commercial APCO device (Vigileo or EV1000 with the FloTrac algorithm) and adjusted with a transfer learning technique using SV values from the PAC. 
The performance of the model was evaluated using absolute error for the PAC on the testing dataset from separate periods. Finally, we compared the performance of the deep learning model and the FloTrac with the SV values from the PAC. Results: A total of 2057 surgical cases (1958 training and 99 testing cases) were used in the registry. In the deep learning model, the absolute errors of SV were 14.5 (SD 13.4) mL (10.2 [SD 8.4] mL in cardiac surgery and 17.4 [SD 15.3] mL in liver transplantation). Compared with FloTrac, the absolute errors of the deep learning model were significantly smaller (16.5 [SD 15.4] and 18.3 [SD 15.1], P<.001). Conclusions: The deep learning–based APCO algorithm showed better performance than the commercial APCO device. Further improvement of the algorithm developed in this study may be helpful for estimating cardiac output accurately in clinical practice and optimizing high-risk patient care. %M 34398790 %R 10.2196/24762 %U https://medinform.jmir.org/2021/8/e24762 %U https://doi.org/10.2196/24762 %U http://www.ncbi.nlm.nih.gov/pubmed/34398790 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 8 %P e26398 %T Current-Visit and Next-Visit Prediction for Fatty Liver Disease With a Large-Scale Dataset: Model Development and Performance Comparison %A Wu,Cheng-Tse %A Chu,Ta-Wei %A Jang,Jyh-Shing Roger %+ Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, No. 325, Sec. 
2, Chenggong Rd., Neihu Dist., Taipei, 114, Taiwan, 886 287923311 ext 88083, taweichu@gmail.com %K machine learning %K sequence forward selection %K one-pass ranking %K fatty liver diseases %K alcohol fatty liver disease %K nonalcoholic fatty liver disease %K long short-term memory %K current-visit prediction %K next-visit prediction %D 2021 %7 12.8.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Fatty liver disease (FLD) arises from the accumulation of fat in the liver and may cause liver inflammation, which, if not well controlled, may develop into liver fibrosis, cirrhosis, or even hepatocellular carcinoma. Objective: We describe the construction of machine-learning models for current-visit prediction (CVP), which can help physicians obtain more information for accurate diagnosis, and next-visit prediction (NVP), which can help physicians provide potential high-risk patients with advice to effectively prevent FLD. Methods: The large-scale and high-dimensional dataset used in this study comes from Taipei MJ Health Research Foundation in Taiwan. We used one-pass ranking and sequential forward selection (SFS) for feature selection in FLD prediction. For CVP, we explored multiple models, including k-nearest-neighbor classifier (KNNC), Adaboost, support vector machine (SVM), logistic regression (LR), random forest (RF), Gaussian naïve Bayes (GNB), decision trees C4.5 (C4.5), and classification and regression trees (CART). For NVP, we used long short-term memory (LSTM) and several of its variants as sequence classifiers that use various input sets for prediction. Model performance was evaluated based on two criteria: the accuracy of the test set and the intersection over union/coverage between the features selected by one-pass ranking/SFS and by domain experts. The accuracy, precision, recall, F-measure, and area under the receiver operating characteristic curve were calculated for both CVP and NVP for males and females, respectively. 
Results: After data cleaning, the dataset included 34,856 and 31,394 unique visits respectively for males and females for the period 2009-2016. The test accuracy of CVP using KNNC, Adaboost, SVM, LR, RF, GNB, C4.5, and CART was respectively 84.28%, 83.84%, 82.22%, 82.21%, 76.03%, 75.78%, and 75.53%. The test accuracy of NVP using LSTM, bidirectional LSTM (biLSTM), Stack-LSTM, Stack-biLSTM, and Attention-LSTM was respectively 76.54%, 76.66%, 77.23%, 76.84%, and 77.31% for fixed-interval features, and was 79.29%, 79.12%, 79.32%, 79.29%, and 78.36%, respectively, for variable-interval features. Conclusions: This study explored a large-scale FLD dataset with high dimensionality. We developed FLD prediction models for CVP and NVP. We also implemented efficient feature selection schemes for current- and next-visit prediction to compare the automatically selected features with expert-selected features. In particular, NVP emerged as more valuable from the viewpoint of preventive medicine. For NVP, we propose use of feature set 2 (with variable intervals), which is more compact and flexible. We have also tested several variants of LSTM in combination with two feature sets to identify the best match for male and female FLD prediction. More specifically, the best model for males was Stack-LSTM using feature set 2 (with 79.32% accuracy), whereas the best model for females was LSTM using feature set 1 (with 81.90% accuracy). 
%M 34387552 %R 10.2196/26398 %U https://medinform.jmir.org/2021/8/e26398 %U https://doi.org/10.2196/26398 %U http://www.ncbi.nlm.nih.gov/pubmed/34387552 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 8 %P e20678 %T Artificial Intelligence–Based Chatbot for Anxiety and Depression in University Students: Pilot Randomized Controlled Trial %A Klos,Maria Carolina %A Escoredo,Milagros %A Joerin,Angela %A Lemos,Viviana Noemí %A Rauws,Michiel %A Bunge,Eduardo L %+ Interdisciplinary Center for Research in Health and Behavioral Sciences (CIICSAC), Universidad Adventista del Plata (UAP)., National Scientific and Technical Research Council (CONICET)., 25 de Mayo 99, Libertador San Martín, Entre Ríos, 3103, Argentina, 54 3435064263, mcarolinaklos@gmail.com %K artificial intelligence %K chatbots %K conversational agents %K mental health %K anxiety %K depression %K college students %D 2021 %7 12.8.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: Artificial intelligence–based chatbots are emerging as instruments of psychological intervention; however, no relevant studies have been reported in Latin America. Objective: The objective of the present study was to evaluate the viability, acceptability, and potential impact of using Tess, a chatbot, for examining symptoms of depression and anxiety in university students. Methods: This was a pilot randomized controlled trial. The experimental condition used Tess for 8 weeks, and the control condition was assigned to a psychoeducation book on depression. Comparisons were conducted using Mann-Whitney U and Wilcoxon tests for depressive symptoms, and independent and paired sample t tests to analyze anxiety symptoms. Results: The initial sample consisted of 181 Argentinian college students (158, 87.2% female) aged 18 to 33. Data at week 8 were provided by 39 out of the 99 (39%) participants in the experimental condition and 34 out of the 82 (41%) in the control group. 
On average, 472 (SD 249.52) messages were exchanged, with 116 (SD 73.87) of the messages sent from the users in response to Tess. A higher number of messages exchanged with Tess was associated with positive feedback (F2,36=4.37; P=.02). No significant differences between the experimental and control groups were found from the baseline to week 8 for depressive and anxiety symptoms. However, significant intragroup differences demonstrated that the experimental group showed a significant decrease in anxiety symptoms; no such differences were observed for the control group. Further, no significant intragroup differences were found for depressive symptoms. Conclusions: The students spent a considerable amount of time exchanging messages with Tess, and positive feedback was associated with a higher number of messages exchanged. The initial results show promising evidence for the usability and acceptability of Tess in the Argentinian population. Research on chatbots is still in its initial stages and further research is needed. %M 34092548 %R 10.2196/20678 %U https://formative.jmir.org/2021/8/e20678 %U https://doi.org/10.2196/20678 %U http://www.ncbi.nlm.nih.gov/pubmed/34092548 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 8 %P e28287 %T Ranking Rule-Based Automatic Explanations for Machine Learning Predictions on Asthma Hospital Encounters in Patients With Asthma: Retrospective Cohort Study %A Zhang,Xiaoyi %A Luo,Gang %+ Department of Biomedical Informatics and Medical Education, University of Washington, UW Medicine South Lake Union, 850 Republican Street, Building C, Box 358047, Seattle, WA, 98195, United States, 1 206 221 4596, gangluo@cs.wisc.edu %K asthma %K clinical decision support %K machine learning %K patient care management %K forecasting %D 2021 %7 11.8.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Asthma hospital encounters impose a heavy burden on the health care system. 
To improve preventive care and outcomes for patients with asthma, we recently developed a black-box machine learning model to predict whether a patient with asthma will have one or more asthma hospital encounters in the succeeding 12 months. Our model is more accurate than previous models. However, black-box machine learning models do not explain their predictions, which forms a barrier to widespread clinical adoption. To solve this issue, we previously developed a method to automatically provide rule-based explanations for the model’s predictions and to suggest tailored interventions without sacrificing model performance. For an average patient correctly predicted by our model to have future asthma hospital encounters, our explanation method generated over 5000 rule-based explanations, if any. However, the user of the automated explanation function, often a busy clinician, will want to quickly obtain the most useful information for a patient by viewing only the top few explanations. Therefore, a methodology is required to appropriately rank the explanations generated for a patient. However, this is currently an open problem. Objective: The aim of this study is to develop a method to appropriately rank the rule-based explanations that our automated explanation method generates for a patient. Methods: We developed a ranking method that struck a balance among multiple factors. Through a secondary analysis of 82,888 data instances of adults with asthma from the University of Washington Medicine between 2011 and 2018, we demonstrated our ranking method on the test case of predicting asthma hospital encounters in patients with asthma. Results: For each patient predicted to have asthma hospital encounters in the succeeding 12 months, the top few explanations returned by our ranking method typically have high quality and low redundancy. 
Many top-ranked explanations provide useful insights on the various aspects of the patient’s situation, which cannot be easily obtained by viewing the patient’s data in the current electronic health record system. Conclusions: The explanation ranking module is an essential component of the automated explanation function, and it addresses the interpretability issue that deters the widespread adoption of machine learning predictive models in clinical practice. In the next few years, we plan to test our explanation ranking method on predictive modeling problems addressing other diseases as well as on data from other health care systems. International Registered Report Identifier (IRRID): RR2-10.2196/5039 %M 34383673 %R 10.2196/28287 %U https://medinform.jmir.org/2021/8/e28287 %U https://doi.org/10.2196/28287 %U http://www.ncbi.nlm.nih.gov/pubmed/34383673 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 8 %P e23508 %T Development and Validation of Unplanned Extubation Prediction Models Using Intensive Care Unit Data: Retrospective, Comparative, Machine Learning Study %A Hur,Sujeong %A Min,Ji Young %A Yoo,Junsang %A Kim,Kyunga %A Chung,Chi Ryang %A Dykes,Patricia C %A Cha,Won Chul %+ Department of Emergency Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea, 82 2 3410 2053, wc.cha@samsung.com %K intensive care unit %K machine learning %K mechanical ventilator %K patient safety %K unplanned extubation %D 2021 %7 11.8.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Patient safety in the intensive care unit (ICU) is one of the most critical issues, and unplanned extubation (UE) is considered the most adverse event for patient safety. Prevention and early detection of such an event is an essential but difficult component of quality care. Objective: This study aimed to develop and validate prediction models for UE in ICU patients using machine learning. 
Methods: This study was conducted in an academic tertiary hospital in Seoul, Republic of Korea. The hospital had approximately 2000 inpatient beds and 120 ICU beds. As of January 2019, the hospital had approximately 9000 outpatients on a daily basis. The number of annual ICU admissions was approximately 10,000. We conducted a retrospective study between January 1, 2010, and December 31, 2018. A total of 6914 extubation cases were included. We developed a UE prediction model using machine learning algorithms, which included random forest (RF), logistic regression (LR), artificial neural network (ANN), and support vector machine (SVM). For evaluating the model’s performance, we used the area under the receiver operating characteristic curve (AUROC). The sensitivity, specificity, positive predictive value, negative predictive value, and F1 score were also determined for each model. For performance evaluation, we also used a calibration curve, the Brier score, and the integrated calibration index (ICI) to compare different models. The potential clinical usefulness of the best model at the best threshold was assessed through a net benefit approach using a decision curve. Results: Among the 6914 extubation cases, 248 underwent UE. In the UE group, there were more males than females, higher use of physical restraints, and fewer surgeries. The incidence of UE was higher during the night shift as compared to the planned extubation group. The rate of reintubation within 24 hours and hospital mortality were higher in the UE group. The UE prediction algorithm was developed, and the AUROC for RF was 0.787, for LR was 0.762, for ANN was 0.763, and for SVM was 0.740. Conclusions: We successfully developed and validated machine learning–based prediction models to predict UE in ICU patients using electronic health record data. The best AUROC was 0.787 and the sensitivity was 0.949, which was obtained using the RF algorithm. 
The RF model was well-calibrated, and the Brier score and ICI were 0.129 and 0.048, respectively. The proposed prediction model uses widely available variables to limit the additional workload on the clinician. Further, this evaluation suggests that the model holds potential for clinical usefulness. %M 34382940 %R 10.2196/23508 %U https://www.jmir.org/2021/8/e23508 %U https://doi.org/10.2196/23508 %U http://www.ncbi.nlm.nih.gov/pubmed/34382940 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 8 %P e28292 %T Improving Human Happiness Analysis Based on Transfer Learning: Algorithm Development and Validation %A Yu,Lele %A Zhang,Shaowu %A Zhang,Yijia %A Lin,Hongfei %+ College of Computer Science and Technology, Dalian University of Technology, No 2 Linggong Road, Dalian, 116023, China, 86 411 84708704, zhangyijia1979@gmail.com %K happiness analysis %K sentiment analysis %K transfer learning %K text classification %D 2021 %7 6.8.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Happiness refers to the joyful and pleasant emotions that humans produce subjectively. It is the positive part of emotions, and it affects the quality of human life. Therefore, understanding human happiness is a meaningful task in sentiment analysis. Objective: We mainly discuss 2 facets (Agency/Sociality) of happiness in this paper. Through analysis and research on happiness, we can expand on new concepts that define happiness and enrich our understanding of emotions. Methods: This paper treated each happy moment as a sequence of short sentences, then proposed a short happiness detection model based on transfer learning to analyze the Agency and Sociality aspects of happiness. First, we utilized the unlabeled training set to retrain the pretraining language model Bidirectional Encoder Representations from Transformers (BERT) and got a semantically enhanced language model happyBERT in the target domain. 
Then, we got several single text classification models by fine-tuning BERT and happyBERT. Finally, an improved voting strategy was proposed to integrate multiple single models, and “pseudo data” were introduced to retrain the combined models. Results: The proposed approach was evaluated on the public dataset happyDB. Experimental results showed that our approach significantly outperforms the baselines. When predicting the Agency aspect of happiness, our approach achieved an accuracy of 0.8653 and an F1 score of 0.9126. When predicting Sociality, our approach achieved an accuracy of 0.9367 and an F1 score of 0.9491. Conclusions: By evaluating the dataset, the comparison results demonstrated the effectiveness of our approach for happiness analysis. Experimental results confirmed that our method achieved state-of-the-art performance and transfer learning effectively improved happiness analysis. %M 34383680 %R 10.2196/28292 %U https://medinform.jmir.org/2021/8/e28292 %U https://doi.org/10.2196/28292 %U http://www.ncbi.nlm.nih.gov/pubmed/34383680 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 8 %N 8 %P e19824 %T Deep Learning With Anaphora Resolution for the Detection of Tweeters With Depression: Algorithm Development and Validation Study %A Wongkoblap,Akkapon %A Vadillo,Miguel A %A Curcin,Vasa %+ DIGITECH, Suranaree University of Technology, 111 University Avenue, Muang, Nakhon Ratchasima, 30000, Thailand, 66 44224336, wongkoblap@sut.ac.th %K depression %K mental health %K Twitter %K social media %K deep learning %K anaphora resolution %K multiple-instance learning %K depression markers %D 2021 %7 6.8.2021 %9 Original Paper %J JMIR Ment Health %G English %X Background: Mental health problems are widely recognized as a major public health challenge worldwide. This concern highlights the need to develop effective tools for detecting mental health disorders in the population. 
Social networks are a promising source of data wherein patients publish rich personal information that can be mined to extract valuable psychological cues; however, these data come with their own set of challenges, such as the need to disambiguate between statements about oneself and third parties. Traditionally, natural language processing techniques for social media have looked at text classifiers and user classification models separately, hence presenting a challenge for researchers who want to combine text sentiment and user sentiment analysis. Objective: The objective of this study is to develop a predictive model that can detect users with depression from Twitter posts and instantly identify textual content associated with mental health topics. The model can also address the problem of anaphoric resolution and highlight anaphoric interpretations. Methods: We retrieved the data set from Twitter by using a regular expression or stream of real-time tweets comprising 3682 users, of which 1983 self-declared their depression and 1699 declared no depression. Two multiple instance learning models were developed—one with and one without an anaphoric resolution encoder—to identify users with depression and highlight posts related to the mental health of the author. Several previously published models were applied to our data set, and their performance was compared with that of our models. Results: The maximum accuracy, F1 score, and area under the curve of our anaphoric resolution model were 92%, 92%, and 90%, respectively. The model outperformed alternative predictive models, which ranged from classical machine learning models to deep learning models. Conclusions: Our model with anaphoric resolution shows promising results when compared with other predictive models and provides valuable insights into textual content that is relevant to the mental health of the tweeter. 
%M 34383688 %R 10.2196/19824 %U https://mental.jmir.org/2021/8/e19824 %U https://doi.org/10.2196/19824 %U http://www.ncbi.nlm.nih.gov/pubmed/34383688 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 8 %P e25670 %T Construction of Genealogical Knowledge Graphs From Obituaries: Multitask Neural Network Extraction System %A He,Kai %A Yao,Lixia %A Zhang,JiaWei %A Li,Yufei %A Li,Chen %+ School of Computer Science and Technology, Xi’an Jiaotong University, Xianning West Road, 27th, Xi’an, 0086 710049, China, 86 158 0290 2703, cli@xjtu.edu.cn %K genealogical knowledge graph %K EHR %K information extraction %K genealogy %K neural network %D 2021 %7 4.8.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Genealogical information, such as that found in family trees, is imperative for biomedical research such as disease heritability and risk prediction. Researchers have used policyholder and their dependent information in medical claims data and emergency contacts in electronic health records (EHRs) to infer family relationships at a large scale. We have previously demonstrated that online obituaries can be a novel data source for building more complete and accurate family trees. Objective: Aiming at supplementing EHR data with family relationships for biomedical research, we built an end-to-end information extraction system using a multitask-based artificial neural network model to construct genealogical knowledge graphs (GKGs) from online obituaries. GKGs are enriched family trees with detailed information including age, gender, death and birth dates, and residence. Methods: Built on a predefined family relationship map consisting of 4 types of entities (eg, people’s name, residence, birth date, and death date) and 71 types of relationships, we curated a corpus containing 1700 online obituaries from the metropolitan area of Minneapolis and St Paul in Minnesota. 
We also adopted data augmentation technology to generate additional synthetic data to alleviate the issue of data scarcity for rare family relationships. A multitask-based artificial neural network model was then built to simultaneously detect names, extract relationships between them, and assign attributes (eg, birth dates and death dates, residence, age, and gender) to each individual. In the end, we assembled related GKGs into larger ones by identifying people appearing in multiple obituaries. Results: Our system achieved satisfactory precision (94.79%), recall (91.45%), and F-1 measures (93.09%) on 10-fold cross-validation. We also constructed 12,407 GKGs, with the largest one made up of 4 generations and 30 people. Conclusions: In this work, we discussed the meaning of GKGs for biomedical research, presented a new version of a corpus with a predefined family relationship map and augmented training data, and proposed a multitask deep neural system to construct and assemble GKGs. The results show that our system can extract GKGs effectively and demonstrate the potential of enriching EHR data for further genetic research. We share the source code and system with the entire scientific community on GitHub without the corpus for privacy protection. 
%M 34346903 %R 10.2196/25670 %U https://www.jmir.org/2021/8/e25670 %U https://doi.org/10.2196/25670 %U http://www.ncbi.nlm.nih.gov/pubmed/34346903 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 9 %N 8 %P e23938 %T Comparison of the Validity and Generalizability of Machine Learning Algorithms for the Prediction of Energy Expenditure: Validation Study %A O'Driscoll,Ruairi %A Turicchi,Jake %A Hopkins,Mark %A Duarte,Cristiana %A Horgan,Graham W %A Finlayson,Graham %A Stubbs,R James %+ Appetite Control and Energy Balance Group, School of Psychology, University of Leeds, Woodhouse, Leeds, United Kingdom, 44 113 343 2846, psrod@leeds.ac.uk %K bioenergetics %K energy balance %K accelerometers %K machine learning %K validation %D 2021 %7 4.8.2021 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: Accurate solutions for the estimation of physical activity and energy expenditure at scale are needed for a range of medical and health research fields. Machine learning techniques show promise in research-grade accelerometers, and some evidence indicates that these techniques can be applied to more scalable commercial devices. Objective: This study aims to test the validity and out-of-sample generalizability of algorithms for the prediction of energy expenditure in several wearables (ie, Fitbit Charge 2, ActiGraph GT3-x, SenseWear Armband Mini, and Polar H7) using two laboratory data sets comprising different activities. Methods: Two laboratory studies (study 1: n=59, age 44.4 years, weight 75.7 kg; study 2: n=30, age=31.9 years, weight=70.6 kg), in which adult participants performed a sequential lab-based activity protocol consisting of resting, household, ambulatory, and nonambulatory tasks, were combined in this study. In both studies, accelerometer and physiological data were collected from the wearables alongside energy expenditure using indirect calorimetry. 
Three regression algorithms were used to predict metabolic equivalents (METs; ie, random forest, gradient boosting, and neural networks), and five classification algorithms (ie, k-nearest neighbor, support vector machine, random forest, gradient boosting, and neural networks) were used for physical activity intensity classification as sedentary, light, or moderate to vigorous. Algorithms were evaluated using leave-one-subject-out cross-validations and out-of-sample validations. Results: The root mean square error (RMSE) was lowest for gradient boosting applied to SenseWear and Polar H7 data (0.91 METs), and in the classification task, gradient boost applied to SenseWear and Polar H7 was the most accurate (85.5%). Fitbit models achieved an RMSE of 1.36 METs and 78.2% accuracy for classification. Errors tended to increase in out-of-sample validations with the SenseWear neural network achieving RMSE values of 1.22 METs in the regression tasks and the SenseWear gradient boost and random forest achieving an accuracy of 80% in classification tasks. Conclusions: Algorithms trained on combined data sets demonstrated high predictive accuracy, with a tendency for superior performance of random forests and gradient boosting for most but not all wearable devices. Predictions were poorer in the between-study validations, which creates uncertainty regarding the generalizability of the tested algorithms. 
%M 34346890 %R 10.2196/23938 %U https://mhealth.jmir.org/2021/8/e23938 %U https://doi.org/10.2196/23938 %U http://www.ncbi.nlm.nih.gov/pubmed/34346890 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 8 %P e26256 %T Artificial Intelligence–Based Prediction of Lung Cancer Risk Using Nonimaging Electronic Medical Records: Deep Learning Approach %A Yeh,Marvin Chia-Han %A Wang,Yu-Hsiang %A Yang,Hsuan-Chia %A Bai,Kuan-Jen %A Wang,Hsiao-Han %A Li,Yu-Chuan Jack %+ Department of Dermatology, Wan Fang Hospital, Taipei Medical University, No 111, Section 3, Xinglong Road, Wenshan District, Taipei, 116, Taiwan, 886 29307930 ext 2980, jaak88@gmail.com %K artificial intelligence %K lung cancer screening %K electronic medical record %D 2021 %7 3.8.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence approaches can integrate complex features and can be used to predict a patient’s risk of developing lung cancer, thereby decreasing the need for unnecessary and expensive diagnostic interventions. Objective: The aim of this study was to use electronic medical records to prescreen patients who are at risk of developing lung cancer. Methods: We randomly selected 2 million participants from the Taiwan National Health Insurance Research Database who received care between 1999 and 2013. We built a predictive lung cancer screening model with neural networks that were trained and validated using pre-2012 data, and we tested the model prospectively on post-2012 data. An age- and gender-matched subgroup that was 10 times larger than the original lung cancer group was used to assess the predictive power of the electronic medical record. Discrimination (area under the receiver operating characteristic curve [AUC]) and calibration analyses were performed. Results: The analysis included 11,617 patients with lung cancer and 1,423,154 control patients. 
The model achieved AUCs of 0.90 for the overall population and 0.87 in patients ≥55 years of age. The AUC in the matched subgroup was 0.82. The positive predictive value was highest (14.3%) among people aged ≥55 years with a pre-existing history of lung disease. Conclusions: Our model achieved excellent performance in predicting lung cancer within 1 year and has potential to be deployed for digital patient screening. Convolutional neural networks facilitate the effective use of EMRs to identify individuals at high risk for developing lung cancer. %M 34342588 %R 10.2196/26256 %U https://www.jmir.org/2021/8/e26256 %U https://doi.org/10.2196/26256 %U http://www.ncbi.nlm.nih.gov/pubmed/34342588 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 8 %P e29433 %T Foodborne Disease Risk Prediction Using Multigraph Structural Long Short-term Memory Networks: Algorithm Design and Validation Study %A Du,Yi %A Wang,Hanxue %A Cui,Wenjuan %A Zhu,Hengshu %A Guo,Yunchang %A Dharejo,Fayaz Ali %A Zhou,Yuanchun %+ Computer Network Information Center, Chinese Academy of Sciences, Information Technology Building of Chinese Academy of Sciences, No. 2 Dongsheng South Road, Zhongguancun, Haidian District, Beijing, 100089, China, 86 15810134970, duyi@cnic.cn %K foodborne disease %K risk %K prediction %K spatial–temporal data %D 2021 %7 2.8.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Foodborne disease is a common threat to human health worldwide, leading to millions of deaths every year. Thus, the accurate prediction of foodborne disease risk is urgent and of great importance for public health management. Objective: We aimed to design a spatial–temporal risk prediction model suitable for predicting foodborne disease risks in various regions, to provide guidance for the prevention and control of foodborne diseases. 
Methods: We designed a novel end-to-end framework to predict foodborne disease risk by using a multigraph structural long short-term memory neural network, which can utilize an encoder–decoder to achieve multistep prediction. In particular, to capture multiple spatial correlations, we divided regions by administrative area and constructed adjacent graphs with metrics that included region proximity, historical data similarity, regional function similarity, and exposure food similarity. We also integrated an attention mechanism in both spatial and temporal dimensions, as well as external factors, to refine prediction accuracy. We validated our model with a long-term real-world foodborne disease data set, comprising data from 2015 to 2019 from multiple provinces in China. Results: Our model can achieve F1 scores of 0.822, 0.679, 0.709, and 0.720 for single-month forecasts for the provinces of Beijing, Zhejiang, Shanxi, and Hebei, respectively, and the highest F1 score was 20% higher than the best results of the other models. The experimental results clearly demonstrated that our approach can outperform other state-of-the-art models by a clear margin. Conclusions: The spatial–temporal risk prediction model can take into account the spatial–temporal characteristics of foodborne disease data and accurately determine future disease spatial–temporal risks, thereby providing support for the prevention and risk assessment of foodborne disease. 
%M 34338648 %R 10.2196/29433 %U https://medinform.jmir.org/2021/8/e29433 %U https://doi.org/10.2196/29433 %U http://www.ncbi.nlm.nih.gov/pubmed/34338648 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 7 %P e26000 %T Effects of Background Colors, Flashes, and Exposure Values on the Accuracy of a Smartphone-Based Pill Recognition System Using a Deep Convolutional Neural Network: Deep Learning and Experimental Approach %A Cha,KyeongMin %A Woo,Hyun-Ki %A Park,Dohyun %A Chang,Dong Kyung %A Kang,Mira %+ Department of Digital Health, Samsung Advanced Institute of Health Sciences & Technology, Sungkyunkwan University, 81 Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea, 82 01099336838, kang.mirad@gmail.com %K pill recognition %K deep neural network %K image processing %K color space %K color difference %K pharmaceutical %K imaging %K photography %K neural network %K mobile phone %D 2021 %7 28.7.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Pill image recognition systems are difficult to develop due to differences in pill color, which are influenced by external factors such as illumination and the presence of a flash. Objective: In this study, the differences in color between reference images and real-world images were measured to determine the accuracy of a pill recognition system under 12 real-world conditions (ie, different background colors, the presence and absence of a flash, and different exposure values [EVs]). Methods: We analyzed 19 medications with different features (ie, different colors, shapes, and dosages). The average color difference was calculated based on the color distance between a reference image and a real-world image. Results: For images with black backgrounds, as the EV decreased, the top-1 and top-5 accuracies increased independently of the presence of a flash. 
The top-5 accuracy for images with black backgrounds increased from 26.8% to 72.6% when the flash was on and increased from 29.5% to 76.8% when the flash was off as the EV decreased. However, the top-5 accuracy increased from 62.1% to 78.4% for images with white backgrounds when the flash was on. The best top-1 accuracy was 51.1% (white background; flash on; EV of +2.0). The best top-5 accuracy was 78.4% (white background; flash on; EV of 0). Conclusions: The accuracy generally increased as the color difference decreased, except for images with black backgrounds and an EV of −2.0. This study revealed that background colors, the presence of a flash, and EVs in real-world conditions are important factors that affect the performance of a pill recognition model. %M 34319239 %R 10.2196/26000 %U https://medinform.jmir.org/2021/7/e26000 %U https://doi.org/10.2196/26000 %U http://www.ncbi.nlm.nih.gov/pubmed/34319239 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 7 %P e27992 %T Clinical Utility and Functionality of an Artificial Intelligence–Based App to Predict Mortality in COVID-19: Mixed Methods Analysis %A Abdulaal,Ahmed %A Patel,Aatish %A Al-Hindawi,Ahmed %A Charani,Esmita %A Alqahtani,Saleh A %A Davies,Gary W %A Mughal,Nabeela %A Moore,Luke Stephen Prockter %+ National Institute for Health Research Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Imperial College London, Commonwealth Building 8th Floor, Du Cane Road, London, W12 0NN, United Kingdom, 44 2033158273, l.moore@imperial.ac.uk %K app %K artificial intelligence %K coronavirus %K COVID-19 %K development %K function %K graphical user interface %K machine learning %K model %K mortality %K neural network %K prediction %K usability %K utility %D 2021 %7 28.7.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: The artificial neural network (ANN) is an increasingly important tool in the context of solving complex medical classification 
problems. However, one of the principal challenges in leveraging artificial intelligence technology in the health care setting has been the relative inability to translate models into clinician workflow. Objective: Here we demonstrate the development of a COVID-19 outcome prediction app that utilizes an ANN and assess its usability in the clinical setting. Methods: Usability assessment was conducted using the app, followed by a semistructured end-user interview. Usability was specified by effectiveness, efficiency, and satisfaction measures. These data were reported with descriptive statistics. The end-user interview data were analyzed using the thematic framework method, which allowed for the development of themes from the interview narratives. In total, 31 National Health Service physicians at a West London teaching hospital, including foundation physicians, senior house officers, registrars, and consultants, were included in this study. Results: All participants were able to complete the assessment, with a mean time to complete separate patient vignettes of 59.35 (SD 10.35) seconds. The mean system usability scale score was 91.94 (SD 8.54), which corresponds to a qualitative rating of “excellent.” The clinicians found the app intuitive and easy to use, with the majority describing its predictions as a useful adjunct to their clinical practice. The main concern was related to the use of the app in isolation rather than in conjunction with other clinical parameters. However, most clinicians speculated that the app could positively reinforce or validate their clinical decision-making. Conclusions: Translating artificial intelligence technologies into the clinical setting remains an important but challenging task. We demonstrate the effectiveness, efficiency, and system usability of a web-based app that uses an ANN to predict the outcomes of patients with COVID-19. 
%M 34115603 %R 10.2196/27992 %U https://formative.jmir.org/2021/7/e27992 %U https://doi.org/10.2196/27992 %U http://www.ncbi.nlm.nih.gov/pubmed/34115603 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 7 %P e27227 %T Safety and Acceptability of a Natural Language Artificial Intelligence Assistant to Deliver Clinical Follow-up to Cataract Surgery Patients: Proposal %A de Pennington,Nick %A Mole,Guy %A Lim,Ernest %A Milne-Ives,Madison %A Normando,Eduardo %A Xue,Kanmin %A Meinert,Edward %+ Centre for Health Technology, University of Plymouth, 6 Kirkby Place, Plymouth, PL4 6DR, United Kingdom, 44 7824446808, edward.meinert@plymouth.ac.uk %K artificial intelligence %K natural language processing %K telemedicine %K cataract %K aftercare %K speech recognition software %K medical informatics %K health services %K health communication %K delivery of health care %K patient acceptance of health care %K mental health %K cell phone %K internet %K conversational agent %K chatbot %K expert systems %K dialogue system %K relational agent %D 2021 %7 28.7.2021 %9 Proposal %J JMIR Res Protoc %G English %X Background: Due to an aging population, the demand for many services is exceeding the capacity of the clinical workforce. As a result, staff are facing a crisis of burnout from being pressured to deliver high-volume workloads, driving increasing costs for providers. Artificial intelligence (AI), in the form of conversational agents, presents a possible opportunity to enable efficiency in the delivery of care. Objective: This study aims to evaluate the effectiveness, usability, and acceptability of Dora, Ufonia’s autonomous voice conversational agent: an AI-enabled telemedicine call for the detection of postoperative cataract surgery patients who require further assessment. 
The objectives of this study are to establish Dora’s efficacy in comparison with an expert clinician, determine baseline sensitivity and specificity for the detection of true complications, evaluate patient acceptability, collect evidence for cost-effectiveness, and capture data to support further development and evaluation. Methods: Using an implementation science construct, the interdisciplinary study will be a mixed methods phase 1 pilot establishing interobserver reliability of the system, usability, and acceptability. This will be done using the following scales and frameworks: the system usability scale; assessment of Health Information Technology Interventions in Evidence-Based Medicine Evaluation Framework; the telehealth usability questionnaire; and the Non-Adoption, Abandonment, and Challenges to the Scale-up, Spread and Suitability framework. Results: The evaluation is expected to show that conversational technology can be used to conduct an accurate assessment and that it is acceptable to different populations with different backgrounds. In addition, the results will demonstrate how successfully the system can be delivered in organizations with different clinical pathways and how it can be integrated with their existing platforms. Conclusions: The project’s key contributions will be evidence of the effectiveness of AI voice conversational agents and their associated usability and acceptability. 
International Registered Report Identifier (IRRID): PRR1-10.2196/27227 %M 34319248 %R 10.2196/27227 %U https://www.researchprotocols.org/2021/7/e27227 %U https://doi.org/10.2196/27227 %U http://www.ncbi.nlm.nih.gov/pubmed/34319248 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 7 %P e23401 %T A Machine Learning–Based Algorithm for the Prediction of Intensive Care Unit Delirium (PRIDE): Retrospective Study %A Hur,Sujeong %A Ko,Ryoung-Eun %A Yoo,Junsang %A Ha,Juhyung %A Cha,Won Chul %A Chung,Chi Ryang %+ Department of Critical Care Medicine and Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea, 82 2 3410 3430, chiryang.chung@gmail.com %K clinical prediction %K delirium %K electronic health record %K intensive care unit %K machine learning %D 2021 %7 26.7.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Delirium frequently occurs among patients admitted to the intensive care unit (ICU). There is limited evidence to support interventions to treat or resolve delirium in patients who have already developed delirium. Therefore, the early recognition and prevention of delirium are important in the management of critically ill patients. Objective: This study aims to develop and validate a delirium prediction model within 24 hours of admission to the ICU using electronic health record data. The algorithm was named the Prediction of ICU Delirium (PRIDE). Methods: This is a retrospective cohort study performed at a tertiary referral hospital with 120 ICU beds. We only included patients who were 18 years or older at the time of admission and who stayed in the medical or surgical ICU. Patients were excluded if they lacked a Confusion Assessment Method for the ICU record from the day of ICU admission or if they had a positive Confusion Assessment Method for the ICU record at the time of ICU admission. 
The algorithm to predict delirium was developed using patient data from the first 2 years of the study period and validated using patient data from the last 6 months. Random forest (RF), Extreme Gradient Boosting (XGBoost), deep neural network (DNN), and logistic regression (LR) were used. The algorithms were externally validated using MIMIC-III data, and the algorithm with the largest area under the receiver operating characteristic (AUROC) curve in the external data set was named the PRIDE algorithm. Results: A total of 37,543 cases were collected. After patient exclusion, 12,409 remained as our study population, of which 3816 (30.8%) patients experienced delirium incidents during the study period. Based on the exclusion criteria, out of the 96,016 ICU admission cases in the MIMIC-III data set, 2061 cases were included, and 272 (13.2%) delirium incidents occurred. The average AUROCs and 95% CIs for internal validation were 0.916 (95% CI 0.916-0.916) for RF, 0.919 (95% CI 0.919-0.919) for XGBoost, 0.881 (95% CI 0.878-0.884) for DNN, and 0.875 (95% CI 0.875-0.875) for LR. In the external validation, the AUROCs were 0.721 (95% CI 0.72-0.721) for RF, 0.697 (95% CI 0.695-0.699) for XGBoost, 0.655 (95% CI 0.654-0.657) for DNN, and 0.631 (95% CI 0.631-0.631) for LR. The Brier score of the RF model was 0.168, indicating that it was well calibrated. Conclusions: A machine learning approach based on electronic health record data can be used to predict delirium within 24 hours of ICU admission. RF, XGBoost, DNN, and LR models were used, and they effectively predicted delirium. However, prospective studies are required to verify the algorithm’s performance before it can be used to advise ICU physicians and help prevent ICU delirium. 
%M 34309567 %R 10.2196/23401 %U https://medinform.jmir.org/2021/7/e23401 %U https://doi.org/10.2196/23401 %U http://www.ncbi.nlm.nih.gov/pubmed/34309567 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 7 %P e19905 %T Patient Representation From Structured Electronic Medical Records Based on Embedding Technique: Development and Validation Study %A Huang,Yanqun %A Wang,Ni %A Zhang,Zhiqiang %A Liu,Honglei %A Fei,Xiaolu %A Wei,Lan %A Chen,Hui %+ School of Biomedical Engineering, Capital Medical University, No 10, Xitoutiao, Youanmenwai, Fengtai District, Beijing, 100069, China, 86 1083911545, chenhui@ccmu.edu.cn %K electronic medical records %K Skip-gram %K feature representation %K patient representation %K stroke %D 2021 %7 23.7.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: The secondary use of structured electronic medical record (sEMR) data has become a challenge due to the diversity, sparsity, and high dimensionality of the data representation. Constructing an effective representation for sEMR data is becoming more and more crucial for subsequent data applications. Objective: We aimed to apply the embedding technique used in the natural language processing domain for the sEMR data representation and to explore the feasibility and superiority of the embedding-based feature and patient representations in clinical application. Methods: The entire training corpus consisted of records of 104,752 hospitalized patients with 13,757 medical concepts of disease diagnoses, physical examinations and procedures, laboratory tests, medications, etc. Each medical concept was embedded into a 200-dimensional real number vector using the Skip-gram algorithm with some adaptive changes from shuffling the medical concepts in a record 20 times. The average of vectors for all medical concepts in a patient record represented the patient. 
For embedding-based feature representation evaluation, we used the cosine similarities among the medical concept vectors to capture the latent clinical associations among the medical concepts. We further conducted a clustering analysis on stroke patients to evaluate and compare the embedding-based patient representations. The Hopkins statistic, Silhouette index (SI), and Davies-Bouldin index were used for the unsupervised evaluation, and the precision, recall, and F1 score were used for the supervised evaluation. Results: The dimension of patient representation was reduced from 13,757 to 200 using the embedding-based representation. The average cosine similarity of the selected disease (subarachnoid hemorrhage) and its 15 clinically relevant medical concepts was 0.973. Stroke patients were clustered into two clusters with the highest SI (0.852). Clustering analyses conducted on patients with the embedding representations showed higher applicability (Hopkins statistic 0.931), higher aggregation (SI 0.862), and lower dispersion (Davies-Bouldin index 0.551) than those conducted on patients with reference representation methods. The clustering solutions for patients with the embedding-based representation achieved the highest F1 scores of 0.944 and 0.717 for two clusters. Conclusions: The feature-level embedding-based representations can reflect the potential clinical associations among medical concepts effectively. The patient-level embedding-based representation is easy to use as continuous input to standard machine learning algorithms and can bring performance improvements. It is expected that the embedding-based representation will be helpful in a wide range of secondary uses of sEMR data. 
%M 34297000 %R 10.2196/19905 %U https://medinform.jmir.org/2021/7/e19905 %U https://doi.org/10.2196/19905 %U http://www.ncbi.nlm.nih.gov/pubmed/34297000 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 7 %P e29226 %T Predicting Antituberculosis Drug–Induced Liver Injury Using an Interpretable Machine Learning Method: Model Development and Validation Study %A Zhong,Tao %A Zhuang,Zian %A Dong,Xiaoli %A Wong,Ka Hing %A Wong,Wing Tak %A Wang,Jian %A He,Daihai %A Liu,Shengyuan %+ Department of Tuberculosis Control, Shenzhen Nanshan Center for Chronic Disease Control, Hua Ming Road No 7, Nanshan District, Shenzhen, 518000, China, 86 13543301395, jfk@sznsmby.com %K accuracy %K drug %K drug-induced liver injury %K high accuracy %K injury %K interpretability %K interpretation %K liver %K machine learning %K model %K prediction %K treatment %K tuberculosis %K XGBoost algorithm %D 2021 %7 20.7.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Tuberculosis (TB) remains a global health threat, being one of the top 10 causes of death and the leading cause of death from a single infectious agent. Drug-induced liver injury (DILI) is the most common and serious side effect during the treatment of TB. Objective: We aim to predict the status of liver injury in patients with TB at the clinical treatment stage. Methods: We designed an interpretable prediction model based on the XGBoost algorithm and identified the most robust and meaningful predictors of the risk of TB-DILI on the basis of clinical data extracted from the Hospital Information System of Shenzhen Nanshan Center for Chronic Disease Control from 2014 to 2019. Results: In total, 757 patients were included, and 287 (38%) had developed TB-DILI. 
Based on values of relative importance and area under the receiver operating characteristic curve, machine learning tools selected patients’ most recent alanine transaminase levels, average rate of change of patients’ last 2 measures of alanine transaminase levels, cumulative dose of pyrazinamide, and cumulative dose of ethambutol as the best predictors for assessing the risk of TB-DILI. In the validation data set, the model had a precision of 90%, recall of 74%, classification accuracy of 76%, and balanced error rate of 77% in predicting cases of TB-DILI. The area under the receiver operating characteristic curve score upon 10-fold cross-validation was 0.912 (95% CI 0.890-0.935). In addition, the model provided warnings of high risk for patients in advance of DILI onset for a median of 15 (IQR 7.3-27.5) days. Conclusions: Our model shows high accuracy and interpretability in predicting cases of TB-DILI, which can provide useful information to clinicians to adjust the medication regimen and avoid more serious liver injury in patients. 
%M 34283036 %R 10.2196/29226 %U https://medinform.jmir.org/2021/7/e29226 %U https://doi.org/10.2196/29226 %U http://www.ncbi.nlm.nih.gov/pubmed/34283036 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 7 %P e22021 %T Relative Performance of Machine Learning and Linear Regression in Predicting Quality of Life and Academic Performance of School Children in Norway: Data Analysis of a Quasi-Experimental Study %A Froud,Robert %A Hansen,Solveig Hakestad %A Ruud,Hans Kristian %A Foss,Jonathan %A Ferguson,Leila %A Fredriksen,Per Morten %+ School of Health Sciences, Kristiania University College, Prinsens Gate 7-9, Oslo, 0107, Norway, 47 1732494636, rob.froud@kristiania.no %K modelling %K linear regression %K machine learning %K artificial intelligence %K quality of life %K academic performance %K continuous/quasi-continuous health outcomes %D 2021 %7 16.7.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Machine learning techniques are increasingly being applied in health research. It is not clear how useful these approaches are for modeling continuous outcomes. Child quality of life is associated with parental socioeconomic status and physical activity and may be associated with aerobic fitness and strength. It is unclear whether diet or academic performance is associated with quality of life. Objective: The purpose of this study was to compare the predictive performance of machine learning techniques with that of linear regression in examining the extent to which continuous outcomes (physical activity, aerobic fitness, muscular strength, diet, and parental education) are predictive of academic performance and quality of life and whether academic performance and quality of life are associated. Methods: We modeled data from children attending 9 schools in a quasi-experimental study. We split data randomly into training and validation sets. 
Curvilinear, nonlinear, and heteroscedastic variables were simulated to examine the performance of machine learning techniques compared to that of linear models, with and without imputation. Results: We included data for 1711 children. Regression models explained 24% of academic performance variance in the real complete-case validation set, and up to 15% of quality of life variance. While machine learning techniques explained high proportions of variance in training sets, in validation, machine learning techniques explained approximately 0% of academic performance variance and 3% to 8% of quality of life variance. With imputation, machine learning techniques improved to 15% for academic performance. Machine learning outperformed regression for simulated nonlinear and heteroscedastic variables. The best predictors of academic performance in adjusted models were the child’s mother having a master-level education (P<.001; β=1.98, 95% CI 0.25 to 3.71), increased television and computer use (P=.03; β=1.19, 95% CI 0.25 to 3.71), and dichotomized self-reported exercise (P=.001; β=2.47, 95% CI 1.08 to 3.87). For quality of life, self-reported exercise (P<.001; β=1.09, 95% CI 0.53 to 1.66) and increased television and computer use (P=.002; β=−0.95, 95% CI −1.55 to −0.36) were the best predictors. Adjusted academic performance was associated with quality of life (P=.02; β=0.12, 95% CI 0.02 to 0.22). Conclusions: Linear regression was less prone to overfitting and outperformed commonly used machine learning techniques. Imputation improved the performance of machine learning, but not sufficiently to outperform regression. Machine learning techniques outperformed linear regression for modeling nonlinear and heteroscedastic relationships and may be of use in such cases. Regression with splines performed almost as well in nonlinear modeling. Lifestyle variables, including physical exercise, television and computer use, and parental education are predictive of academic performance or quality of life. 
Academic performance is associated with quality of life after adjusting for lifestyle variables and may offer another promising intervention target to improve quality of life in children. %M 34009128 %R 10.2196/22021 %U https://www.jmir.org/2021/7/e22021 %U https://doi.org/10.2196/22021 %U http://www.ncbi.nlm.nih.gov/pubmed/34009128 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 7 %P e27370 %T Diagnostic Accuracy of Artificial Intelligence and Computer-Aided Diagnosis for the Detection and Characterization of Colorectal Polyps: Systematic Review and Meta-analysis %A Nazarian,Scarlet %A Glover,Ben %A Ashrafian,Hutan %A Darzi,Ara %A Teare,Julian %+ Department of Surgery and Cancer, Imperial College London, 10th Floor, QEQM Building, St Mary’s Hospital, Praed Street, London, W2 1NY, United Kingdom, 44 2075895111, h.ashrafian@imperial.ac.uk %K artificial intelligence %K colonoscopy %K computer-aided diagnosis %K machine learning %K polyp %D 2021 %7 14.7.2021 %9 Review %J J Med Internet Res %G English %X Background: Colonoscopy reduces the incidence of colorectal cancer (CRC) by allowing detection and resection of neoplastic polyps. Evidence shows that many small polyps are missed on a single colonoscopy. There has been a successful adoption of artificial intelligence (AI) technologies to tackle the issues around missed polyps and as tools to increase the adenoma detection rate (ADR). Objective: The aim of this review was to examine the diagnostic accuracy of AI-based technologies in assessing colorectal polyps. Methods: A comprehensive literature search was undertaken using the databases of Embase, MEDLINE, and the Cochrane Library. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines were followed. Studies reporting the use of computer-aided diagnosis for polyp detection or characterization during colonoscopy were included. 
Independent proportions and their differences were calculated and pooled through DerSimonian and Laird random-effects modeling. Results: A total of 48 studies were included. The meta-analysis showed a significant increase in pooled polyp detection rate in patients with the use of AI for polyp detection during colonoscopy compared with patients who had standard colonoscopy (odds ratio [OR] 1.75, 95% CI 1.56-1.96; P<.001). When comparing patients undergoing colonoscopy with the use of AI to those without, there was also a significant increase in ADR (OR 1.53, 95% CI 1.32-1.77; P<.001). Conclusions: With the aid of machine learning, there is potential to improve ADR and, consequently, reduce the incidence of CRC. The current generation of AI-based systems demonstrates impressive accuracy for the detection and characterization of colorectal polyps. However, this is an evolving field, and before its adoption into a clinical setting, AI systems must prove their worth to patients and clinicians. Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42020169786; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42020169786 %M 34259645 %R 10.2196/27370 %U https://www.jmir.org/2021/7/e27370 %U https://doi.org/10.2196/27370 %U http://www.ncbi.nlm.nih.gov/pubmed/34259645 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 7 %P e27822 %T Application of an Anomaly Detection Model to Screen for Ocular Diseases Using Color Retinal Fundus Images: Design and Evaluation Study %A Han,Yong %A Li,Weiming %A Liu,Mengmeng %A Wu,Zhiyuan %A Zhang,Feng %A Liu,Xiangtong %A Tao,Lixin %A Li,Xia %A Guo,Xiuhua %+ Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, No 10 Xitoutiao, You’anmen Wai, Fengtai District, Beijing, 1000069, China, 86 13661283546, statguo@ccmu.edu.cn %K anomaly detection %K artificial intelligence %K cataract %K diabetic retinopathy %K disease screening %K eye %K fundus image 
%K glaucoma %K macular degeneration %K ocular disease %K ophthalmology %D 2021 %7 13.7.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The supervised deep learning approach provides state-of-the-art performance in a variety of fundus image classification tasks, but it is not applicable for screening tasks with numerous or unknown disease types. The unsupervised anomaly detection (AD) approach, which needs only normal samples to develop a model, may be a workable and cost-saving method of screening for ocular diseases. Objective: This study aimed to develop and evaluate an AD model for detecting ocular diseases on the basis of color fundus images. Methods: A generative adversarial network–based AD method for detecting possible ocular diseases was developed and evaluated using 90,499 retinal fundus images derived from 4 large-scale real-world data sets. Four other independent external test sets were used for external testing and further analysis of the model’s performance in detecting 6 common ocular diseases (diabetic retinopathy [DR], glaucoma, cataract, age-related macular degeneration [AMD], hypertensive retinopathy [HR], and myopia), DR of different severity levels, and 36 categories of abnormal fundus images. The area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity of the model’s performance were calculated and presented. Results: Our model achieved an AUC of 0.896 with 82.69% sensitivity and 82.63% specificity in detecting abnormal fundus images in the internal test set, and it achieved an AUC of 0.900 with 83.25% sensitivity and 85.19% specificity in 1 external proprietary data set. In the detection of 6 common ocular diseases, the AUCs for DR, glaucoma, cataract, AMD, HR, and myopia were 0.891, 0.916, 0.912, 0.867, 0.895, and 0.961, respectively. Moreover, the AD model had an AUC of 0.868 for detecting any DR, 0.908 for detecting referable DR, and 0.926 for detecting vision-threatening DR. 
Conclusions: The AD approach achieved high sensitivity and specificity in detecting ocular diseases on the basis of fundus images, which implies that this model might be an efficient and economical tool for optimizing current clinical pathways for ophthalmologists. Future studies are required to evaluate the practical applicability of the AD approach in ocular disease screening. %M 34255681 %R 10.2196/27822 %U https://www.jmir.org/2021/7/e27822 %U https://doi.org/10.2196/27822 %U http://www.ncbi.nlm.nih.gov/pubmed/34255681 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 7 %P e24633 %T Digital Medical Device Companion (MyIUS) for New Users of Intrauterine Systems: App Development Study %A Karakoyun,Toeresin %A Podhaisky,Hans-Peter %A Frenz,Ann-Kathrin %A Schuhmann-Giampieri,Gabriele %A Ushikusa,Thais %A Schröder,Daniel %A Zvolanek,Michal %A Lopes Da Silva Filho,Agnaldo %+ eHealth and Medical Software Solutions, Bayer AG, eHealth & Medical Software Solutions, Building 0459, Wuppertal, 42096, Germany, 49 152 23914568, toeresin.karakoyun@bayer.com %K medical device %K levonorgestrel-releasing intrauterine system %K mobile medical app %K mobile phone %D 2021 %7 13.7.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Women choosing a levonorgestrel-releasing intrauterine system may experience changes in their menstrual bleeding pattern during the first months following placement. Objective: Although health care professionals (HCPs) can provide counseling, no method of providing individualized information on the expected bleeding pattern or continued support is currently available for women experiencing postplacement bleeding changes. We aim to develop a mobile phone–based medical app (MyIUS) to meet this need and provide a digital companion to women after the placement of the intrauterine system. 
Methods: The MyIUS app is classified as a medical device and uses an artificial intelligence–based bleeding pattern prediction algorithm to estimate a woman’s future bleeding pattern in terms of intensity and regularity. We developed the app with the help of a multidisciplinary team by using a robust and high-quality design process in the context of a constantly evolving regulatory landscape. The development framework consisted of a phased approach including ideation, feasibility and concept finalization, product development, and product deployment or localization stages. Results: The MyIUS app was considered useful by HCPs and easy to use by women who were consulted during the development process. Following the launch of the sustainable app in selected pilot countries, performance metrics will be gathered to facilitate further technical and feature updates and enhancements. A real-world performance study will also be conducted to allow us to upgrade the app in accordance with the new European Commission Medical Device legislation and to validate the bleeding pattern prediction algorithm in a real-world setting. Conclusions: By providing a meaningful estimation of bleeding patterns and allowing an individualized approach to counseling and discussions about contraceptive method choice, the MyIUS app offers a useful tool that may benefit both women and HCPs. Further work is needed to validate the performance of the prediction algorithm and MyIUS app in a real-world setting. 
%M 34255688 %R 10.2196/24633 %U https://medinform.jmir.org/2021/7/e24633 %U https://doi.org/10.2196/24633 %U http://www.ncbi.nlm.nih.gov/pubmed/34255688 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 7 %P e26151 %T Clinically Applicable Segmentation of Head and Neck Anatomy for Radiotherapy: Deep Learning Algorithm Development and Validation Study %A Nikolov,Stanislav %A Blackwell,Sam %A Zverovitch,Alexei %A Mendes,Ruheena %A Livne,Michelle %A De Fauw,Jeffrey %A Patel,Yojan %A Meyer,Clemens %A Askham,Harry %A Romera-Paredes,Bernadino %A Kelly,Christopher %A Karthikesalingam,Alan %A Chu,Carlton %A Carnell,Dawn %A Boon,Cheng %A D'Souza,Derek %A Moinuddin,Syed Ali %A Garie,Bethany %A McQuinlan,Yasmin %A Ireland,Sarah %A Hampton,Kiarna %A Fuller,Krystle %A Montgomery,Hugh %A Rees,Geraint %A Suleyman,Mustafa %A Back,Trevor %A Hughes,Cían Owen %A Ledsam,Joseph R %A Ronneberger,Olaf %+ Google Health, 6 Pancras Square, London, N1C 4AG, United Kingdom, 1 650 253 0000, cianh@google.com %K radiotherapy %K segmentation %K contouring %K machine learning %K artificial intelligence %K UNet %K convolutional neural networks %K surface DSC %D 2021 %7 12.7.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Over half a million individuals are diagnosed with head and neck cancer each year globally. Radiotherapy is an important curative treatment for this disease, but it requires manual time to delineate radiosensitive organs at risk. This planning process can delay treatment while also introducing interoperator variability, resulting in downstream radiation dose differences. Although auto-segmentation algorithms offer a potentially time-saving solution, the challenges in defining, quantifying, and achieving expert performance remain. 
Objective: Adopting a deep learning approach, we aim to demonstrate a 3D U-Net architecture that achieves expert-level performance in delineating 21 distinct head and neck organs at risk commonly segmented in clinical practice. Methods: The model was trained on a data set of 663 deidentified computed tomography scans acquired in routine clinical practice, using both segmentations taken from clinical practice and segmentations created by experienced radiographers as part of this research, all in accordance with consensus organ at risk definitions. Results: We demonstrated the model’s clinical applicability by assessing its performance on a test set of 21 computed tomography scans from clinical practice, each with 21 organs at risk segmented by 2 independent experts. We also introduced the surface Dice similarity coefficient, a new metric for the comparison of organ delineation, to quantify the deviation between organ at risk surface contours rather than volumes, better reflecting the clinical task of correcting errors in automated organ segmentations. The model’s generalizability was then demonstrated on 2 distinct open-source data sets, representing centers and countries different from those used in model training. Conclusions: Deep learning is an effective and clinically applicable technique for the segmentation of the head and neck anatomy for radiotherapy. With appropriate validation studies and regulatory approvals, this system could improve the efficiency, consistency, and safety of radiotherapy pathways. 
%M 34255661 %R 10.2196/26151 %U https://www.jmir.org/2021/7/e26151 %U https://doi.org/10.2196/26151 %U http://www.ncbi.nlm.nih.gov/pubmed/34255661 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 7 %P e27532 %T Predicting and Responding to Clinical Deterioration in Hospitalized Patients by Using Artificial Intelligence: Protocol for a Mixed Methods, Stepped Wedge Study %A Holdsworth,Laura M %A Kling,Samantha M R %A Smith,Margaret %A Safaeinili,Nadia %A Shieh,Lisa %A Vilendrer,Stacie %A Garvert,Donn W %A Winget,Marcy %A Asch,Steven M %A Li,Ron C %+ Department of Medicine, School of Medicine, Stanford University, 1265 Welch Rd, Stanford, CA, , United States, 1 650 736 3391, lmh1@stanford.edu %K artificial intelligence %K clinical deterioration %K rapid response team %K mixed methods %K workflow %K predictive models %K SEIPS 2.0 %D 2021 %7 7.7.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: The early identification of clinical deterioration in patients in hospital units can decrease mortality rates and improve other patient outcomes; yet, this remains a challenge in busy hospital settings. Artificial intelligence (AI), in the form of predictive models, is increasingly being explored for its potential to assist clinicians in predicting clinical deterioration. Objective: Using the Systems Engineering Initiative for Patient Safety (SEIPS) 2.0 model, this study aims to assess whether an AI-enabled work system improves clinical outcomes, describe how the clinical deterioration index (CDI) predictive model and associated work processes are implemented, and define the emergent properties of the AI-enabled work system that mediate the observed clinical outcomes. Methods: This study will use a mixed methods approach that is informed by the SEIPS 2.0 model to assess both processes and outcomes and focus on how physician-nurse clinical teams are affected by the presence of AI. 
The intervention will be implemented in hospital medicine units based on a modified stepped wedge design featuring three stages over 11 months—stage 0 represents a baseline period 10 months before the implementation of the intervention; stage 1 introduces the CDI predictions to physicians only and triggers a physician-driven workflow; and stage 2 introduces the CDI predictions to the multidisciplinary team, which includes physicians and nurses, and triggers a nurse-driven workflow. Quantitative data will be collected from the electronic health record for the clinical processes and outcomes. Interviews will be conducted with members of the multidisciplinary team to understand how the intervention changes the existing work system and processes. The SEIPS 2.0 model will provide an analytic framework for a mixed methods analysis. Results: A pilot period for the study began in December 2020, and the results are expected in mid-2022. Conclusions: This protocol paper proposes an approach to evaluation that recognizes the importance of assessing both processes and outcomes to understand how a multifaceted AI-enabled intervention affects the complex team-based work of identifying and managing clinical deterioration. 
International Registered Report Identifier (IRRID): PRR1-10.2196/27532 %M 34255728 %R 10.2196/27532 %U https://www.researchprotocols.org/2021/7/e27532 %U https://doi.org/10.2196/27532 %U http://www.ncbi.nlm.nih.gov/pubmed/34255728 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 7 %P e28361 %T Developing a Time-Adaptive Prediction Model for Out-of-Hospital Cardiac Arrest: Nationwide Cohort Study in Korea %A Kim,Ji Woong %A Ha,Juhyung %A Kim,Taerim %A Yoon,Hee %A Hwang,Sung Yeon %A Jo,Ik Joon %A Shin,Tae Gun %A Sim,Min Seob %A Kim,Kyunga %A Cha,Won Chul %+ Department of Digital Health, Samsung Advanced Institute for Health Science & Technology, Sungkyunkwan University, 115, Irwon-ro, Gangnam-gu, Seoul, Republic of Korea, Seoul, Republic of Korea, 82 2 3410 2053, docchaster@gmail.com %K out-of-hospital cardiac arrest %K Republic of Korea %K machine learning %K artificial intelligence %K prognosis %K cardiology %K prediction model %D 2021 %7 5.7.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Out-of-hospital cardiac arrest (OHCA) is a serious public health issue, and predicting the prognosis of OHCA patients can assist clinicians in making decisions about the treatment of patients, use of hospital resources, or termination of resuscitation. Objective: This study aimed to develop a time-adaptive conditional prediction model (TACOM) to predict clinical outcomes every minute. Methods: We performed a retrospective observational study using data from the Korea OHCA Registry in South Korea. In this study, we excluded patients with trauma, those who experienced return of spontaneous circulation before arriving in the emergency department (ED), and those who did not receive cardiopulmonary resuscitation (CPR) in the ED. We selected patients who received CPR in the ED. To develop the time-adaptive prediction model, we organized the training data set as ongoing CPR patients by the minute. 
A total of 49,669 patients were divided into 39,602 subjects for training and 10,067 subjects for validation. We compared random forest, LightGBM, and artificial neural networks as the prediction model methods. Model performance was quantified using the prediction probability of the model, area under the receiver operating characteristic curve (AUROC), and area under the precision recall curve. Results: Among the three algorithms, LightGBM showed the best performance. From 0 to 30 min, the AUROC of the TACOM for predicting good neurological outcomes ranged from 0.910 (95% CI 0.910-0.911) to 0.869 (95% CI 0.865-0.871), whereas that for survival to hospital discharge ranged from 0.800 (95% CI 0.797-0.800) to 0.734 (95% CI 0.736-0.740). The prediction probability of the TACOM showed similar flow with cohort data based on a comparison with the conventional model’s prediction probability. Conclusions: The TACOM predicted the clinical outcome of OHCA patients per minute. This model for predicting patient outcomes by the minute can assist clinicians in making rational decisions for OHCA patients. 
%M 36260382 %R 10.2196/28361 %U https://www.jmir.org/2021/7/e28361/ %U https://doi.org/10.2196/28361 %U http://www.ncbi.nlm.nih.gov/pubmed/36260382 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 7 %P e23863 %T Performance and Limitation of Machine Learning Algorithms for Diabetic Retinopathy Screening: Meta-analysis %A Wu,Jo-Hsuan %A Liu,T Y Alvin %A Hsu,Wan-Ting %A Ho,Jennifer Hui-Chun %A Lee,Chien-Chang %+ Department of Emergency Medicine, National Taiwan University Hospital, No 7, Chung-Shan South Road, Taipei, 100, Taiwan, 886 2 23123456 ext 63485, hit3transparency@gmail.com %K machine learning %K diabetic retinopathy %K diabetes %K deep learning %K neural network %K diagnostic accuracy %D 2021 %7 5.7.2021 %9 Review %J J Med Internet Res %G English %X Background: Diabetic retinopathy (DR), whose standard diagnosis is performed by human experts, has high prevalence and requires a more efficient screening method. Although machine learning (ML)–based automated DR diagnosis has gained attention due to recent approval of IDx-DR, performance of this tool has not been examined systematically, and the best ML technique for use in a real-world setting has not been discussed. Objective: The aim of this study was to systematically examine the overall diagnostic accuracy of ML in diagnosing DR of different categories based on color fundus photographs and to determine the state-of-the-art ML approach. Methods: Published studies in PubMed and EMBASE were searched from inception to June 2020. Studies were screened for relevant outcomes, publication types, and data sufficiency, and a total of 60 out of 2128 (2.82%) studies were retrieved after study selection. Extraction of data was performed by 2 authors according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), and the quality assessment was performed according to the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2). 
Diagnostic accuracy estimates were pooled in a meta-analysis using a bivariate random effects model. The main outcomes included diagnostic accuracy, sensitivity, and specificity of ML in diagnosing DR based on color fundus photographs, as well as the performances of different major types of ML algorithms. Results: The primary meta-analysis included 60 color fundus photograph studies (445,175 interpretations). Overall, ML demonstrated high accuracy in diagnosing DR of various categories, with a pooled area under the receiver operating characteristic curve (AUROC) ranging from 0.97 (95% CI 0.96-0.99) to 0.99 (95% CI 0.98-1.00). The performance of ML in detecting more-than-mild DR was robust (sensitivity 0.95; AUROC 0.97), and by subgroup analyses, we observed that the robust performance of ML was not limited to benchmark data sets (sensitivity 0.92; AUROC 0.96) but could be generalized to images collected in clinical practice (sensitivity 0.97; AUROC 0.97). Neural networks were the most widely used method, and the subgroup analysis revealed a pooled AUROC of 0.98 (95% CI 0.96-0.99) for studies that used neural networks to diagnose more-than-mild DR. Conclusions: This meta-analysis demonstrated high diagnostic accuracy of ML algorithms in detecting DR on color fundus photographs, suggesting that state-of-the-art, ML-based DR screening algorithms are likely ready for clinical applications. However, a significant portion of the earlier published studies had methodology flaws, such as the lack of external validation and presence of spectrum bias. The results of these studies should be interpreted with caution. 
%M 34407500 %R 10.2196/23863 %U https://www.jmir.org/2021/7/e23863 %U https://doi.org/10.2196/23863 %U http://www.ncbi.nlm.nih.gov/pubmed/34407500 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 7 %P e20708 %T Integrating Patient Data Into Skin Cancer Classification Using Convolutional Neural Networks: Systematic Review %A Höhn,Julia %A Hekler,Achim %A Krieghoff-Henning,Eva %A Kather,Jakob Nikolas %A Utikal,Jochen Sven %A Meier,Friedegund %A Gellrich,Frank Friedrich %A Hauschild,Axel %A French,Lars %A Schlager,Justin Gabriel %A Ghoreschi,Kamran %A Wilhelm,Tabea %A Kutzner,Heinz %A Heppt,Markus %A Haferkamp,Sebastian %A Sondermann,Wiebke %A Schadendorf,Dirk %A Schilling,Bastian %A Maron,Roman C %A Schmitt,Max %A Jutzi,Tanja %A Fröhling,Stefan %A Lipka,Daniel B %A Brinker,Titus Josef %+ Digital Biomarkers for Oncology Group (DBO), National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, Heidelberg, Germany, 49 62213219304, titus.brinker@nct-heidelberg.de %K skin cancer classification %K convolutional neural networks %K patient data %D 2021 %7 2.7.2021 %9 Review %J J Med Internet Res %G English %X Background: Recent years have been witnessing a substantial improvement in the accuracy of skin cancer classification using convolutional neural networks (CNNs). CNNs perform on par with or better than dermatologists with respect to the classification tasks of single images. However, in clinical practice, dermatologists also use other patient data beyond the visual aspects present in a digitized image, further increasing their diagnostic accuracy. Several pilot studies have recently investigated the effects of integrating different subtypes of patient data into CNN-based skin cancer classifiers. Objective: This systematic review focuses on the current research investigating the impact of merging information from image features and patient data on the performance of CNN-based skin cancer image classification. 
This study aims to explore the potential in this field of research by evaluating the types of patient data used, the ways in which the nonimage data are encoded and merged with the image features, and the impact of the integration on the classifier performance. Methods: Google Scholar, PubMed, MEDLINE, and ScienceDirect were screened for peer-reviewed studies published in English that dealt with the integration of patient data within a CNN-based skin cancer classification. The search terms skin cancer classification, convolutional neural network(s), deep learning, lesions, melanoma, metadata, clinical information, and patient data were combined. Results: A total of 11 publications fulfilled the inclusion criteria. All of them reported an overall improvement in different skin lesion classification tasks with patient data integration. The most commonly used patient data were age, sex, and lesion location. The patient data were mostly one-hot encoded. There were differences in the complexity that the encoded patient data were processed with regarding deep learning methods before and after fusing them with the image features for a combined classifier. Conclusions: This study indicates the potential benefits of integrating patient data into CNN-based diagnostic algorithms. However, how exactly the individual patient data enhance classification performance, especially in the case of multiclass classification problems, is still unclear. Moreover, a substantial fraction of patient data used by dermatologists remains to be analyzed in the context of CNN-based skin cancer classification. Further exploratory analyses in this promising field may optimize patient data integration into CNN-based skin cancer diagnostics for patients’ benefits. 
%M 34255646 %R 10.2196/20708 %U https://www.jmir.org/2021/7/e20708 %U https://doi.org/10.2196/20708 %U http://www.ncbi.nlm.nih.gov/pubmed/34255646 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 7 %P e29631 %T Predictive Monitoring–Impact in Acute Care Cardiology Trial (PM-IMPACCT): Protocol for a Randomized Controlled Trial %A Keim-Malpass,Jessica %A Ratcliffe,Sarah J %A Moorman,Liza P %A Clark,Matthew T %A Krahn,Katy N %A Monfredi,Oliver J %A Hamil,Susan %A Yousefvand,Gholamreza %A Moorman,J Randall %A Bourque,Jamieson M %+ University of Virginia, PO Box 800782, Charlottesville, VA, 22908, United States, 1 434 243 3961, jesskeim@gmail.com %K predictive analytics monitoring %K AI %K randomized controlled trial %K risk estimation %K clinical deterioration %K visual analytics %K artificial intelligence %K monitoring %K risk %K prediction %K impact %K cardiology %K acute care %D 2021 %7 2.7.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: Patients in acute care wards who deteriorate and are emergently transferred to intensive care units (ICUs) have poor outcomes. Early identification of patients who are decompensating might allow for earlier clinical intervention and reduced morbidity and mortality. Advances in bedside continuous predictive analytics monitoring (ie, artificial intelligence [AI]–based risk prediction) have made complex data easily available to health care providers and have provided early warning of potentially catastrophic clinical events. We present a dynamic, visual, predictive analytics monitoring tool that integrates real-time bedside telemetric physiologic data into robust clinical models to estimate and communicate risk of imminent events. This tool, Continuous Monitoring of Event Trajectories (CoMET), has been shown in retrospective observational studies to predict clinical decompensation on the acute care ward. 
There is a need to more definitively study this advanced predictive analytics or AI monitoring system in a prospective, randomized controlled, clinical trial. Objective: The goal of this trial is to determine the impact of an AI-based visual risk analytic, CoMET, on improving patient outcomes related to clinical deterioration, response time to proactive clinical action, and costs to the health care system. Methods: We propose a cluster randomized controlled trial to test the impact of using the CoMET display in an acute care cardiology and cardiothoracic surgery hospital floor. The number of admissions to a room undergoing cluster randomization was estimated to be 10,424 over the 20-month study period. Cluster randomization based on bed number will occur every 2 months. The intervention cluster will have the CoMET score displayed (along with standard of care), while the usual care group will receive standard of care only. Results: The primary outcome will be hours free from events of clinical deterioration. Hours of acute clinical events are defined as time when one or more of the following occur: emergent ICU transfer, emergent surgery prior to ICU transfer, cardiac arrest prior to ICU transfer, emergent intubation, or death. The clinical trial began randomization in January 2021. Conclusions: Very few AI-based health analytics have been translated from algorithm to real-world use. This study will use robust, prospective, randomized controlled, clinical trial methodology to assess the effectiveness of an advanced AI predictive analytics monitoring system in incorporating real-time telemetric data for identifying clinical deterioration on acute care wards. This analysis will strengthen the ability of health care organizations to evolve as learning health systems, in which bioinformatics data are applied to improve patient outcomes by incorporating AI into knowledge tools that are successfully integrated into clinical practice by health care providers. 
Trial Registration: ClinicalTrials.gov NCT04359641; https://clinicaltrials.gov/ct2/show/NCT04359641 International Registered Report Identifier (IRRID): DERR1-10.2196/29631 %M 34043525 %R 10.2196/29631 %U https://www.researchprotocols.org/2021/7/e29631 %U https://doi.org/10.2196/29631 %U http://www.ncbi.nlm.nih.gov/pubmed/34043525 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 6 %P e28272 %T Document Retrieval for Precision Medicine Using a Deep Learning Ensemble Method %A Liu,Zhiqiang %A Feng,Jingkun %A Yang,Zhihao %A Wang,Lei %+ College of Computer Science and Technology, Dalian University of Technology, No. 2 Ling Gong Road, Gan Jing Zi District, Dalian, China, 86 131 9011 4398, yangzh@dlut.edu.cn %K biomedical information retrieval %K document ranking %K precision medicine %K deep learning %D 2021 %7 29.6.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: With the development of biomedicine, the number of biomedical documents has increased rapidly bringing a great challenge for researchers trying to retrieve the information they need. Information retrieval aims to meet this challenge by searching relevant documents from abundant documents based on the given query. However, sometimes the relevance of search results needs to be evaluated from multiple aspects in specific retrieval tasks, thereby increasing the difficulty of biomedical information retrieval. Objective: This study aimed to find a more systematic method for retrieving relevant scientific literature for a given patient. Methods: In the initial retrieval stage, we supplemented query terms through query expansion strategies and applied query boosting to obtain an initial ranking list of relevant documents. In the re-ranking phase, we employed a text classification model and relevance matching model to evaluate documents from different dimensions and then combined the outputs through logistic regression to re-rank all the documents from the initial ranking list. 
Results: The proposed ensemble method contributed to the improvement of biomedical retrieval performance. Compared with the existing deep learning–based methods, experimental results showed that our method achieved state-of-the-art performance on the data collection provided by the Text Retrieval Conference 2019 Precision Medicine Track. Conclusions: In this paper, we proposed a novel ensemble method based on deep learning. As shown in the experiments, the strategies we used in the initial retrieval phase such as query expansion and query boosting are effective. The application of the text classification model and relevance matching model better captured semantic context information and improved retrieval performance. %M 34185006 %R 10.2196/28272 %U https://medinform.jmir.org/2021/6/e28272 %U https://doi.org/10.2196/28272 %U http://www.ncbi.nlm.nih.gov/pubmed/34185006 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e27344 %T Discovery of Depression-Associated Factors From a Nationwide Population-Based Survey: Epidemiological Study Using Machine Learning and Network Analysis %A Nam,Sang Min %A Peterson,Thomas A %A Seo,Kyoung Yul %A Han,Hyun Wook %A Kang,Jee In %+ Department of Psychiatry, Institute of Behavioral Science in Medicine, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea, 82 2 2228 1620, jeeinkang@yuhs.ac %K depression %K epidemiology %K machine learning %K network %K prediction model %K XGBoost %D 2021 %7 24.6.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: In epidemiological studies, finding the best subset of factors is challenging when the number of explanatory variables is large. Objective: Our study had two aims. First, we aimed to identify essential depression-associated factors using the extreme gradient boosting (XGBoost) machine learning algorithm from big survey data (the Korea National Health and Nutrition Examination Survey, 2012-2016). 
Second, we aimed to achieve a comprehensive understanding of multifactorial features in depression using network analysis. Methods: An XGBoost model was trained and tested to classify “current depression” and “no lifetime depression” for a data set of 120 variables for 12,596 cases. The optimal XGBoost hyperparameters were set by an automated machine learning tool (TPOT), and a high-performance sparse model was obtained by feature selection using the feature importance value of XGBoost. We performed statistical tests on the model and nonmodel factors using survey-weighted multiple logistic regression and drew a correlation network among factors. We also adopted statistical tests for the confounder or interaction effect of selected risk factors when it was suspected on the network. Results: The XGBoost-derived depression model consisted of 18 factors with an area under the weighted receiver operating characteristic curve of 0.86. Two nonmodel factors could be found using the model factors, and the factors were classified into direct (P<.05) and indirect (P≥.05), according to the statistical significance of the association with depression. Perceived stress and asthma were the most remarkable risk factors, and urine specific gravity was a novel protective factor. The depression-factor network showed clusters of socioeconomic status and quality of life factors and suggested that educational level and sex might be predisposing factors. Indirect factors (eg, diabetes, hypercholesterolemia, and smoking) were involved in confounding or interaction effects of direct factors. Triglyceride level was a confounder of hypercholesterolemia and diabetes, smoking had a significant risk in females, and weight gain was associated with depression involving diabetes. Conclusions: XGBoost and network analysis were useful to discover depression-related factors and their relationships and can be applied to epidemiological studies using big survey data. 
%M 34184998 %R 10.2196/27344 %U https://www.jmir.org/2021/6/e27344/ %U https://doi.org/10.2196/27344 %U http://www.ncbi.nlm.nih.gov/pubmed/34184998 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e26391 %T A Typology of Existing Machine Learning–Based Predictive Analytic Tools Focused on Reducing Costs and Improving Quality in Health Care: Systematic Search and Content Analysis %A Nichol,Ariadne A %A Batten,Jason N %A Halley,Meghan C %A Axelrod,Julia K %A Sankar,Pamela L %A Cho,Mildred K %+ Stanford School of Medicine, Stanford Center for Biomedical Ethics, 1215 Welch Road, Modular A, Stanford, CA, 94305, United States, 1 650 723 5760, ariadnen@stanford.edu %K machine learning %K artificial intelligence %K ethics %K regulation %K health care quality %K costs %D 2021 %7 22.6.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Considerable effort has been devoted to the development of artificial intelligence, including machine learning–based predictive analytics (MLPA) for use in health care settings. The growth of MLPA could be fueled by payment reforms that hold health care organizations responsible for providing high-quality, cost-effective care. Policy analysts, ethicists, and computer scientists have identified unique ethical and regulatory challenges from the use of MLPA in health care. However, little is known about the types of MLPA health care products available on the market today or their stated goals. Objective: This study aims to better characterize available MLPA health care products, identifying and characterizing claims about products recently or currently in use in US health care settings that are marketed as tools to improve health care efficiency by improving quality of care while reducing costs. Methods: We conducted systematic database searches of relevant business news and academic research to identify MLPA products for health care efficiency meeting our inclusion and exclusion criteria. 
We used content analysis to generate MLPA product categories and characterize the organizations marketing the products. Results: We identified 106 products and characterized them based on publicly available information in terms of the types of predictions made and the size, type, and clinical training of the leadership of the companies marketing them. We identified 5 categories of predictions made by MLPA products based on publicly available product marketing materials: disease onset and progression, treatment, cost and utilization, admissions and readmissions, and decompensation and adverse events. Conclusions: Our findings provide a foundational reference to inform the analysis of specific ethical and regulatory challenges arising from the use of MLPA to improve health care efficiency. %M 34156338 %R 10.2196/26391 %U https://www.jmir.org/2021/6/e26391 %U https://doi.org/10.2196/26391 %U http://www.ncbi.nlm.nih.gov/pubmed/34156338 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e26771 %T Acceptability and Effectiveness of Artificial Intelligence Therapy for Anxiety and Depression (Youper): Longitudinal Observational Study %A Mehta,Ashish %A Niles,Andrea Nicole %A Vargas,Jose Hamilton %A Marafon,Thiago %A Couto,Diego Dotta %A Gross,James Jonathan %+ Department of Psychology, Stanford University, Building 420, 450 Jane Stanford Way, Stanford, CA, 94305, United States, 1 650 724 5436, ashm@stanford.edu %K digital mental health treatment %K acceptability %K effectiveness %K anxiety %K depression %D 2021 %7 22.6.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Youper is a widely used, commercially available mobile app that uses artificial intelligence therapy for the treatment of anxiety and depression. Objective: Our study examined the acceptability and effectiveness of Youper. 
Further, we tested the cumulative regulation hypothesis, which posits that cumulative emotion regulation successes with repeated intervention engagement will predict longer-term anxiety and depression symptom reduction. Methods: We examined data from paying Youper users (N=4517) who allowed their data to be used for research. To characterize the acceptability of Youper, we asked users to rate the app on a 5-star scale and measured retention statistics for users’ first 4 weeks of subscription. To examine effectiveness, we examined longitudinal measures of anxiety and depression symptoms. To test the cumulative regulation hypothesis, we used the proportion of successful emotion regulation attempts to predict symptom reduction. Results: Youper users rated the app highly (mean 4.36 stars, SD 0.84), and 42.66% (1927/4517) of users were retained by week 4. Symptoms decreased in the first 2 weeks of app use (anxiety: d=0.57; depression: d=0.46). Anxiety improvements were maintained in the subsequent 2 weeks, but depression symptoms increased slightly with a very small effect size (d=0.05). A higher proportion of successful emotion regulation attempts significantly predicted greater anxiety and depression symptom reduction. Conclusions: Youper is a low-cost, completely self-guided treatment that is accessible to users who may not otherwise access mental health care. Our findings demonstrate the acceptability and effectiveness of Youper as a treatment for anxiety and depression symptoms and support continued study of Youper in a randomized clinical trial. 
%M 34155984 %R 10.2196/26771 %U https://www.jmir.org/2021/6/e26771 %U https://doi.org/10.2196/26771 %U http://www.ncbi.nlm.nih.gov/pubmed/34155984 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e26139 %T Use of Multiprognostic Index Domain Scores, Clinical Data, and Machine Learning to Improve 12-Month Mortality Risk Prediction in Older Hospitalized Patients: Prospective Cohort Study %A Woodman,Richard John %A Bryant,Kimberley %A Sorich,Michael J %A Pilotto,Alberto %A Mangoni,Arduino Aleksander %+ College of Medicine and Public Health, Flinders University, Room 3,12 Health Sciences Building, Sturt Road, Bedford Park, Adelaide, 5042, Australia, 61 0872218537, richard.woodman@flinders.edu.au %K machine learning %K Multidimensional Prognostic Index %K mortality %K diagnostic accuracy %K XGBoost %D 2021 %7 21.6.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The Multidimensional Prognostic Index (MPI) is an aggregate, comprehensive, geriatric assessment scoring system derived from eight domains that predict adverse outcomes, including 12-month mortality. However, the prediction accuracy of using the three MPI categories (mild, moderate, and severe risk) was relatively poor in a study of older hospitalized Australian patients. Prediction modeling using the component domains of the MPI together with additional clinical features and machine learning (ML) algorithms might improve prediction accuracy. 
Objective: This study aims to assess whether the accuracy of prediction for 12-month mortality using logistic regression with maximum likelihood estimation (LR-MLE) with the 3-category MPI together with age and gender (feature set 1) can be improved with the addition of 10 clinical features (sodium, hemoglobin, albumin, creatinine, urea, urea-to-creatinine ratio, estimated glomerular filtration rate, C-reactive protein, BMI, and anticholinergic risk score; feature set 2) and the replacement of the 3-category MPI in feature sets 1 and 2 with the eight separate MPI domains (feature sets 3 and 4, respectively), and to assess the prediction accuracy of the ML algorithms using the same feature sets. Methods: MPI and clinical features were collected from patients aged 65 years and above who were admitted to either the general medical or acute care of the elderly wards of a South Australian hospital between September 2015 and February 2017. The diagnostic accuracy of LR-MLE was assessed together with nine ML algorithms: decision trees, random forests, extreme gradient boosting (XGBoost), support-vector machines, naïve Bayes, K-nearest neighbors, ridge regression, logistic regression without regularization, and neural networks. A 70:30 training:test set split of the data and a grid search of hyperparameters with 10-fold cross-validation were used during model training. The area under the curve was used as the primary measure of accuracy. Results: A total of 737 patients (female: 370/737, 50.2%; male: 367/737, 49.8%) with a median age of 80 (IQR 72-86) years had complete MPI data recorded on admission and had completed the 12-month follow-up. The area under the receiver operating curve for LR-MLE was 0.632, 0.688, 0.738, and 0.757 for feature sets 1 to 4, respectively. The best overall accuracy for the nine ML algorithms was obtained using the XGBoost algorithm (0.635, 0.706, 0.756, and 0.757 for feature sets 1 to 4, respectively). 
Conclusions: The use of MPI domains with LR-MLE considerably improved the prediction accuracy compared with that obtained using the traditional 3-category MPI. The XGBoost ML algorithm slightly improved accuracy compared with LR-MLE, and adding clinical data improved accuracy. These results build on previous work on the MPI and suggest that implementing risk scores based on MPI domains and clinical data by using ML prediction models can support clinical decision-making with respect to risk stratification for the follow-up care of older hospitalized patients. %M 34152274 %R 10.2196/26139 %U https://www.jmir.org/2021/6/e26139 %U https://doi.org/10.2196/26139 %U http://www.ncbi.nlm.nih.gov/pubmed/34152274 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e25913 %T Exploratory Outlier Detection for Acceleromyographic Neuromuscular Monitoring: Machine Learning Approach %A Verdonck,Michaël %A Carvalho,Hugo %A Berghmans,Johan %A Forget,Patrice %A Poelaert,Jan %+ Department of Anesthesiology and Perioperative Medicine, Vrije Universiteit Brussel, Laarbeeklaan 101, Jette, 1050, Belgium, 32 474683824, michael.verdonck@vub.be %K neuromuscular monitoring %K outlier analysis %K acceleromyography %K postoperative residual curarization %K train-of-four %K monitoring devices %K neuromuscular %K machine learning %K monitors %K anesthesiology %D 2021 %7 21.6.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Perioperative quantitative monitoring of neuromuscular function in patients receiving neuromuscular blockers has become internationally recognized as an absolute and core necessity in modern anesthesia care. Because of their kinetic nature, artifactual recordings of acceleromyography-based neuromuscular monitoring devices are not unusual. These generate a great deal of cynicism among anesthesiologists, constituting an obstacle toward their widespread adoption. 
Through outlier analysis techniques, monitoring devices can learn to detect and flag signal abnormalities. Outlier analysis (or anomaly detection) refers to the problem of finding patterns in data that do not conform to expected behavior. Objective: This study was motivated by the development of a smartphone app intended for neuromuscular monitoring based on combined accelerometric and angular hand movement data. During the paired comparison stage of this app against existing acceleromyography monitoring devices, it was noted that the results from both devices did not always concur. This study aims to engineer a set of features that enable the detection of outliers in the form of erroneous train-of-four (TOF) measurements from an acceleromyographic-based device. These features are tested for their potential in the detection of erroneous TOF measurements by developing an outlier detection algorithm. Methods: A data set encompassing 533 high-sensitivity TOF measurements from 35 patients was created based on a multicentric open label trial of a purpose-built accelero- and gyroscopic-based neuromuscular monitoring app. A basic set of features was extracted based on raw data while a second set of features was purpose engineered based on TOF pattern characteristics. Two cost-sensitive logistic regression (CSLR) models were deployed to evaluate the performance of these features. The final output of the developed models was a binary classification, indicating if a TOF measurement was an outlier or not. Results: A total of 7 basic features were extracted based on raw data, while another 8 features were engineered based on TOF pattern characteristics. The model training and testing were based on separate data sets: one with 319 measurements (18 outliers) and a second with 214 measurements (12 outliers). 
The F1 score (95% CI) was 0.86 (0.48-0.97) for the CSLR model with engineered features, significantly larger than the CSLR model with the basic features (0.29 [0.17-0.53]; P<.001). Conclusions: The set of engineered features and their corresponding incorporation in an outlier detection algorithm have the potential to increase overall neuromuscular monitoring data consistency. Integrating outlier flagging algorithms within neuromuscular monitors could potentially reduce overall acceleromyography-based reliability issues. Trial Registration: ClinicalTrials.gov NCT03605225; https://clinicaltrials.gov/ct2/show/NCT03605225 %M 34152273 %R 10.2196/25913 %U https://www.jmir.org/2021/6/e25913/ %U https://doi.org/10.2196/25913 %U http://www.ncbi.nlm.nih.gov/pubmed/34152273 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 8 %N 2 %P e28236 %T Research Trends in Artificial Intelligence Applications in Human Factors Health Care: Mapping Review %A Asan,Onur %A Choudhury,Avishek %+ School of Systems and Enterprises, Stevens Institute of Technology, 1 Castle Point Terrace, Hoboken, NJ, 07030, United States, 1 4145264330, oasan@stevens.edu %K artificial intelligence %K human factors %K health care systems %K ecological validity %K usability %K trust %K perception %K workload %D 2021 %7 18.6.2021 %9 Review %J JMIR Hum Factors %G English %X Background: Despite advancements in artificial intelligence (AI) to develop prediction and classification models, little research has been devoted to real-world translations with a user-centered design approach. AI development studies in the health care context have often ignored two critical factors of ecological validity and human cognition, creating challenges at the interface with clinicians and the clinical environment. Objective: The aim of this literature review was to investigate the contributions made by major human factors communities in health care AI applications. 
This review also discusses emerging research gaps and provides future research directions to facilitate a safer and user-centered integration of AI into the clinical workflow. Methods: We performed an extensive mapping review to capture all relevant articles published within the last 10 years in the major human factors journals and conference proceedings listed in the “Human Factors and Ergonomics” category of the Scopus Master List. In each published volume, we searched for studies reporting qualitative or quantitative findings in the context of AI in health care. Studies are discussed based on key principles such as workload, usability, trust in technology, perception, and user-centered design. Results: Forty-eight articles were included in the final review. Most of the studies emphasized user perception, the usability of AI-based devices or technologies, cognitive workload, and users’ trust in AI. The review revealed a nascent but growing body of literature focusing on augmenting health care AI; however, little effort has been made to ensure ecological validity with user-centered design approaches. Moreover, few studies (n=5 against clinical/baseline standards, n=5 against clinicians) compared their AI models against a standard measure. Conclusions: Human factors researchers should actively be part of efforts in AI design and implementation, as well as dynamic assessments of AI systems’ effects on interaction, workflow, and patient outcomes. An AI system is part of a greater sociotechnical system. Investigators with human factors and ergonomics expertise are essential when defining the dynamic interaction of AI within each element, process, and result of the work system. 
%M 34142968 %R 10.2196/28236 %U https://humanfactors.jmir.org/2021/2/e28236 %U https://doi.org/10.2196/28236 %U http://www.ncbi.nlm.nih.gov/pubmed/34142968 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e17551 %T Reduction of Time on the Ground Related to Real-Time Video Detection of Falls in Memory Care Facilities: Observational Study %A Bayen,Eleonore %A Nickels,Shirley %A Xiong,Glen %A Jacquemot,Julien %A Subramaniam,Raghav %A Agrawal,Pulkit %A Hemraj,Raheema %A Bayen,Alexandre %A Miller,Bruce L %A Netscher,George %+ Department of Neuro-rehabilitation, Hôpital Pitié-Salpêtrière, Assistance Publique des Hôpitaux de Paris, Sorbonne Université, 47 Bd de l'Hôpital, Paris, 75013, France, 33 142161101, eleonore.bayen@gbhi.org %K artificial intelligence %K video monitoring %K real-time video detection %K fall %K time on the ground %K Alzheimer disease %K dementia %K memory care facilities %D 2021 %7 17.6.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Lying on the floor for a long period of time has been described as a critical determinant of prognosis following a fall. In addition to fall-related injuries due to the trauma itself, prolonged immobilization on the floor results in a wide range of comorbidities and may double the risk of death in the elderly. Thus, reducing the length of Time On the Ground (TOG) in fallers seems crucial in vulnerable individuals with cognitive disorders who cannot get up independently. Objective: This study aimed to examine the effect of a new technology called SafelyYou Guardian (SYG) on early post-fall care including reduction of Time Until staff Assistance (TUA) and TOG. Methods: SYG uses continuous video monitoring, artificial intelligence, secure networks, and customized computer applications to detect and notify caregivers about falls in real time while providing immediate access to video footage of falls. 
The present observational study was conducted in 6 California memory care facilities where SYG was installed in bedrooms of consenting residents and families. Fall events were video recorded over 10 months. During the baseline installation period (November 2017 to December 2017), SYG video captures of falls were not provided to facility staff for review on a regular basis. During a second period (January 2018 to April 2018), video captures were delivered to facility staff on a regular weekly basis. During the third period (May 2018 to August 2018), real-time notification (RTN) of any fall was provided to facility staff. Two digital markers (TUA, TOG) were automatically measured and compared between the baseline period (first 2 months) and the RTN period (last 4 months). The total number of falls including those happening outside of the bedroom (such as common areas and bathrooms) was separately reported by facility staff. Results: A total of 436 falls were recorded in 66 participants suffering from Alzheimer disease or related dementias (mean age 87 years; minimum 65, maximum 104 years). Over 80% of the falls happened in bedrooms, with two-thirds occurring overnight (8 PM to 8 AM). While only 8.1% (22/272) of falls were scored as moderate or severe, fallers were not able to stand up alone in 97.6% (247/253) of the cases. Reductions of 28.3 (CI 19.6-37.1) minutes in TUA and 29.6 (CI 20.3-38.9) minutes in TOG were observed between the baseline and RTN periods. The proportion of fallers with TOG >1 hour fell from 31% (8/26; baseline) to zero events (RTN period). During the RTN period, 76.6% (108/141) of fallers received human staff assistance in less than 10 minutes, and 55.3% (78/141) of them spent less than 10 minutes on the ground. Conclusions: SYG technology is capable of reducing TOG and TUA while efficiently covering the area (bedroom) and time zone (nighttime) that are at highest risk. After 6 months of SYG monitoring, TOG was reduced by a factor of 3. 
The drastic reduction of TOG is likely to decrease secondary comorbid complications, improve post-fall prognosis, and reduce health care costs. %M 34137723 %R 10.2196/17551 %U https://www.jmir.org/2021/6/e17551 %U https://doi.org/10.2196/17551 %U http://www.ncbi.nlm.nih.gov/pubmed/34137723 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 6 %P e26601 %T Hyperpolarized Magnetic Resonance and Artificial Intelligence: Frontiers of Imaging in Pancreatic Cancer %A Enriquez,José S %A Chu,Yan %A Pudakalakatti,Shivanand %A Hsieh,Kang Lin %A Salmon,Duncan %A Dutta,Prasanta %A Millward,Niki Zacharias %A Lurie,Eugene %A Millward,Steven %A McAllister,Florencia %A Maitra,Anirban %A Sen,Subrata %A Killary,Ann %A Zhang,Jian %A Jiang,Xiaoqian %A Bhattacharya,Pratip K %A Shams,Shayan %+ School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St, Houston, TX, 77030, United States, 1 713 500 3940, shayan.shams@uth.tmc.edu %K artificial intelligence %K deep learning %K hyperpolarization %K metabolic imaging %K MRI %K 13C %K HP-MR %K pancreatic ductal adenocarcinoma %K pancreatic cancer %K early detection %K assessment of treatment response %K probes %K cancer %K marker %K imaging %K treatment %K review %K detection %K efficacy %D 2021 %7 17.6.2021 %9 Review %J JMIR Med Inform %G English %X Background: There is an unmet need for noninvasive imaging markers that can help identify the aggressive subtype(s) of pancreatic ductal adenocarcinoma (PDAC) at diagnosis and at an earlier time point, and evaluate the efficacy of therapy prior to tumor reduction. 
In the past few years, there have been two major developments with potential for a significant impact in establishing imaging biomarkers for PDAC and pancreatic cancer premalignancy: (1) hyperpolarized metabolic (HP)-magnetic resonance (MR), which increases the sensitivity of conventional MR by over 10,000-fold, enabling real-time metabolic measurements; and (2) applications of artificial intelligence (AI). Objective: The objective of this review was to discuss these two exciting but independent developments (HP-MR and AI) in the realm of PDAC imaging and detection from the available literature to date. Methods: A systematic review following the PRISMA extension for Scoping Reviews (PRISMA-ScR) guidelines was performed. Studies addressing the utilization of HP-MR and/or AI for early detection, assessment of aggressiveness, and interrogating the early efficacy of therapy in patients with PDAC cited in recent clinical guidelines were extracted from the PubMed and Google Scholar databases. The studies were reviewed following predefined exclusion and inclusion criteria, and grouped based on the utilization of HP-MR and/or AI in PDAC diagnosis. Results: Part of the goal of this review was to highlight the knowledge gap of early detection in pancreatic cancer by any imaging modality, and to emphasize how AI and HP-MR can address this critical gap. We reviewed every paper published on HP-MR applications in PDAC, including six preclinical studies and one clinical trial. We also reviewed several HP-MR–related articles describing new probes with many functional applications in PDAC. On the AI side, we reviewed all existing papers that met our inclusion criteria on AI applications for evaluating computed tomography (CT) and MR images in PDAC. With the emergence of AI and its unique capability to learn across multimodal data, along with sensitive metabolic imaging using HP-MR, this knowledge gap in PDAC can be adequately addressed. 
CT is an affordable, accessible, and widespread imaging modality worldwide; for this reason alone, most of the data discussed are based on CT imaging datasets. Although there were relatively few MR-related papers included in this review, we believe that with rapid adoption of MR imaging and HP-MR, more clinical data on pancreatic cancer imaging will be available in the near future. Conclusions: Integration of AI, HP-MR, and multimodal imaging information in pancreatic cancer may lead to the development of real-time biomarkers of early detection, assessing aggressiveness, and interrogating early efficacy of therapy in PDAC. %M 34137725 %R 10.2196/26601 %U https://medinform.jmir.org/2021/6/e26601 %U https://doi.org/10.2196/26601 %U http://www.ncbi.nlm.nih.gov/pubmed/34137725 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e29549 %T Authors’ Reply to: Bibliometric Studies and the Discipline of Social Media Mental Health Research. Comment on “Machine Learning for Mental Health in Social Media: Bibliometric Study” %A Kim,Jina %A Lee,Daeun %A Park,Eunil %+ Department of Applied Artificial Intelligence, Sungkyunkwan University, 312 International Hall, Sungkyunkwan-ro 25-2, Seoul, 03063, Republic of Korea, 82 2 740 1864, eunilpark@skku.edu %K bibliometric analysis %K machine learning %K mental health %K social media %K bibliometrics %D 2021 %7 17.6.2021 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 34137721 %R 10.2196/29549 %U https://www.jmir.org/2021/6/e29549 %U https://doi.org/10.2196/29549 %U http://www.ncbi.nlm.nih.gov/pubmed/34137721 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e28990 %T Bibliometric Studies and the Discipline of Social Media Mental Health Research. 
Comment on “Machine Learning for Mental Health in Social Media: Bibliometric Study” %A Resnik,Philip %A De Choudhury,Munmun %A Musacchio Schafer,Katherine %A Coppersmith,Glen %+ Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland, 1401 Marie Mount Hall, College Park, MD, 20814, United States, 1 301 405 7002, resnik@umd.edu %K bibliometric analysis %K machine learning %K mental health %K social media %K bibliometrics %D 2021 %7 17.6.2021 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 34137722 %R 10.2196/28990 %U https://www.jmir.org/2021/6/e28990 %U https://doi.org/10.2196/28990 %U http://www.ncbi.nlm.nih.gov/pubmed/34137722 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e27807 %T Development of a Positive Body Image Chatbot (KIT) With Young People and Parents/Carers: Qualitative Focus Group Study %A Beilharz,Francesca %A Sukunesan,Suku %A Rossell,Susan L %A Kulkarni,Jayashri %A Sharp,Gemma %+ Monash Alfred Psychiatry Research Centre, Monash University, 4/607 St Kilda Road, Melbourne, 3004, Australia, 61 390765167, gemma.sharp@monash.edu %K body image %K eating disorder %K chatbot %K conversational agent %K artificial intelligence %K mental health %K digital health %K design %D 2021 %7 16.6.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Body image and eating disorders represent a significant public health concern; however, many affected individuals never access appropriate treatment. Conversational agents or chatbots reflect a unique opportunity to target those affected online by providing psychoeducation and coping skills, thus filling the gap in service provision. Objective: A world-first body image chatbot called “KIT” was designed. 
The aim of this study was to assess preliminary acceptability and feasibility via the collection of qualitative feedback from young people and parents/carers regarding the content, structure, and design of the chatbot, in accordance with an agile methodology strategy. The chatbot was developed in collaboration with Australia’s national eating disorder support organization, the Butterfly Foundation. Methods: A conversation decision tree was designed that offered psychoeducational information on body image and eating disorders, as well as evidence-based coping strategies. A version of KIT was built as a research prototype to deliver these conversations. Six focus groups were conducted using online semistructured interviews to seek feedback on the KIT prototype. This included four groups of people seeking help for themselves (n=17; age 13-18 years) and two groups of parents/carers (n=8; age 46-57 years). Participants provided feedback on the cartoon chatbot character design, as well as the content, structure, and design of the chatbot webchat. Results: Thematic analyses identified the following three main themes from the six focus groups: (1) chatbot character and design, (2) content presentation, and (3) flow. Overall, the participants provided positive feedback regarding KIT, with both young people and parents/carers generally providing similar reflections. The participants approved of KIT’s character and engagement. Specific suggestions were made regarding the brevity and tone to increase KIT’s interactivity. Conclusions: Focus groups provided overall positive qualitative feedback regarding the content, structure, and design of the body image chatbot. Incorporating the feedback of lived experience from both individuals and parents/carers allowed the refinement of KIT in the development phase as per an iterative agile methodology. Further research is required to evaluate KIT’s efficacy. 
%M 34132644 %R 10.2196/27807 %U https://www.jmir.org/2021/6/e27807 %U https://doi.org/10.2196/27807 %U http://www.ncbi.nlm.nih.gov/pubmed/34132644 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e29336 %T Author's Reply to: Periodic Manual Algorithm Updates and Generalizability: A Developer’s Response. Comment on “Evaluation of Four Artificial Intelligence–Assisted Self-Diagnosis Apps on Three Diagnoses: Two-Year Follow-Up Study” %A Ćirković,Aleksandar %+ Schulgasse 21, Weiden, 92637, Germany, 49 1788603753, aleksandar.cirkovic@mailbox.org %K artificial intelligence %K machine learning %K mobile apps %K medical diagnosis %K mHealth %K symptom assessment %D 2021 %7 16.6.2021 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 34132643 %R 10.2196/29336 %U https://www.jmir.org/2021/6/e29336 %U https://doi.org/10.2196/29336 %U http://www.ncbi.nlm.nih.gov/pubmed/34132643 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e26514 %T Periodic Manual Algorithm Updates and Generalizability: A Developer’s Response. 
Comment on “Evaluation of Four Artificial Intelligence–Assisted Self-Diagnosis Apps on Three Diagnoses: Two-Year Follow-Up Study” %A Gilbert,Stephen %A Fenech,Matthew %A Idris,Anisa %A Türk,Ewelina %+ Ada Health, Karl-Liebknecht-Str 1, Berlin, 10178, Germany, 49 017680396015, stephen.gilbert@ada.com %K artificial intelligence %K machine learning %K mobile apps %K medical diagnosis %K mHealth %K symptom assessment %D 2021 %7 16.6.2021 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 34132641 %R 10.2196/26514 %U https://www.jmir.org/2021/6/e26514 %U https://doi.org/10.2196/26514 %U http://www.ncbi.nlm.nih.gov/pubmed/34132641 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 6 %P e26448 %T Automation of Article Selection Process in Systematic Reviews Through Artificial Neural Network Modeling and Machine Learning: Protocol for an Article Selection Model %A Ferreira,Gabriel Ferraz %A Quiles,Marcos Gonçalves %A Nazaré,Tiago Santana %A Rezende,Solange Oliveira %A Demarzo,Marcelo %+ Department of Science and Technology, Universidade Federal de São Paulo, Avenida Padre José Maria, 545, São Paulo, CEP 04753-060, Brazil, 55 1135547084, demarzo@unifesp.br %K deep learning %K machine learning %K systematic review %K mindfulness %D 2021 %7 15.6.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: A systematic review can be defined as a summary of the evidence found in the literature via a systematic search in the available scientific databases. One of the steps involved is article selection, which is typically a laborious task. Machine learning and artificial intelligence can be important tools in automating this step, thus aiding researchers. Objective: The aim of this study is to create models based on an artificial neural network system to automate the article selection process in systematic reviews related to “Mindfulness and Health Promotion.” Methods: The study will be performed using Python programming software. 
The system will consist of six main steps: (1) data import, (2) exclusion of duplicates, (3) exclusion of non-articles, (4) article reading and model creation using artificial neural network, (5) comparison of the models, and (6) system sharing. We will choose the 10 most relevant systematic reviews published in the fields of “Mindfulness and Health Promotion” and “Orthopedics” (control group) to serve as a test of the effectiveness of the article selection. Results: Data collection will begin in July 2021, with completion scheduled for December 2021, and final publication available in March 2022. Conclusions: An automated system with a modifiable sensitivity will be created to select scientific articles in systematic review that can be expanded to various fields. We will disseminate our results and models through the “Observatory of Evidence” in public health, an open and online platform that will assist researchers in systematic reviews. International Registered Report Identifier (IRRID): PRR1-10.2196/26448 %M 34128820 %R 10.2196/26448 %U https://www.researchprotocols.org/2021/6/e26448 %U https://doi.org/10.2196/26448 %U http://www.ncbi.nlm.nih.gov/pubmed/34128820 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e24285 %T Real-time Prediction of the Daily Incidence of COVID-19 in 215 Countries and Territories Using Machine Learning: Model Development and Validation %A Peng,Yuanyuan %A Li,Cuilian %A Rong,Yibiao %A Pang,Chi Pui %A Chen,Xinjian %A Chen,Haoyu %+ Joint Shantou International Eye Center, Shantou University & the Chinese University of Hong Kong, Joint Shantou International Eye Center, North Dongxia Road, Shantou, 515041, China, 86 075488393560, drchenhaoyu@gmail.com %K COVID-19 %K daily incidence %K real-time prediction %K machine learning %K Google Trends %K infoveillance %K infodemiology %K digital health %K digital public health %K surveillance %K prediction %K incidence %K policy %K prevention %K model %D 2021 %7 14.6.2021 %9 
Original Paper %J J Med Internet Res %G English %X Background: Advanced prediction of the daily incidence of COVID-19 can aid policy making on the prevention of disease spread, which can profoundly affect people's livelihood. In previous studies, predictions were investigated for single or several countries and territories. Objective: We aimed to develop models that can be applied for real-time prediction of COVID-19 activity in all individual countries and territories worldwide. Methods: Data of the previous daily incidence and infoveillance data (search volume data via Google Trends) from 215 individual countries and territories were collected. A random forest regression algorithm was used to train models to predict the daily new confirmed cases 7 days ahead. Several methods were used to optimize the models, including clustering the countries and territories, selecting features according to the importance scores, performing multiple-step forecasting, and upgrading the models at regular intervals. The performance of the models was assessed using the mean absolute error (MAE), root mean square error (RMSE), Pearson correlation coefficient, and Spearman correlation coefficient. Results: Our models can accurately predict the daily new confirmed cases of COVID-19 in most countries and territories. Of the 215 countries and territories under study, 198 (92.1%) had MAEs <10 and 187 (87.0%) had Pearson correlation coefficients >0.8. For the 215 countries and territories, the mean MAE was 5.42 (range 0.26-15.32), the mean RMSE was 9.27 (range 1.81-24.40), the mean Pearson correlation coefficient was 0.89 (range 0.08-0.99), and the mean Spearman correlation coefficient was 0.84 (range 0.2-1.00). Conclusions: By integrating previous incidence and Google Trends data, our machine learning algorithm was able to predict the incidence of COVID-19 in most individual countries and territories accurately 7 days ahead. 
%M 34081607 %R 10.2196/24285 %U https://www.jmir.org/2021/6/e24285 %U https://doi.org/10.2196/24285 %U http://www.ncbi.nlm.nih.gov/pubmed/34081607 %0 Journal Article %@ 2563-6316 %I JMIR Publications %V 2 %N 2 %P e25560 %T Machine Learning for Risk Group Identification and User Data Collection in a Herpes Simplex Virus Patient Registry: Algorithm Development and Validation Study %A Surodina,Svitlana %A Lam,Ching %A Grbich,Svetislav %A Milne-Ives,Madison %A van Velthoven,Michelle %A Meinert,Edward %+ Centre for Health Technology, University of Plymouth, 6 Kirkby Place, Room 2, Plymouth, PL4 6DN, United Kingdom, 44 1752600600, edward.meinert@plymouth.ac.uk %K data collection %K herpes simplex virus %K registries %K machine learning %K risk assessment %K artificial intelligence %K medical information system %K user-centered design %K predictor %K risk %D 2021 %7 11.6.2021 %9 Original Paper %J JMIRx Med %G English %X Background: Researching people with herpes simplex virus (HSV) is challenging because of poor data quality, low user engagement, and concerns around stigma and anonymity. Objective: This project aimed to improve data collection for a real-world HSV registry by identifying predictors of HSV infection and selecting a limited number of relevant questions to ask new registry users to determine their level of HSV infection risk. Methods: The US National Health and Nutrition Examination Survey (NHANES, 2015-2016) database includes the confirmed HSV type 1 and type 2 (HSV-1 and HSV-2, respectively) status of American participants (14-49 years) and a wealth of demographic and health-related data. The questionnaires and data sets from this survey were used to form two data sets: one for HSV-1 and one for HSV-2. These data sets were used to train and test a model that used a random forest algorithm (devised using Python) to minimize the number of anonymous lifestyle-based questions needed to identify risk groups for HSV. 
Results: The model selected a reduced number of questions from the NHANES questionnaire that predicted HSV infection risk with high accuracy scores of 0.91 and 0.96 and high recall scores of 0.88 and 0.98 for the HSV-1 and HSV-2 data sets, respectively. The number of questions was reduced from 150 to an average of 40, depending on age and gender. The model, therefore, provided high predictability of risk of infection with minimal required input. Conclusions: This machine learning algorithm can be used in a real-world evidence registry to collect relevant lifestyle data and identify individuals’ levels of risk of HSV infection. A limitation is the absence of real user data and integration with electronic medical records, which would enable model learning and improvement. Future work will explore model adjustments, anonymization options, explicit permissions, and a standardized data schema that meet the General Data Protection Regulation, Health Insurance Portability and Accountability Act, and third-party interface connectivity requirements. 
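The question-reduction step this abstract describes (ranking questionnaire items by random forest importance scores, then retraining on only the top items) can be sketched as follows. The data, item count, and top-10 cutoff are illustrative assumptions, not the NHANES-derived sets:

```python
# Hedged sketch (not the authors' code): shrink a long questionnaire to a
# few high-importance questions while retaining predictive recall.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(1)
n, n_questions = 600, 50
X = rng.integers(0, 2, (n, n_questions)).astype(float)  # yes/no answers
# Synthetic outcome driven by a handful of questions (stand-in for HSV status).
logits = X[:, 0] + X[:, 3] - X[:, 7] + 0.3 * rng.normal(size=n)
y = (logits > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
full = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Keep only the top 10 questions by importance score, then retrain.
top = np.argsort(full.feature_importances_)[::-1][:10]
small = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr[:, top], y_tr)
recall = recall_score(y_te, small.predict(X_te[:, top]))
```

A registry front end would then ask new users only the retained questions, as the abstract's reduction from 150 items to roughly 40 suggests.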
%M 37725536 %R 10.2196/25560 %U https://xmed.jmir.org/2021/2/e25560 %U https://doi.org/10.2196/25560 %U http://www.ncbi.nlm.nih.gov/pubmed/37725536 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 8 %N 6 %P e24668 %T Ethics and Law in Research on Algorithmic and Data-Driven Technology in Mental Health Care: Scoping Review %A Gooding,Piers %A Kariotis,Timothy %+ Melbourne Law School, University of Melbourne, 185 Pelham Street, Melbourne, 3053, Australia, 61 383440910, p.gooding@unimelb.edu.au %K digital psychiatry %K digital mental health %K machine learning %K algorithmic technology %K data-driven technology %K artificial intelligence %K ethics %K regulation %K law %K mobile phone %D 2021 %7 10.6.2021 %9 Review %J JMIR Ment Health %G English %X Background: Uncertainty surrounds the ethical and legal implications of algorithmic and data-driven technologies in the mental health context, including technologies characterized as artificial intelligence, machine learning, deep learning, and other forms of automation. Objective: This study aims to survey empirical scholarly literature on the application of algorithmic and data-driven technologies in mental health initiatives to identify the legal and ethical issues that have been raised. Methods: We searched for peer-reviewed empirical studies on the application of algorithmic technologies in mental health care in the Scopus, Embase, and Association for Computing Machinery databases. A total of 1078 relevant peer-reviewed applied studies were identified, which were narrowed to 132 empirical research papers for review based on selection criteria. Conventional content analysis was undertaken to address our aims, and this was supplemented by a keyword-in-context analysis. Results: We grouped the findings into the following five categories of technology: social media (53/132, 40.1%), smartphones (37/132, 28%), sensing technology (20/132, 15.1%), chatbots (5/132, 3.8%), and miscellaneous (17/132, 12.9%). 
Most initiatives were directed toward detection and diagnosis. Most papers discussed privacy, mainly in terms of respecting the privacy of research participants. There was relatively little discussion of privacy beyond the research context. A small number of studies discussed ethics directly (10/132, 7.6%) and indirectly (10/132, 7.6%). Legal issues were not substantively discussed in any studies, although some legal issues were discussed in passing (7/132, 5.3%), such as the rights of user subjects and privacy law compliance. Conclusions: Ethical and legal issues tend not to be explicitly addressed in empirical studies on algorithmic and data-driven technologies in mental health initiatives. Scholars may have considered ethical or legal matters at the ethics committee or institutional review board stage. If so, this consideration seldom appears in published materials in applied research in any detail. The form itself of peer-reviewed papers that detail applied research in this field may well preclude a substantial focus on ethics and law. Regardless, we identified several concerns, including the near-complete lack of involvement of mental health service users, the scant consideration of algorithmic accountability, and the potential for overmedicalization and techno-solutionism. Most papers were published in the computer science field at the pilot or exploratory stages. Thus, these technologies could be appropriated into practice in rarely acknowledged ways, with serious legal and ethical implications. 
%M 34110297 %R 10.2196/24668 %U https://mental.jmir.org/2021/6/e24668 %U https://doi.org/10.2196/24668 %U http://www.ncbi.nlm.nih.gov/pubmed/34110297 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 6 %P e29242 %T Informing Developmental Milestone Achievement for Children With Autism: Machine Learning Approach %A Haque,Munirul M %A Rabbani,Masud %A Dipal,Dipranjan Das %A Zarif,Md Ishrak Islam %A Iqbal,Anik %A Schwichtenberg,Amy %A Bansal,Naveen %A Soron,Tanjir Rashid %A Ahmed,Syed Ishtiaque %A Ahamed,Sheikh Iqbal %+ Ubicomp Lab, Department of Computer Science, Marquette University, 1422 W Kilbourn Ave, 102, Milwaukee, WI, 53233-1784, United States, 1 4143267769, masud.rabbani@marquette.edu %K autism spectrum disorders %K machine learning %K digital health %K mobile health %K mhealth %K predictive modeling %K milestone parameters %K Autism and Developmental Disabilities Monitoring (ADDM) %K early intervention %D 2021 %7 8.6.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Care for children with autism spectrum disorder (ASD) can be challenging for families and medical care systems. This is especially true in low- and middle-income countries such as Bangladesh. To improve family–practitioner communication and developmental monitoring of children with ASD, mCARE (Mobile-Based Care for Children with Autism Spectrum Disorder Using Remote Experience Sampling Method) was developed. Within this study, mCARE was used to track child milestone achievement and family sociodemographic assets to inform mCARE feasibility/scalability and family asset–informed practitioner recommendations. Objective: The objectives of this paper are threefold. First, it documents how mCARE can be used to monitor child milestone achievement. Second, it demonstrates how advanced machine learning models can inform our understanding of milestone achievement in children with ASD. 
Third, it describes family/child sociodemographic factors that are associated with earlier milestone achievement in children with ASD (across 5 machine learning models). Methods: Using mCARE-collected data, this study assessed milestone achievement in 300 children with ASD from Bangladesh. In this study, we used 4 supervised machine learning algorithms (decision tree, logistic regression, K-nearest neighbor [KNN], and artificial neural network [ANN]) and 1 unsupervised machine learning algorithm (K-means clustering) to build models of milestone achievement based on family/child sociodemographic details. For analyses, the sample was randomly divided in half to train the machine learning models and then their accuracy was estimated based on the other half of the sample. Each model was specified for the following milestones: Brushes teeth, Asks to use the toilet, Urinates in the toilet or potty, and Buttons large buttons. Results: This study aimed to find a suitable machine learning algorithm for milestone prediction/achievement for children with ASD using family/child sociodemographic characteristics. For Brushes teeth, the 3 supervised machine learning models met or exceeded an accuracy of 95% with logistic regression, KNN, and ANN as the most robust sociodemographic predictors. For Asks to use toilet, 84.00% accuracy was achieved with the KNN and ANN models. For these models, the family sociodemographic predictors of “family expenditure” and “parents’ age” accounted for most of the model variability. The last 2 parameters, Urinates in toilet or potty and Buttons large buttons, had an accuracy of 91.00% and 76.00%, respectively, in ANN. Overall, the ANN had a higher accuracy (above ~80% on average) among the other algorithms for all the parameters. Across the models and milestones, “family expenditure,” “family size/type,” “living places,” and “parent’s age and occupation” were the most influential family/child sociodemographic factors. 
Conclusions: mCARE was successfully deployed in a low- and middle-income country (ie, Bangladesh), providing parents and care practitioners a mechanism to share detailed information on child milestone achievement. Using advanced modeling techniques, this study demonstrates how family/child sociodemographic elements can inform child milestone achievement. Specifically, families with fewer sociodemographic resources reported later milestone attainment. Developmental science theories highlight how family systems can directly influence child development, and this study provides a clear link between family resources and child developmental progress. Clinical implications for this work could include supporting the larger family system to improve child milestone achievement. %M 33984830 %R 10.2196/29242 %U https://medinform.jmir.org/2021/6/e29242 %U https://doi.org/10.2196/29242 %U http://www.ncbi.nlm.nih.gov/pubmed/33984830 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e25247 %T Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study %A Hu,Hao-Chun %A Chang,Shyue-Yih %A Wang,Chuen-Heng %A Li,Kai-Jun %A Cho,Hsiao-Yun %A Chen,Yi-Ting %A Lu,Chang-Jung %A Tsai,Tzu-Pei %A Lee,Oscar Kuang-Sheng %+ Institute of Clinical Medicine, National Yang Ming Chiao Tung University, No 155, Section 2, Li-Nong Street, Beitou District, Taipei, 11221, Taiwan, 886 2 28757391, oscarlee9203@gmail.com %K artificial intelligence %K convolutional neural network %K dysphonia %K pathological voice %K vocal fold disease %K voice pathology identification %D 2021 %7 8.6.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Dysphonia influences the quality of life by interfering with communication. However, a laryngoscopic examination is expensive and not readily accessible in primary care units. Experienced laryngologists are required to achieve an accurate diagnosis. 
Objective: This study sought to detect various vocal fold diseases through pathological voice recognition using artificial intelligence. Methods: We collected 189 normal voice samples and 552 samples of individuals with voice disorders, including vocal atrophy (n=224), unilateral vocal paralysis (n=50), organic vocal fold lesions (n=248), and adductor spasmodic dysphonia (n=30). The 741 samples were divided into 2 sets: 593 samples as the training set and 148 samples as the testing set. A convolutional neural network approach was applied to train the model, and findings were compared with those of human specialists. Results: The convolutional neural network model achieved a sensitivity of 0.66, a specificity of 0.91, and an overall accuracy of 66.9% for distinguishing normal voice, vocal atrophy, unilateral vocal paralysis, organic vocal fold lesions, and adductor spasmodic dysphonia. Compared with the accuracy of human specialists, the overall accuracy rates were 60.1% and 56.1% for the 2 laryngologists and 51.4% and 43.2% for the 2 general ear, nose, and throat doctors. Conclusions: Voice alone could be used for common vocal fold disease recognition through a deep learning approach after training with our Mandarin pathological voice database. This approach involving artificial intelligence could be clinically useful for screening general vocal fold disease using the voice. The approach includes a quick survey and a general health examination. It can be applied during telemedicine in areas with primary care units lacking laryngoscopic abilities. It could support physicians when prescreening cases by allowing for invasive examinations to be performed only for cases involving problems with automatic recognition or listening and for professional analyses of other clinical examination results that reveal doubts about the presence of pathologies. 
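Convolutional models of this kind are typically fed time–frequency representations of the voice samples rather than raw waveforms. A minimal numpy sketch of such a front end follows; this is an assumed preprocessing step for illustration, not the authors' implementation, and the frame/hop sizes are arbitrary choices:

```python
# Hedged sketch: frame a voice signal, apply a Hann window, and compute a
# log power spectrogram -- a typical CNN input for voice classification.
import numpy as np

def log_power_spectrogram(signal, frame_len=256, hop=128):
    """Return an (n_frames, frame_len//2 + 1) array of log power spectra."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(spectra + 1e-10)  # small floor avoids log(0)

# One second of a synthetic 200 Hz tone at an 8 kHz sampling rate.
sr = 8000
t = np.arange(sr) / sr
spec = log_power_spectrogram(np.sin(2 * np.pi * 200 * t))
```

The resulting 2-D array can be passed to a convolutional network in the same way an image would be.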
%M 34100770 %R 10.2196/25247 %U https://www.jmir.org/2021/6/e25247 %U https://doi.org/10.2196/25247 %U http://www.ncbi.nlm.nih.gov/pubmed/34100770 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e28856 %T Reliable Prediction Models Based on Enriched Data for Identifying the Mode of Childbirth by Using Machine Learning Methods: Development Study %A Ullah,Zahid %A Saleem,Farrukh %A Jamjoom,Mona %A Fakieh,Bahjat %+ Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, PO Box 84428, Riyadh, 11671, Saudi Arabia, 966 505273052, mmjamjoom@pnu.edu.sa %K machine learning %K prediction model %K health care %K cesarean %K delivery %K decision making %D 2021 %7 4.6.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The use of artificial intelligence has revolutionized every area of life such as business and trade, social and electronic media, education and learning, manufacturing industries, medicine and sciences, and every other sector. The new reforms and advanced technologies of artificial intelligence have enabled data analysts to transmute raw data generated by these sectors into meaningful insights for an effective decision-making process. Health care is one of the integral sectors where a large amount of data is generated daily, and making effective decisions based on these data is therefore a challenge. In this study, cases related to childbirth either by the traditional method of vaginal delivery or cesarean delivery were investigated. Cesarean delivery is performed to save both the mother and the fetus when complications related to vaginal birth arise. Objective: The aim of this study was to develop reliable prediction models for a maternity care decision support system to predict the mode of delivery before childbirth. 
Methods: This study was conducted in 2 parts for identifying the mode of childbirth: first, the existing data set was enriched and second, previous medical records about the mode of delivery were investigated using machine learning algorithms and by extracting meaningful insights from unseen cases. Several prediction models were trained to achieve this objective, such as decision tree, random forest, AdaBoostM1, bagging, and k-nearest neighbor, based on original and enriched data sets. Results: The prediction models based on enriched data performed well in terms of accuracy, sensitivity, specificity, F-measure, and receiver operating characteristic curves in the outcomes. Specifically, the accuracy of k-nearest neighbor was 84.38%, that of bagging was 83.75%, that of random forest was 83.13%, that of decision tree was 81.25%, and that of AdaBoostM1 was 80.63%. Enrichment of the data set had a good impact on improving the accuracy of the prediction process, which supports maternity care practitioners in making decisions in critical cases. Conclusions: Our study shows that enriching the data set improves the accuracy of the prediction process, thereby supporting maternity care practitioners in making informed decisions in critical cases. The enriched data set used in this study yields good results, but this data set can become even better if the records are increased with real clinical data. 
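The model comparison this abstract reports can be sketched with the same five classifier families. The synthetic data and default hyperparameters below are illustrative assumptions, not the study's enriched maternity data set:

```python
# Hedged sketch (not the authors' code): train the five classifier families
# named in the abstract on a toy data set and compare test accuracies.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              BaggingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=800, n_features=12, n_informative=6,
                           random_state=0)  # stand-in for delivery records
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "AdaBoostM1": AdaBoostClassifier(random_state=0),
    "bagging": BaggingClassifier(random_state=0),
    "kNN": KNeighborsClassifier(n_neighbors=5),
}
accuracy = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
            for name, m in models.items()}
```

The study's enrichment step would modify `X` before this comparison; the abstract reports that the enriched features, not the choice of algorithm, drove most of the accuracy gain.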
%M 34085938 %R 10.2196/28856 %U https://www.jmir.org/2021/6/e28856 %U https://doi.org/10.2196/28856 %U http://www.ncbi.nlm.nih.gov/pubmed/34085938 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e25006 %T Chatbots to Support People With Dementia and Their Caregivers: Systematic Review of Functions and Quality %A Ruggiano,Nicole %A Brown,Ellen L %A Roberts,Lisa %A Framil Suarez,C Victoria %A Luo,Yan %A Hao,Zhichao %A Hristidis,Vagelis %+ Department of Computer Science and Engineering, University of California, Riverside, 317 Winston Chung Hall, 900 University Ave, Riverside, CA, 92521, United States, 1 951 827 2478, vagelis@cs.ucr.edu %K dementia %K caregivers %K chatbots %K conversation agents %K mobile apps %K mobile phone %D 2021 %7 3.6.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Over the past decade, there has been an increase in the use of information technologies to educate and support people with dementia and their family caregivers. At the same time, chatbot technologies have become increasingly popular for use by the public and have been identified as having benefits for health care delivery. However, little is known about how chatbot technologies may benefit people with dementia and their caregivers. Objective: This study aims to identify the types of current commercially available chatbots that are designed for use by people with dementia and their caregivers and to assess their quality in terms of features and content. Methods: Chatbots were identified through a systematic search on Google Play Store, Apple App Store, Alexa Skills, and the internet. An evidence-based assessment tool was used to evaluate the features and content of the identified apps. The assessment was conducted through interrater agreement among 4 separate reviewers. Results: Of the 505 initial chatbots identified, 6 were included in the review. The chatbots assessed varied significantly in terms of content and scope. 
Although the chatbots were generally found to be easy to use, some limitations were noted regarding their performance and programmed content for dialog. Conclusions: Although chatbot technologies are well established and commonly used by the public, their development for people with dementia and their caregivers is in its infancy. Given the successful use of chatbots in other health care settings and for other applications, there are opportunities to integrate this technology into dementia care. However, more evidence-based chatbots that have undergone end user evaluation are needed to evaluate their potential to adequately educate and support these populations. %M 34081019 %R 10.2196/25006 %U https://www.jmir.org/2021/6/e25006 %U https://doi.org/10.2196/25006 %U http://www.ncbi.nlm.nih.gov/pubmed/34081019 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e25929 %T Evaluation Framework for Successful Artificial Intelligence–Enabled Clinical Decision Support Systems: Mixed Methods Study %A Ji,Mengting %A Genchev,Georgi Z %A Huang,Hengye %A Xu,Ting %A Lu,Hui %A Yu,Guangjun %+ Shanghai Children’s Hospital, Shanghai Jiao Tong University, No 355 Luding Road, Shanghai, 200062, China, 86 18917762998, yugj1688@shchildren.com.cn %K artificial intelligence %K AI %K clinical decision support systems %K evaluation framework %D 2021 %7 2.6.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Clinical decision support systems are designed to utilize medical data, knowledge, and analysis engines and to generate patient-specific assessments or recommendations to health professionals in order to assist decision making. Artificial intelligence–enabled clinical decision support systems aid the decision-making process through an intelligent component. Well-defined evaluation methods are essential to ensure the seamless integration and contribution of these systems to clinical practice. 
Objective: The purpose of this study was to develop and validate a measurement instrument and test the interrelationships of evaluation variables for an artificial intelligence–enabled clinical decision support system evaluation framework. Methods: An artificial intelligence–enabled clinical decision support system evaluation framework consisting of 6 variables was developed. A Delphi process was conducted to develop the measurement instrument items. Cognitive interviews and pretesting were performed to refine the questions. Web-based survey response data were analyzed to remove irrelevant questions from the measurement instrument, to test dimensional structure, and to assess reliability and validity. The interrelationships of relevant variables were tested and verified using path analysis, and a 28-item measurement instrument was developed. Measurement instrument survey responses were collected from 156 respondents. Results: The Cronbach α of the measurement instrument was 0.963, and its content validity was 0.943. Values of average variance extracted ranged from 0.582 to 0.756, and values of the heterotrait-monotrait ratio ranged from 0.376 to 0.896. The final model had a good fit (χ²₂₆=36.984; P=.08; comparative fit index 0.991; goodness-of-fit index 0.957; root mean square error of approximation 0.052; standardized root mean square residual 0.028). Variables in the final model accounted for 89% of the variance in the user acceptance dimension. Conclusions: User acceptance is the central dimension of artificial intelligence–enabled clinical decision support system success. Acceptance was directly influenced by perceived ease of use, information quality, service quality, and perceived benefit. Acceptance was also indirectly influenced by system quality and information quality through perceived ease of use. User acceptance and perceived benefit were interrelated. 
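The Cronbach α reported for the 28-item instrument is straightforward to compute from an item-response matrix. A minimal numpy sketch, using simulated responses rather than the study's survey data:

```python
# Hedged sketch: Cronbach's alpha, the internal-consistency statistic
# reported for the 28-item measurement instrument (simulated data).
import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents, n_items) array of Likert-style scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(0)
trait = rng.normal(size=(156, 1))                  # shared latent factor
items = trait + 0.5 * rng.normal(size=(156, 28))   # 28 correlated items
alpha = cronbach_alpha(items)
```

With strongly correlated items, as here, α approaches 1, consistent with the 0.963 the abstract reports for its real instrument.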
%M 34076581 %R 10.2196/25929 %U https://www.jmir.org/2021/6/e25929 %U https://doi.org/10.2196/25929 %U http://www.ncbi.nlm.nih.gov/pubmed/34076581 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 6 %P e28921 %T Ethical Applications of Artificial Intelligence: Evidence From Health Research on Veterans %A Makridis,Christos %A Hurley,Seth %A Klote,Mary %A Alterovitz,Gil %+ National Artificial Intelligence Institute, Department of Veterans Affairs, 810 Vermont Avenue NW, Washington, DC, 20420, United States, 1 2022977787, christos.makridis@va.gov %K artificial intelligence %K ethics %K veterans %K health data %K technology %K Veterans Affairs %K health technology %K data %D 2021 %7 2.6.2021 %9 Viewpoint %J JMIR Med Inform %G English %X Background: Despite widespread agreement that artificial intelligence (AI) offers significant benefits for individuals and society at large, there are also serious challenges to overcome with respect to its governance. Recent policymaking has focused on establishing principles for the trustworthy use of AI. Adhering to these principles is especially important for ensuring that the development and application of AI raises economic and social welfare, including among vulnerable groups and veterans. Objective: We explore the newly developed principles around trustworthy AI and how they can be readily applied at scale to vulnerable groups that are potentially less likely to benefit from technological advances. Methods: Using the US Department of Veterans Affairs as a case study, we explore the principles of trustworthy AI that are of particular interest for vulnerable groups and veterans. 
Results: We focus on three principles: (1) designing, developing, acquiring, and using AI so that the benefits of its use significantly outweigh the risks and the risks are assessed and managed; (2) ensuring that the application of AI occurs in well-defined domains and is accurate, effective, and fit for the intended purposes; and (3) ensuring that the operations and outcomes of AI applications are sufficiently interpretable and understandable by all subject matter experts, users, and others. Conclusions: These principles and applications apply more generally to vulnerable groups, and adherence to them can allow the VA and other organizations to continue modernizing their technology governance, leveraging the gains of AI while simultaneously managing its risks. %M 34076584 %R 10.2196/28921 %U https://medinform.jmir.org/2021/6/e28921 %U https://doi.org/10.2196/28921 %U http://www.ncbi.nlm.nih.gov/pubmed/34076584 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 6 %P e27793 %T Unsupervised Machine Learning for Identifying Challenging Behavior Profiles to Explore Cluster-Based Treatment Efficacy in Children With Autism Spectrum Disorder: Retrospective Data Analysis Study %A Gardner-Hoag,Julie %A Novack,Marlena %A Parlett-Pelleriti,Chelsea %A Stevens,Elizabeth %A Dixon,Dennis %A Linstead,Erik %+ Fowler School of Engineering, Chapman University, One University Drive, Orange, CA, 92866, United States, 1 714 289 3159, linstead@chapman.edu %K autism spectrum disorder %K challenging behaviors %K unsupervised machine learning %K subtypes %K treatment response %K autism %K treatment %K behavior %K machine learning %K impact %K efficacy %K disorder %K engagement %K retrospective %D 2021 %7 2.6.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Challenging behaviors are prevalent among individuals with autism spectrum disorder; however, research exploring the impact of challenging behaviors on treatment response is lacking. 
Objective: The purpose of this study was to identify types of autism spectrum disorder based on engagement in different challenging behaviors and evaluate differences in treatment response between groups. Methods: Retrospective data on challenging behaviors and treatment progress for 854 children with autism spectrum disorder were analyzed. Participants were clustered based on 8 observed challenging behaviors using k means, and multiple linear regression was performed to test interactions between skill mastery and treatment hours, cluster assignment, and gender. Results: Seven clusters were identified, which demonstrated a single dominant challenging behavior. For some clusters, significant differences in treatment response were found. Specifically, a cluster characterized by low levels of stereotypy was found to have significantly higher levels of skill mastery than clusters characterized by self-injurious behavior and aggression (P<.003). Conclusions: These findings have implications on the treatment of individuals with autism spectrum disorder. Self-injurious behavior and aggression were prevalent among participants with the worst treatment response, thus interventions targeting these challenging behaviors may be worth prioritizing. Furthermore, the use of unsupervised machine learning models to identify types of autism spectrum disorder shows promise. %M 34076577 %R 10.2196/27793 %U https://medinform.jmir.org/2021/6/e27793 %U https://doi.org/10.2196/27793 %U http://www.ncbi.nlm.nih.gov/pubmed/34076577 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 5 %P e28868 %T A Multimodal Imaging–Based Deep Learning Model for Detecting Treatment-Requiring Retinal Vascular Diseases: Model Development and Validation Study %A Kang,Eugene Yu-Chuan %A Yeung,Ling %A Lee,Yi-Lun %A Wu,Cheng-Hsiu %A Peng,Shu-Yen %A Chen,Yueh-Peng %A Gao,Quan-Ze %A Lin,Chihung %A Kuo,Chang-Fu %A Lai,Chi-Chun %+ Department of Ophthalmology, Keelung Chang Gung Memorial Hospital, No. 
222, Maijin Rd, Keelung, Taiwan, 886 24313131 ext 6314, Chichun.lai@gmail.com %K deep learning %K retinal vascular diseases %K multimodal imaging %K treatment requirement %K machine learning %K eye %K retinal %K imaging %K treatment %K model %K detection %K vascular %D 2021 %7 31.5.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Retinal vascular diseases, including diabetic macular edema (DME), neovascular age-related macular degeneration (nAMD), myopic choroidal neovascularization (mCNV), and branch and central retinal vein occlusion (BRVO/CRVO), are considered vision-threatening eye diseases. However, accurate diagnosis depends on multimodal imaging and the expertise of retinal ophthalmologists. Objective: The aim of this study was to develop a deep learning model to detect treatment-requiring retinal vascular diseases using multimodal imaging. Methods: This retrospective study enrolled participants with multimodal ophthalmic imaging data from 3 hospitals in Taiwan from 2013 to 2019. Eye-related images were used, including those obtained through retinal fundus photography, optical coherence tomography (OCT), and fluorescein angiography with or without indocyanine green angiography (FA/ICGA). A deep learning model was constructed for detecting DME, nAMD, mCNV, BRVO, and CRVO and identifying treatment-requiring diseases. Model performance was evaluated and is presented as the area under the curve (AUC) for each receiver operating characteristic curve. Results: A total of 2992 eyes of 2185 patients were studied, with 239, 1209, 1008, 211, 189, and 136 eyes in the control, DME, nAMD, mCNV, BRVO, and CRVO groups, respectively. Among them, 1898 eyes required treatment. The eyes were divided into training, validation, and testing groups in a 5:1:1 ratio. In total, 5117 retinal fundus photos, 9316 OCT images, and 20,922 FA/ICGA images were used. 
The AUCs for detecting mCNV, DME, nAMD, BRVO, and CRVO were 0.996, 0.995, 0.990, 0.959, and 0.988, respectively. The AUC for detecting treatment-requiring diseases was 0.969. From the heat maps, we observed that the model could identify retinal vascular diseases. Conclusions: Our study developed a deep learning model to detect retinal diseases using multimodal ophthalmic imaging. Furthermore, the model demonstrated good performance in detecting treatment-requiring retinal diseases. %M 34057419 %R 10.2196/28868 %U https://medinform.jmir.org/2021/5/e28868 %U https://doi.org/10.2196/28868 %U http://www.ncbi.nlm.nih.gov/pubmed/34057419 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 5 %P e29058 %T A Machine Learning Approach for Mortality Prediction in COVID-19 Pneumonia: Development and Evaluation of the Piacenza Score %A Halasz,Geza %A Sperti,Michela %A Villani,Matteo %A Michelucci,Umberto %A Agostoni,Piergiuseppe %A Biagi,Andrea %A Rossi,Luca %A Botti,Andrea %A Mari,Chiara %A Maccarini,Marco %A Pura,Filippo %A Roveda,Loris %A Nardecchia,Alessia %A Mottola,Emanuele %A Nolli,Massimo %A Salvioni,Elisabetta %A Mapelli,Massimo %A Deriu,Marco Agostino %A Piga,Dario %A Piepoli,Massimo %+ Department of Cardiology, Guglielmo Da Saliceto Hospital, Via Taverna 49,, Piacenza, 29121, Italy, 39 3517489495, geza.halasz@gmail.com %K artificial intelligence %K prognostic score %K COVID-19 %K pneumonia %K mortality %K prediction %K machine learning %K modeling %D 2021 %7 31.5.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Several models have been developed to predict mortality in patients with COVID-19 pneumonia, but only a few have demonstrated enough discriminatory capacity. Machine learning algorithms represent a novel approach for the data-driven prediction of clinical outcomes with advantages over statistical modeling. 
Objective: We aimed to develop a machine learning–based score—the Piacenza score—for 30-day mortality prediction in patients with COVID-19 pneumonia. Methods: The study comprised 852 patients with COVID-19 pneumonia, admitted to the Guglielmo da Saliceto Hospital in Italy from February to November 2020. Patients’ medical history, demographics, and clinical data were collected using an electronic health record. The overall patient data set was randomly split into derivation and test cohorts. The score was obtained through the naïve Bayes classifier and externally validated on 86 patients admitted to Centro Cardiologico Monzino (Italy) in February 2020. Using a forward-search algorithm, 6 features were identified: age, mean corpuscular hemoglobin concentration, PaO2/FiO2 ratio, temperature, previous stroke, and gender. The Brier index was used to evaluate the ability of the machine learning model to stratify and predict the observed outcomes. A user-friendly website was designed and developed to enable fast and easy use of the tool by physicians. Regarding the customization properties of the Piacenza score, we added a tailored version of the algorithm to the website, which enables an optimized computation of the mortality risk score for a patient when some of the variables used by the Piacenza score are not available. In this case, the naïve Bayes classifier is retrained over the same derivation cohort but using a different set of patient characteristics. We also compared the Piacenza score with the 4C score and with a naïve Bayes algorithm with 14 features chosen a priori. 
Results: The Piacenza score exhibited an area under the receiver operating characteristic curve (AUC) of 0.78 (95% CI 0.74-0.84, Brier score=0.19) in the internal validation cohort and 0.79 (95% CI 0.68-0.89, Brier score=0.16) in the external validation cohort, showing a comparable accuracy with respect to the 4C score and to the naïve Bayes model with a priori chosen features; this achieved an AUC of 0.78 (95% CI 0.73-0.83, Brier score=0.26) and 0.80 (95% CI 0.75-0.86, Brier score=0.17), respectively. Conclusions: Our findings demonstrated that a customizable machine learning–based score with a purely data-driven selection of features is feasible and effective for the prediction of mortality among patients with COVID-19 pneumonia. %M 33999838 %R 10.2196/29058 %U https://www.jmir.org/2021/5/e29058 %U https://doi.org/10.2196/29058 %U http://www.ncbi.nlm.nih.gov/pubmed/33999838 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 5 %P e27778 %T A Roadmap for Automating Lineage Tracing to Aid Automatically Explaining Machine Learning Predictions for Clinical Decision Support %A Luo,Gang %+ Department of Biomedical Informatics and Medical Education, University of Washington, UW Medicine South Lake Union, 850 Republican Street, Building C, Box 358047, Seattle, WA, 98195, United States, 1 206 221 4596, gangluo@cs.wisc.edu %K clinical decision support %K database management systems %K forecasting %K machine learning %K electronic medical records %D 2021 %7 27.5.2021 %9 Viewpoint %J JMIR Med Inform %G English %X Using machine learning predictive models for clinical decision support has great potential in improving patient outcomes and reducing health care costs. However, most machine learning models are black boxes that do not explain their predictions, thereby forming a barrier to clinical adoption. 
To overcome this barrier, an automated method was recently developed to provide rule-style explanations of any machine learning model’s predictions on tabular data and to suggest customized interventions. Each explanation delineates the association between a feature value pattern and an outcome value. Although the association and intervention information is useful, the user of the automated explaining function often requires more detailed information to better understand the patient’s situation and to aid in decision making. More specifically, consider a feature value in the explanation that is computed by an aggregation function on the raw data, such as the number of emergency department visits related to asthma that the patient had in the prior 12 months. The user often wants to rapidly drill through to see certain parts of the related raw data that produce the feature value. This task is frequently difficult and time-consuming because the few pieces of related raw data are submerged by many pieces of raw data of the patient that are unrelated to the feature value. To address this issue, this paper outlines an automated lineage tracing approach, which adds automated drill-through capability to the automated explaining function, and provides a roadmap for future research. 
%M 34042600 %R 10.2196/27778 %U https://medinform.jmir.org/2021/5/e27778 %U https://doi.org/10.2196/27778 %U http://www.ncbi.nlm.nih.gov/pubmed/34042600 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 5 %P e29405 %T Authors’ Reply to: Minimizing Selection and Classification Biases Comment on “Clinical Characteristics and Prognostic Factors for Intensive Care Unit Admission of Patients With COVID-19: Retrospective Study Using Machine Learning and Natural Language Processing” %A Izquierdo,Jose Luis %A Soriano,Joan B %+ Hospital Universitario de La Princesa, Diego de Leon 62, Servicio de Neumología, Madrid, 28006, Spain, jbsoriano2@gmail.com %K artificial intelligence %K big data %K COVID-19 %K electronic health records %K tachypnea %K SARS-CoV-2 %K predictive model %K prognosis %K classification bias %K critical care %D 2021 %7 26.5.2021 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 33989164 %R 10.2196/29405 %U https://www.jmir.org/2021/5/e29405 %U https://doi.org/10.2196/29405 %U http://www.ncbi.nlm.nih.gov/pubmed/33989164 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 5 %P e27142 %T Minimizing Selection and Classification Biases. 
Comment on “Clinical Characteristics and Prognostic Factors for Intensive Care Unit Admission of Patients With COVID-19: Retrospective Study Using Machine Learning and Natural Language Processing” %A Martos Pérez,Francisco %A Gomez Huelgas,Ricardo %A Martín Escalante,María Dolores %A Casas Rojo,José Manuel %+ Department of Internal Medicine, Hospital Costa del Sol, Autovía A-7, Km 187, Marbella, 29603, Spain, 34 658927715, pacomartos1@gmail.com %K artificial intelligence %K big data %K COVID-19 %K electronic health records %K tachypnea %K SARS-CoV-2 %K predictive model %K prognosis %K classification bias %K critical care %D 2021 %7 26.5.2021 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 33989163 %R 10.2196/27142 %U https://www.jmir.org/2021/5/e27142 %U https://doi.org/10.2196/27142 %U http://www.ncbi.nlm.nih.gov/pubmed/33989163 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 5 %P e27271 %T Personalized Analytics and a Wearable Biosensor Platform for Early Detection of COVID-19 Decompensation (DeCODe): Protocol for the Development of the COVID-19 Decompensation Index %A Larimer,Karen %A Wegerich,Stephan %A Splan,Joel %A Chestek,David %A Prendergast,Heather %A Vanden Hoek,Terry %+ physIQ, Inc, 200 W Jackson Street Suite 550, Chicago, IL, 60606, United States, 1 7736126205, karen.larimer@physiq.com %K analytic %K artificial intelligence %K biomarker %K cloud %K COVID-19 %K decompensation %K detection %K development %K index %K monitoring %K outcome %K remote monitoring %K symptom validation %K wearable %D 2021 %7 26.5.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: During the COVID-19 pandemic, novel digital health technologies have the potential to improve our understanding of SARS-CoV-2 and COVID-19, improve care delivery, and produce better health outcomes. 
The National Institutes of Health called on digital health leaders to contribute to a high-quality data repository that will support researchers to make discoveries that are otherwise not possible with small, limited data sets. Objective: To this end, we seek to develop a COVID-19 digital biomarker for early detection of physiological exacerbation or decompensation. We propose the development and validation of a COVID-19 decompensation Index (CDI) in a 2-phase study that builds on existing wearable biosensor-derived analytics generated by physIQ’s end-to-end cloud platform for continuous physiological monitoring with wearable biosensors. This effort serves to achieve two primary objectives: (1) to collect adequate data to help develop the CDI and (2) to collect rich deidentified clinical data correlating with outcomes and symptoms related to COVID-19 progression. Our secondary objectives include evaluation of the feasibility and usability of pinpointIQ, a digital platform through which data are gathered, analyzed, and displayed. Methods: This is a prospective, nonrandomized, open-label, 2-phase study. Phase I will involve data collection for the digital data hub of the National Institutes of Health as well as data to support the preliminary development of the CDI. Phase II will involve data collection for the hub and contribute to continued refinement and validation of the CDI. While this study will focus on the development of a CDI, the digital platform will also be evaluated for feasibility and usability while clinicians deliver care to continuously monitored patients enrolled in the study. Results: Our target CDI will be a binary classifier trained to distinguish participants with and those without decompensation. The primary performance metric for CDI will be the area under the receiver operating characteristic curve with a minimum performance criterion of ≥0.75 (α=.05; power [1–β]=0.80). 
Furthermore, we will determine the sex or gender and race or ethnicity of the participants, which would account for differences in the CDI performance, as well as the lead time—time to predict decompensation—and its relationship with the ultimate disease severity based on the World Health Organization COVID-19 ordinal scale. Conclusions: Using machine learning techniques on a large data set of patients with COVID-19 could provide valuable insights into the pathophysiology of COVID-19 and a digital biomarker for COVID-19 decompensation. Through this study, we intend to develop a tool that can uniquely reflect physiological data of a diverse population and contribute to high-quality data that will help researchers better understand COVID-19. Trial Registration: ClinicalTrials.gov NCT04575532; https://www.clinicaltrials.gov/ct2/show/NCT04575532 International Registered Report Identifier (IRRID): DERR1-10.2196/27271 %M 33949966 %R 10.2196/27271 %U https://www.researchprotocols.org/2021/5/e27271 %U https://doi.org/10.2196/27271 %U http://www.ncbi.nlm.nih.gov/pubmed/33949966 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 5 %P e22959 %T Artificial Intelligence Can Improve Patient Management at the Time of a Pandemic: The Role of Voice Technology %A Jadczyk,Tomasz %A Wojakowski,Wojciech %A Tendera,Michal %A Henry,Timothy D %A Egnaczyk,Gregory %A Shreenivas,Satya %+ Department of Cardiology and Structural Heart Diseases, Medical University of Silesia, Ziolowa 45-47, Katowice, 40-635, Poland, 48 512 099 211, tomasz.jadczyk@gmail.com %K artificial intelligence %K conversational agent %K COVID-19 %K virtual care %K voice assistant %K voice chatbot %D 2021 %7 25.5.2021 %9 Viewpoint %J J Med Internet Res %G English %X Artificial intelligence–driven voice technology deployed on mobile phones and smart speakers has the potential to improve patient management and organizational workflow. 
Voice chatbots have been already implemented in health care–leveraging innovative telehealth solutions during the COVID-19 pandemic. They allow for automatic acute care triaging and chronic disease management, including remote monitoring, preventive care, patient intake, and referral assistance. This paper focuses on the current clinical needs and applications of artificial intelligence–driven voice chatbots to drive operational effectiveness and improve patient experience and outcomes. %M 33999834 %R 10.2196/22959 %U https://www.jmir.org/2021/5/e22959 %U https://doi.org/10.2196/22959 %U http://www.ncbi.nlm.nih.gov/pubmed/33999834 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 5 %P e26109 %T Addressing Biodisaster X Threats With Artificial Intelligence and 6G Technologies: Literature Review and Critical Insights %A Su,Zhaohui %A McDonnell,Dean %A Bentley,Barry L %A He,Jiguang %A Shi,Feng %A Cheshmehzangi,Ali %A Ahmad,Junaid %A Jia,Peng %+ Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, 181 Chatham Road South, Kowloon, Hong Kong, China, 86 2766 5956, jiapengff@hotmail.com %K 6G %K artificial intelligence %K biodisaster X %K biodisasters %K biosafety %K biosurveillance %K biotechnology %K bioterrorism %K COVID-19 %K disease X %K sixth-generation technologies %D 2021 %7 25.5.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: With advances in science and technology, biotechnology is becoming more accessible to people of all demographics. These advances inevitably hold the promise to improve personal and population well-being and welfare substantially. It is paradoxical that while greater access to biotechnology on a population level has many advantages, it may also increase the likelihood and frequency of biodisasters due to accidental or malicious use. 
Similar to “Disease X” (describing unknown naturally emerging pathogenic diseases with a pandemic potential), we term this unknown risk from biotechnologies “Biodisaster X.” To date, no studies have examined the potential role of information technologies in preventing and mitigating Biodisaster X. Objective: This study aimed to explore (1) what Biodisaster X might entail and (2) solutions that use artificial intelligence (AI) and emerging 6G technologies to help monitor and manage Biodisaster X threats. Methods: A review of the literature on applying AI and 6G technologies for monitoring and managing biodisasters was conducted on PubMed, using articles published from database inception through to November 16, 2020. Results: Our findings show that Biodisaster X has the potential to upend lives and livelihoods and destroy economies, essentially posing a looming risk for civilizations worldwide. To shed light on Biodisaster X threats, we detailed effective AI and 6G-enabled strategies, ranging from natural language processing to deep learning–based image analysis to address issues ranging from early Biodisaster X detection (eg, identification of suspicious behaviors), remote design and development of pharmaceuticals (eg, treatment development), and public health interventions (eg, reactive shelter-at-home mandate enforcement), as well as disaster recovery (eg, sentiment analysis of social media posts to shed light on the public’s feelings and readiness for recovery building). Conclusions: Biodisaster X is a looming but avoidable catastrophe. Considering the potential human and economic consequences Biodisaster X could cause, actions that can effectively monitor and manage Biodisaster X threats must be taken promptly and proactively. 
Rather than solely depending on overstretched professional attention of health experts and government officials, it is perhaps more cost-effective and practical to deploy technology-based solutions to prevent and control Biodisaster X threats. This study discusses what Biodisaster X could entail and emphasizes the importance of monitoring and managing Biodisaster X threats by AI techniques and 6G technologies. Future studies could explore how the convergence of AI and 6G systems may further advance the preparedness for high-impact, less likely events beyond Biodisaster X. %M 33961583 %R 10.2196/26109 %U https://www.jmir.org/2021/5/e26109 %U https://doi.org/10.2196/26109 %U http://www.ncbi.nlm.nih.gov/pubmed/33961583 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 5 %P e25237 %T Improving Current Glycated Hemoglobin Prediction in Adults: Use of Machine Learning Algorithms With Electronic Health Records %A Alhassan,Zakhriya %A Watson,Matthew %A Budgen,David %A Alshammari,Riyad %A Alessa,Ali %A Al Moubayed,Noura %+ Department of Computer Science, Durham University, Mountjoy Centre, Durham, DH1 3LE, United Kingdom, 44 1913 341724 ext 41749, noura.al-moubayed@durham.ac.uk %K glycated hemoglobin HbA1c %K prediction %K machine learning %K deep learning %K neural network %K multilayer perceptron %K electronic health records %K time series data %K longitudinal data %K diabetes %D 2021 %7 24.5.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Predicting the risk of glycated hemoglobin (HbA1c) elevation can help identify patients with the potential for developing serious chronic health problems, such as diabetes. Early preventive interventions based upon advanced predictive models using electronic health records data for identifying such patients can ultimately help provide better health outcomes. 
Objective: Our study investigated the performance of predictive models to forecast HbA1c elevation levels by employing several machine learning models. We also examined the use of patient electronic health record longitudinal data in the performance of the predictive models. Explainable methods were employed to interpret the decisions made by the black box models. Methods: This study employed multiple logistic regression, random forest, support vector machine, and logistic regression models, as well as a deep learning model (multilayer perceptron) to classify patients with normal (<5.7%) and elevated (≥5.7%) levels of HbA1c. We also integrated current visit data with historical (longitudinal) data from previous visits. Explainable machine learning methods were used to interrogate the models and provide an understanding of the reasons behind the decisions made by the models. All models were trained and tested using a large data set from Saudi Arabia with 18,844 unique patient records. Results: The machine learning models achieved promising results for predicting current HbA1c elevation risk. When coupled with longitudinal data, the machine learning models outperformed the multiple logistic regression model used in the comparative study. The multilayer perceptron model achieved an accuracy of 83.22% for the area under receiver operating characteristic curve when used with historical data. All models showed a close level of agreement on the contribution of random blood sugar and age variables with and without longitudinal data. Conclusions: This study shows that machine learning models can provide promising results for the task of predicting current HbA1c levels (≥5.7% or less). Using patients’ longitudinal data improved the performance and affected the relative importance for the predictors used. The models showed results that are consistent with comparable studies. 
%M 34028357 %R 10.2196/25237 %U https://medinform.jmir.org/2021/5/e25237 %U https://doi.org/10.2196/25237 %U http://www.ncbi.nlm.nih.gov/pubmed/34028357 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 5 %P e27806 %T A COVID-19 Pandemic Artificial Intelligence–Based System With Deep Learning Forecasting and Automatic Statistical Data Acquisition: Development and Implementation Study %A Yu,Cheng-Sheng %A Chang,Shy-Shin %A Chang,Tzu-Hao %A Wu,Jenny L %A Lin,Yu-Jiun %A Chien,Hsiung-Fei %A Chen,Ray-Jade %+ Department of Surgery, School of Medicine, College of Medicine, Taipei Medical University, No.250, Wuxing St.,, Taipei, 11031, Taiwan, 886 227372181 ext 3966, rayjchen@tmu.edu.tw %K COVID-19 %K artificial intelligence %K time series %K deep learning %K machine learning %K statistical analysis %K pandemic %K data visualization %D 2021 %7 20.5.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: More than 79.2 million confirmed COVID-19 cases and 1.7 million deaths were caused by SARS-CoV-2; the disease was named COVID-19 by the World Health Organization. Control of the COVID-19 epidemic has become a crucial issue around the globe, but there are limited studies that investigate the global trend of the COVID-19 pandemic together with each country’s policy measures. Objective: We aimed to develop an online artificial intelligence (AI) system to analyze the dynamic trend of the COVID-19 pandemic, facilitate forecasting and predictive modeling, and produce a heat map visualization of policy measures in 171 countries. Methods: The COVID-19 Pandemic AI System (CPAIS) integrated two data sets: the data set from the Oxford COVID-19 Government Response Tracker from the Blavatnik School of Government, which is maintained by the University of Oxford, and the data set from the COVID-19 Data Repository, which was established by the Johns Hopkins University Center for Systems Science and Engineering. 
This study utilized four statistical and deep learning techniques for forecasting: autoregressive integrated moving average (ARIMA), feedforward neural network (FNN), multilayer perceptron (MLP) neural network, and long short-term memory (LSTM). With regard to 1-year records (ie, whole time series data), records from the last 14 days served as the validation set to evaluate the performance of the forecast, whereas earlier records served as the training set. Results: A total of 171 countries that featured in both databases were included in the online system. The CPAIS was developed to explore variations, trends, and forecasts related to the COVID-19 pandemic across several counties. For instance, the number of confirmed monthly cases in the United States reached a local peak in July 2020 and another peak of 6,368,591 in December 2020. A dynamic heat map with policy measures depicts changes in COVID-19 measures for each country. A total of 19 measures were embedded within the three sections presented on the website, and only 4 of the 19 measures were continuous measures related to financial support or investment. Deep learning models were used to enable COVID-19 forecasting; the performances of ARIMA, FNN, and the MLP neural network were not stable because their forecast accuracy was only better than LSTM for a few countries. LSTM demonstrated the best forecast accuracy for Canada, as the root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were 2272.551, 1501.248, and 0.2723075, respectively. ARIMA (RMSE=317.53169; MAPE=0.4641688) and FNN (RMSE=181.29894; MAPE=0.2708482) demonstrated better performance for South Korea. Conclusions: The CPAIS collects and summarizes information about the COVID-19 pandemic and offers data visualization and deep learning–based prediction. It might be a useful reference for predicting a serious outbreak or epidemic. 
Moreover, the system undergoes daily updates and includes the latest information on vaccination, which may change the dynamics of the pandemic. %M 33900932 %R 10.2196/27806 %U https://www.jmir.org/2021/5/e27806 %U https://doi.org/10.2196/27806 %U http://www.ncbi.nlm.nih.gov/pubmed/33900932 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 5 %P e27118 %T A Clinical Prediction Model to Predict Heparin Treatment Outcomes and Provide Dosage Recommendations: Development and Validation Study %A Li,Dongkai %A Gao,Jianwei %A Hong,Na %A Wang,Hao %A Su,Longxiang %A Liu,Chun %A He,Jie %A Jiang,Huizhen %A Wang,Qiang %A Long,Yun %A Zhu,Weiguo %+ Department of Information Center, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No.1 Shuaifuyuan, Dongcheng District, Beijing, , China, 86 69154149, zhuwg@pumch.cn %K outcome prediction %K clinical decision support %K dosage recommendation %K machine learning %K intensive care unit %D 2021 %7 20.5.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Unfractionated heparin is widely used in the intensive care unit as an anticoagulant. However, weight-based heparin dosing has been shown to be suboptimal and may place patients at unnecessary risk during their intensive care unit stay. Objective: In this study, we intended to develop and validate a machine learning–based model to predict heparin treatment outcomes and to provide dosage recommendations to clinicians. Methods: A shallow neural network model was adopted in a retrospective cohort of patients from the Multiparameter Intelligent Monitoring in Intensive Care III (MIMIC III) database and patients admitted to the Peking Union Medical College Hospital (PUMCH). 
We modeled the subtherapeutic, normal, and supratherapeutic activated partial thromboplastin time (aPTT) as the outcomes of heparin treatment and used a group of clinical features for modeling. Our model classifies patients into 3 different therapeutic states. We tested the prediction ability of our model and evaluated its performance by using accuracy, the kappa coefficient, precision, recall, and the F1 score. Furthermore, a dosage recommendation module was designed and evaluated for clinical decision support. Results: A total of 3607 patients selected from MIMIC III and 1549 patients admitted to the PUMCH who met our criteria were included in this study. The shallow neural network model showed results of F1 scores 0.887 (MIMIC III) and 0.925 (PUMCH). When compared with the actual dosage prescribed, our model recommended increasing the dosage for 72.2% (MIMIC III, 1240/1718) and 64.7% (PUMCH, 281/434) of the subtherapeutic patients and decreasing the dosage for 80.9% (MIMIC III, 504/623) and 76.7% (PUMCH, 277/361) of the supratherapeutic patients, suggesting that the recommendations can contribute to clinical improvements and that they may effectively reduce the time to optimal dosage in the clinical setting. Conclusions: The evaluation of our model for predicting heparin treatment outcomes demonstrated that the developed model is potentially applicable for reducing the misdosage of heparin and for providing appropriate decision recommendations to clinicians. 
%M 34014171 %R 10.2196/27118 %U https://www.jmir.org/2021/5/e27118 %U https://doi.org/10.2196/27118 %U http://www.ncbi.nlm.nih.gov/pubmed/34014171 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 5 %P e17886 %T Predicting Prolonged Length of Hospital Stay for Peritoneal Dialysis–Treated Patients Using Stacked Generalization: Model Development and Validation Study %A Kong,Guilan %A Wu,Jingyi %A Chu,Hong %A Yang,Chao %A Lin,Yu %A Lin,Ke %A Shi,Ying %A Wang,Haibo %A Zhang,Luxia %+ National Institute of Health Data Science, Peking University, No 38 Xueyuan Road, Haidian District, Beijing, 100191, China, 86 10 82806538, zhanglx@bjmu.edu.cn %K peritoneal dialysis %K prolonged length of stay %K machine learning %K prediction model %K clinical decision support %D 2021 %7 19.5.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: The increasing number of patients treated with peritoneal dialysis (PD) and their consistently high rate of hospital admissions have placed a large burden on the health care system. Early clinical interventions and optimal management of patients at a high risk of prolonged length of stay (pLOS) may help improve the medical efficiency and prognosis of PD-treated patients. If timely clinical interventions are not provided, patients at a high risk of pLOS may face a poor prognosis and high medical expenses, which will also be a burden on hospitals. Therefore, physicians need an effective pLOS prediction model for PD-treated patients. Objective: This study aimed to develop an optimal data-driven model for predicting the pLOS risk of PD-treated patients using basic admission data. Methods: Patient data collected using the Hospital Quality Monitoring System (HQMS) in China were used to develop pLOS prediction models. A stacking model was constructed with support vector machine, random forest (RF), and K-nearest neighbor algorithms as its base models and traditional logistic regression (LR) as its meta-model. 
The meta-model used the outputs of all 3 base models as input and generated the output of the stacking model. Another LR-based pLOS prediction model was built as the benchmark model. The prediction performance of the stacking model was compared with that of its base models and the benchmark model. Five-fold cross-validation was employed to develop and validate the models. Performance measures included the Brier score, area under the receiver operating characteristic curve (AUROC), estimated calibration index (ECI), accuracy, sensitivity, specificity, and geometric mean (Gm). In addition, a calibration plot was employed to visually demonstrate the calibration power of each model. Results: The final cohort extracted from the HQMS database consisted of 23,992 eligible PD-treated patients, among whom 30.3% had a pLOS (ie, longer than the average LOS, which was 16 days in our study). Among the models, the stacking model achieved the best calibration (ECI 8.691), balanced accuracy (Gm 0.690), accuracy (0.695), and specificity (0.701). Meanwhile, the stacking and RF models had the best overall performance (Brier score 0.174 for both) and discrimination (AUROC 0.757 for the stacking model and 0.756 for the RF model). Compared with the benchmark LR model, the stacking model was superior in all performance measures except sensitivity, but there was no significant difference in sensitivity between the 2 models. The 2-sided t tests revealed significant performance differences between the stacking and LR models in overall performance, discrimination, calibration, balanced accuracy, and accuracy. Conclusions: This study is the first to develop data-driven pLOS prediction models for PD-treated patients using basic admission data from a national database. The results indicate the feasibility of utilizing a stacking-based pLOS prediction model for PD-treated patients. 
The pLOS prediction tools developed in this study have the potential to assist clinicians in identifying patients at a high risk of pLOS and to allocate resources optimally for PD-treated patients. %M 34009135 %R 10.2196/17886 %U https://medinform.jmir.org/2021/5/e17886 %U https://doi.org/10.2196/17886 %U http://www.ncbi.nlm.nih.gov/pubmed/34009135 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 5 %P e27340 %T Health Care Professional Association Agency in Preparing for Artificial Intelligence: Protocol for a Multi-Case Study %A Gillan,Caitlin %A Hodges,Brian %A Wiljer,David %A Dobrow,Mark %+ Institute of Health Policy, Management, and Evaluation, University of Toronto, 700 Bay St, 2nd Floor, Suite 201, Toronto, ON, M5G1Z6, Canada, 1 416 340 4800 ext 2916, caitlin.gillan@uhn.ca %K artificial intelligence %K health professions %K normalization process theory %K case study %D 2021 %7 19.5.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: The emergence of artificial intelligence (AI) in health care has impacted health care systems, including employment, training, education, and professional regulation. It is incumbent on health professional associations to assist their membership in defining and preparing for AI-related change. Health professional associations, or the national groups convened to represent the interests of the members of a profession, play a unique role in establishing the sociocultural, normative, and regulative elements of health care professions. Objective: The aim of this paper is to present a protocol for a proposed study of how, when faced with AI as a disruptive technology, health professional associations engage in sensemaking and legitimization of change to support their membership in preparing for future practice. Methods: An exploratory multi-case study approach will be used. 
This study will be informed by the normalization process theory (NPT), which suggests behavioral constructs required for complex change, providing a novel lens through which to consider the agency of macrolevel actors in practice change. A total of 4 health professional associations will be studied, each representing an instrumental case and related fields selected for their early consideration of AI technologies. Data collection will consist of key informant interviews, observation of relevant meetings, and document review. Individual and collective sensemaking activities and action toward change will be identified using stakeholder network mapping. A hybrid inductive and deductive model will be used for a concurrent thematic analysis, mapping emergent themes against the NPT framework to assess fit and identify areas of discordance. Results: As of January 2021, we have conducted 17 interviews, with representation across the 4 health professional associations. Of these 17 interviews, 15 (88%) have been transcribed. Document review is underway and complete for one health professional association and nearly complete for another. Observation opportunities have been challenged by competing priorities during COVID-19 and may require revisiting. A linear cross-case analytic approach will be taken to present the data, highlighting both guidance for the implementation of AI and implications for the application of NPT at the macro level. The ability to inform consideration of AI will depend on the degree to which the engaged health professional associations have considered this topic at the time of the study and, hence, what priority it has been assigned within the health professional association and what actions have been taken to consider or prepare for it. The fact that this may differ between health professional associations and practice environments will require consideration throughout the analysis. 
Conclusions: Ultimately, this protocol outlines a case study approach to understand how, when faced with AI as a disruptive technology, health professional associations engage in sensemaking and legitimization of change to support their membership in preparing for future practice. International Registered Report Identifier (IRRID): DERR1-10.2196/27340 %M 34009136 %R 10.2196/27340 %U https://www.researchprotocols.org/2021/5/e27340 %U https://doi.org/10.2196/27340 %U http://www.ncbi.nlm.nih.gov/pubmed/34009136 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 5 %P e25869 %T Federated Learning for Thyroid Ultrasound Image Analysis to Protect Personal Information: Validation Study in a Real Health Care Environment %A Lee,Haeyun %A Chai,Young Jun %A Joo,Hyunjin %A Lee,Kyungsu %A Hwang,Jae Youn %A Kim,Seok-Mo %A Kim,Kwangsoon %A Nam,Inn-Chul %A Choi,June Young %A Yu,Hyeong Won %A Lee,Myung-Chul %A Masuoka,Hiroo %A Miyauchi,Akira %A Lee,Kyu Eun %A Kim,Sungwan %A Kong,Hyoun-Joong %+ Transdisciplinary Department of Medicine and Advanced Technology, Seoul National University Hospital, Daehak-ro 101, Jongno-gu, Seoul, Republic of Korea, 82 2 2072 4492, gongcop@gmail.com %K deep learning %K federated learning %K thyroid nodules %K ultrasound image %D 2021 %7 18.5.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Federated learning is a decentralized approach to machine learning; it is a training strategy that overcomes medical data privacy regulations and generalizes deep learning algorithms. Federated learning mitigates many systemic privacy risks by sharing only the model and parameters for training, without the need to export existing medical data sets. In this study, we performed ultrasound image analysis using federated learning to predict whether thyroid nodules were benign or malignant. Objective: The goal of this study was to evaluate whether the performance of federated learning was comparable with that of conventional deep learning. 
Methods: A total of 8457 (5375 malignant, 3082 benign) ultrasound images were collected from 6 institutions and used for federated learning and conventional deep learning. Five deep learning networks (VGG19, ResNet50, ResNext50, SE-ResNet50, and SE-ResNext50) were used. Using stratified random sampling, we selected 20% (1075 malignant, 616 benign) of the total images for internal validation. For external validation, we used 100 ultrasound images (50 malignant, 50 benign) from another institution. Results: For internal validation, the area under the receiver operating characteristic (AUROC) curve for federated learning was between 78.88% and 87.56%, and the AUROC for conventional deep learning was between 82.61% and 91.57%. For external validation, the AUROC for federated learning was between 75.20% and 86.72%, and the AUROC curve for conventional deep learning was between 73.04% and 91.04%. Conclusions: We demonstrated that the performance of federated learning using decentralized data was comparable to that of conventional deep learning using pooled data. Federated learning might be potentially useful for analyzing medical images while protecting patients’ personal information. 
%M 33858817 %R 10.2196/25869 %U https://medinform.jmir.org/2021/5/e25869 %U https://doi.org/10.2196/25869 %U http://www.ncbi.nlm.nih.gov/pubmed/33858817 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 5 %P e26618 %T Understanding Public Perceptions of COVID-19 Contact Tracing Apps: Artificial Intelligence–Enabled Social Media Analysis %A Cresswell,Kathrin %A Tahir,Ahsen %A Sheikh,Zakariya %A Hussain,Zain %A Domínguez Hernández,Andrés %A Harrison,Ewen %A Williams,Robin %A Sheikh,Aziz %A Hussain,Amir %+ Usher Institute, The University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG, United Kingdom, 44 (0)131 651 4151, kathrin.cresswell@ed.ac.uk %K artificial intelligence %K sentiment analysis %K COVID-19 %K contact tracing %K social media %K perception %K app %K exploratory %K suitability %K AI %K Facebook %K Twitter %K United Kingdom %K sentiment %K attitude %K infodemiology %K infoveillance %D 2021 %7 17.5.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The emergence of SARS-CoV-2 in late 2019 and its subsequent spread worldwide continues to be a global health crisis. Many governments consider contact tracing of citizens through apps installed on mobile phones as a key mechanism to contain the spread of SARS-CoV-2. Objective: In this study, we sought to explore the suitability of artificial intelligence (AI)–enabled social media analyses using Facebook and Twitter to understand public perceptions of COVID-19 contact tracing apps in the United Kingdom. Methods: We extracted and analyzed over 10,000 relevant social media posts across an 8-month period, from March 1 to October 31, 2020. We used an initial filter with COVID-19–related keywords, which were predefined as part of an open Twitter-based COVID-19 dataset. We then applied a second filter using contact tracing app–related keywords and a geographical filter. 
We developed and utilized a hybrid, rule-based ensemble model, combining state-of-the-art lexicon rule-based and deep learning–based approaches. Results: Overall, we observed 76% positive and 12% negative sentiments, with the majority of negative sentiments reported in the North of England. These sentiments varied over time, likely influenced by ongoing public debates around implementing app-based contact tracing by using a centralized model where data would be shared with the health service, compared with decentralized contact-tracing technology. Conclusions: Variations in sentiments corroborate with ongoing debates surrounding the information governance of health-related information. AI-enabled social media analysis of public attitudes in health care can help facilitate the implementation of effective public health campaigns. %M 33939622 %R 10.2196/26618 %U https://www.jmir.org/2021/5/e26618 %U https://doi.org/10.2196/26618 %U http://www.ncbi.nlm.nih.gov/pubmed/33939622 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 5 %P e24803 %T An Attention Model With Transfer Embeddings to Classify Pneumonia-Related Bilingual Imaging Reports: Algorithm Development and Validation %A Park,Hyung %A Song,Min %A Lee,Eun Byul %A Seo,Bo Kyung %A Choi,Chang Min %+ Department of Pulmonary and Critical Care Medicine, Asan Medical Center, Olympic-ro 43-gil, Seoul, 05505, Republic of Korea, 82 2 3010 5902, ccm9607@gmail.com %K deep learning %K natural language process %K attention %K clinical data %K pneumonia %K classification %K medical imaging %K electronic health record %K machine learning %K model %D 2021 %7 17.5.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: In the analysis of electronic health records, proper labeling of outcomes is mandatory. To obtain proper information from radiologic reports, several studies were conducted to classify radiologic reports using deep learning. 
However, the classification of pneumonia in bilingual radiologic reports has not been conducted previously. Objective: The aim of this research was to classify radiologic reports into pneumonia or no pneumonia using a deep learning method. Methods: A data set of radiology reports for chest computed tomography and chest x-rays of surgical patients from January 2008 to January 2018 in the Asan Medical Center in Korea was retrospectively analyzed. The classification performance of our long short-term memory (LSTM)–Attention model was compared with various deep learning and machine learning methods. The area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve, sensitivity, specificity, accuracy, and F1 score for the models were compared. Results: A total of 5450 radiologic reports were included that contained at least one pneumonia-related word. In the test set (n=1090), our proposed model showed 91.01% (992/1090) accuracy (AUROCs for negative, positive, and obscure were 0.98, 0.97, and 0.90, respectively). The top 3 performances of the models were based on FastText or LSTM. The convolutional neural network–based model showed a lower accuracy of 73.03% (796/1090) than the other 2 algorithms. The classification of negative results had an F1 score of 0.96, whereas the classification of positive and uncertain results showed a lower performance (positive F1 score 0.83; uncertain F1 score 0.62). In the extra-validation set, our model showed 80.0% (642/803) accuracy (AUROCs for negative, positive, and obscure were 0.92, 0.96, and 0.84, respectively). Conclusions: Our method showed excellent performance in classifying pneumonia in bilingual radiologic reports. The method could enrich research on pneumonia by obtaining exact outcomes from electronic health data. 
%M 33820755 %R 10.2196/24803 %U https://medinform.jmir.org/2021/5/e24803 %U https://doi.org/10.2196/24803 %U http://www.ncbi.nlm.nih.gov/pubmed/33820755 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 5 %P e24494 %T Improving Medication Adherence Through Adaptive Digital Interventions (iMedA) in Patients With Hypertension: Protocol for an Interrupted Time Series Study %A Etminani,Kobra %A Göransson,Carina %A Galozy,Alexander %A Norell Pejner,Margaretha %A Nowaczyk,Sławomir %+ Center for Applied Intelligent Systems Research, Halmstad University, Kristian IV:s väg 3, Halmstad, 30118, Sweden, 46 35167332, kobra.etminani@hh.se %K medication adherence %K hypertension %K digital intervention %K mHealth %K artificial intelligence %D 2021 %7 12.5.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: There is a strong need to improve medication adherence (MA) for individuals with hypertension in order to reduce long-term hospitalization costs. We believe this can be achieved through an artificial intelligence agent that helps the patient in understanding key individual adherence risk factors and designing an appropriate intervention plan. The incidence of hypertension in Sweden is estimated at approximately 27%. Although blood pressure control has increased in Sweden, barely half of the treated patients achieved adequate blood pressure levels. It is a major risk factor for coronary heart disease and stroke as well as heart failure. MA is a key factor for good clinical outcomes in persons with hypertension. Objective: The overall aim of this study is to design, develop, test, and evaluate an adaptive digital intervention called iMedA, delivered via a mobile app to improve MA, self-care management, and blood pressure control for persons with hypertension. Methods: The study design is an interrupted time series. 
We will collect data on a daily basis, 14 days before, during 6 months of delivering digital interventions through the mobile app, and 14 days after. The effect will be analyzed using segmented regression analysis. The participants will be recruited in Region Halland, Sweden. The design of the digital interventions follows the just-in-time adaptive intervention framework. The primary (distal) outcome is MA, and the secondary outcome is blood pressure. The design of the digital intervention is developed based on a needs assessment process including a systematic review, focus group interviews, and a pilot study, before conducting the longitudinal interrupted time series study. Results: The focus groups of persons with hypertension have been conducted to perform the needs assessment in a Swedish context. The design and development of digital interventions are in progress, and the interventions are planned to be ready in November 2020. Then, the 2-week pilot study for usability evaluation will start, and the interrupted time series study, which we plan to start in February 2021, will follow it. Conclusions: We hypothesize that iMedA will improve medication adherence and self-care management. This study could illustrate how self-care management tools can be an additional (digital) treatment support to a clinical one without increasing burden on health care staff. 
Trial Registration: ClinicalTrials.gov NCT04413500; https://clinicaltrials.gov/ct2/show/NCT04413500 International Registered Report Identifier (IRRID): DERR1-10.2196/24494 %M 33978593 %R 10.2196/24494 %U https://www.researchprotocols.org/2021/5/e24494 %U https://doi.org/10.2196/24494 %U http://www.ncbi.nlm.nih.gov/pubmed/33978593 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 5 %P e24721 %T Automated Generation of Personalized Shock Wave Lithotripsy Protocols: Treatment Planning Using Deep Learning %A Chen,Zhipeng %A Zeng,Daniel D %A Seltzer,Ryan G N %A Hamilton,Blake D %+ Shenzhen Artificial Intelligence and Data Science Institute (Longhua), Building 26, Technology Innovation Center, Hongshan 6979, Longhua, Shenzhen, 518110, China, 86 21071934, zhipengchen@saidi.org.cn %K nephrolithiasis %K extracorporeal shock wave therapy %K lithotripsy %K treatment planning %K deep learning %K artificial intelligence %D 2021 %7 11.5.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Though shock wave lithotripsy (SWL) has developed to be one of the most common treatment approaches for nephrolithiasis in recent decades, its treatment planning is often a trial-and-error process based on physicians’ subjective judgement. Physicians’ inexperience with this modality can lead to low-quality treatment and unnecessary risks to patients. Objective: To improve the quality and consistency of shock wave lithotripsy treatment, we aimed to develop a deep learning model for generating the next treatment step by previous steps and preoperative patient characteristics and to produce personalized SWL treatment plans in a step-by-step protocol based on the deep learning model. Methods: We developed a deep learning model to generate the optimal power level, shock rate, and number of shocks in the next step, given previous treatment steps encoded by long short-term memory neural networks and preoperative patient characteristics. 
We constructed a next-step data set (N=8583) from top practices of renal SWL treatments recorded in the International Stone Registry. Then, we trained the deep learning model and baseline models (linear regression, logistic regression, random forest, and support vector machine) with 90% of the samples and validated them with the remaining samples. Results: The deep learning models for generating the next treatment steps outperformed the baseline models (accuracy = 98.8%, F1 = 98.0% for power levels; accuracy = 98.1%, F1 = 96.0% for shock rates; root mean squared error = 207, mean absolute error = 121 for numbers of shocks). The hypothesis testing showed no significant difference between steps generated by our model and the top practices (P=.480 for power levels; P=.782 for shock rates; P=.727 for numbers of shocks). Conclusions: The high performance of our deep learning approach shows its treatment planning capability on par with top physicians. To the best of our knowledge, our framework is the first effort to implement automated planning of SWL treatment via deep learning. It is a promising technique in assisting treatment planning and physician training at low cost. 
%M 33973862 %R 10.2196/24721 %U https://medinform.jmir.org/2021/5/e24721 %U https://doi.org/10.2196/24721 %U http://www.ncbi.nlm.nih.gov/pubmed/33973862 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 8 %N 5 %P e27113 %T Learning From Clinical Consensus Diagnosis in India to Facilitate Automatic Classification of Dementia: Machine Learning Study %A Jin,Haomiao %A Chien,Sandy %A Meijer,Erik %A Khobragade,Pranali %A Lee,Jinkook %+ Center for Economic and Social Research, University of Southern California, 635 Downey Way, VPD, Los Angeles, CA, 90089, United States, 1 626 554 3370, haomiaoj@usc.edu %K dementia %K Alzheimer disease %K machine learning %K artificial intelligence %K diagnosis %K classification %K India %K model %D 2021 %7 10.5.2021 %9 Original Paper %J JMIR Ment Health %G English %X Background: The Harmonized Diagnostic Assessment of Dementia for the Longitudinal Aging Study in India (LASI-DAD) is the first and only nationally representative study on late-life cognition and dementia in India (n=4096). LASI-DAD obtained clinical consensus diagnosis of dementia for a subsample of 2528 respondents. Objective: This study develops a machine learning model that uses data from the clinical consensus diagnosis in LASI-DAD to support the classification of dementia status. Methods: Clinicians were presented with the extensive data collected from LASI-DAD, including sociodemographic information and health history of respondents, results from the screening tests of cognitive status, and information obtained from informant interviews. Based on the Clinical Dementia Rating (CDR) and using an online platform, clinicians individually evaluated each case and then reached a consensus diagnosis. 
A 2-step procedure was implemented to train several candidate machine learning models, which were evaluated using a separate test set for predictive accuracy measurement, including the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, precision, F1 score, and kappa statistic. The ultimate model was selected based on overall agreement as measured by kappa. We further examined the overall accuracy and agreement with the final consensus diagnoses between the selected machine learning model and individual clinicians who participated in the clinical consensus diagnostic process. Finally, we applied the selected model to a subgroup of LASI-DAD participants for whom the clinical consensus diagnosis was not obtained to predict their dementia status. Results: Among the 2528 individuals who received clinical consensus diagnosis, 192 (6.7% after adjusting for sampling weight) were diagnosed with dementia. All candidate machine learning models achieved outstanding discriminative ability, as indicated by AUROC >.90, and had similar accuracy and specificity (both around 0.95). The support vector machine model outperformed other models with the highest sensitivity (0.81), F1 score (0.72), and kappa (.70, indicating substantial agreement) and the second highest precision (0.65). As a result, the support vector machine was selected as the ultimate model. Further examination revealed that overall accuracy and agreement were similar between the selected model and individual clinicians. Application of the prediction model to 1568 individuals without clinical consensus diagnosis classified 127 individuals as living with dementia. After applying the sampling weight, we estimate the prevalence of dementia in the population as 7.4%. Conclusions: The selected machine learning model has outstanding discriminative ability and substantial agreement with a clinical consensus diagnosis of dementia. 
The model can serve as a computer model of the clinical knowledge and experience encoded in the clinical consensus diagnostic process and has many potential applications, including predicting missed dementia diagnoses and serving as a clinical decision support tool or virtual rater to assist diagnosis of dementia. %M 33970122 %R 10.2196/27113 %U https://mental.jmir.org/2021/5/e27113 %U https://doi.org/10.2196/27113 %U http://www.ncbi.nlm.nih.gov/pubmed/33970122 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 5 %P e25304 %T Combining External Medical Knowledge for Improving Obstetric Intelligent Diagnosis: Model Development and Validation %A Zhang,Kunli %A Cai,Linkun %A Song,Yu %A Liu,Tao %A Zhao,Yueshu %+ School of Information Engineering, Zhengzhou University, No 100, Science Avenue, Zhengzhou, 450000, China, 86 137 0084 2398, ieysong@zzu.edu.cn %K intelligent diagnosis %K obstetric electronic medical record %K medical knowledge %K attention mechanism %D 2021 %7 10.5.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Data-driven medical health information processing has become a new development trend in obstetrics. Electronic medical records (EMRs) are the basis of evidence-based medicine and an important information source for intelligent diagnosis. To obtain diagnostic results, doctors combine clinical experience and medical knowledge in their diagnosis process. External medical knowledge provides strong support for diagnosis. Therefore, it is worth studying how to make full use of EMRs and medical knowledge in intelligent diagnosis. Objective: This study aims to improve the performance of intelligent diagnosis in EMRs by combining medical knowledge. Methods: As an EMR usually contains multiple types of diagnostic results, the intelligent diagnosis can be treated as a multilabel classification task. 
We propose a novel neural network knowledge-aware hierarchical diagnosis model (KHDM) in which Chinese obstetric EMRs and external medical knowledge can be synchronously and effectively used for intelligent diagnostics. In KHDM, EMRs and external knowledge documents are integrated by the attention mechanism contained in the hierarchical deep learning framework. In this way, we enrich the language model with curated knowledge documents, combining the advantages of both to make a knowledge-aware diagnosis. Results: We evaluated our model on a real-world Chinese obstetric EMR dataset and showed that KHDM achieves an accuracy of 0.8929, which exceeds that of the most advanced classification benchmark methods. We also verified the model’s interpretability advantage. Conclusions: In this paper, an improved model combining medical knowledge and an attention mechanism is proposed, based on the problem of diversity of diagnostic results in Chinese EMRs. KHDM can effectively integrate domain knowledge to greatly improve the accuracy of diagnosis. 
%M 33970113 %R 10.2196/25304 %U https://medinform.jmir.org/2021/5/e25304 %U https://doi.org/10.2196/25304 %U http://www.ncbi.nlm.nih.gov/pubmed/33970113 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 5 %P e27460 %T Medical Specialty Recommendations by an Artificial Intelligence Chatbot on a Smartphone: Development and Deployment %A Lee,Hyeonhoon %A Kang,Jaehyun %A Yeo,Jonghyeon %+ Department of Clinical Korean Medicine, Graduate School, Kyung Hee University, 23 Kyungheedae-ro, Dongdaemun-gu, Seoul, 02447, Republic of Korea, 82 29589207, jackli0373@gmail.com %K artificial intelligence %K chatbot %K COVID-19 %K deep learning %K deployment %K development %K machine learning %K medical specialty %K natural language processing %K recommendation %K smartphone %D 2021 %7 6.5.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The COVID-19 pandemic has limited daily activities and even contact between patients and primary care providers. This makes it more difficult to provide adequate primary care services, which include connecting patients to an appropriate medical specialist. A smartphone-compatible artificial intelligence (AI) chatbot that classifies patients’ symptoms and recommends the appropriate medical specialty could provide a valuable solution. Objective: In order to establish a contactless method of recommending the appropriate medical specialty, this study aimed to construct a deep learning–based natural language processing (NLP) pipeline and to develop an AI chatbot that can be used on a smartphone. Methods: We collected 118,008 sentences containing information on symptoms with labels (medical specialty), conducted data cleansing, and finally constructed a pipeline of 51,134 sentences for this study. 
Several deep learning models, including 4 different long short-term memory (LSTM) models with or without attention and with or without a pretrained FastText embedding layer, as well as bidirectional encoder representations from transformers for NLP, were trained and validated using a randomly selected test data set. The performance of the models was evaluated on the basis of the precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). An AI chatbot was also designed to make it easy for patients to use this specialty recommendation system. We used an open-source framework called “Alpha” to develop our AI chatbot. This takes the form of a web-based app with a frontend chat interface capable of conversing in text and a backend cloud-based server application to handle data collection, process the data with a deep learning model, and offer the medical specialty recommendation in a responsive web that is compatible with both desktops and smartphones. Results: The bidirectional encoder representations from transformers model yielded the best performance, with an AUC of 0.964 and F1-score of 0.768, followed by the LSTM model with embedding vectors, with an AUC of 0.965 and F1-score of 0.739. Considering the limitations of computing resources and the wide availability of smartphones, the LSTM model with embedding vectors trained on our data set was adopted for our AI chatbot service. We also deployed an Alpha version of the AI chatbot to be executed on both desktops and smartphones. Conclusions: With the increasing need for telemedicine during the current COVID-19 pandemic, an AI chatbot with a deep learning–based NLP model that can recommend a medical specialty to patients through their smartphones would be exceedingly useful. This chatbot allows patients to identify the proper medical specialist in a rapid and contactless manner, based on their symptoms, thus potentially supporting both patients and primary care providers. 
%M 33882012 %R 10.2196/27460 %U https://www.jmir.org/2021/5/e27460 %U https://doi.org/10.2196/27460 %U http://www.ncbi.nlm.nih.gov/pubmed/33882012 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 5 %P e28413 %T Use of Machine Learning Algorithms to Predict the Understandability of Health Education Materials: Development and Evaluation Study %A Ji,Meng %A Liu,Yanmeng %A Zhao,Mengdan %A Lyu,Ziqing %A Zhang,Boren %A Luo,Xin %A Li,Yanlin %A Zhong,Yin %+ School of Languages and Cultures, University of Sydney, Camperdown, Sydney, NSW2006, Australia, 61 449858887, yanmeng.liu@sydney.edu.au %K machine learning %K PEMAT %K health education %K understandability evaluation %K patient-oriented %D 2021 %7 6.5.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Improving the understandability of health information can significantly increase the cost-effectiveness and efficiency of health education programs for vulnerable populations. There is a pressing need to develop clinically informed computerized tools to enable rapid, reliable assessment of the linguistic understandability of specialized health and medical education resources. This paper fills a critical gap in current patient-oriented health resource development, which requires reliable and accurate evaluation instruments to increase the efficiency and cost-effectiveness of health education resource evaluation. Objective: We aimed to translate internationally endorsed clinical guidelines to machine learning algorithms to facilitate the evaluation of the understandability of health resources for international students at Australian universities. Methods: Based on international patient health resource assessment guidelines, we developed machine learning algorithms to predict the linguistic understandability of health texts for Australian college students (aged 25-30 years) from non-English speaking backgrounds. 
We compared extreme gradient boosting, random forest, neural networks, and C5.0 decision tree for automated health information understandability evaluation. The 5 machine learning models achieved statistically better results compared to the baseline logistic regression model. We also evaluated the impact of each linguistic feature on the performance of each of the 5 models. Results: We found that information evidentness, relevance to educational purposes, and logical sequence were consistently more important than numeracy skills and medical knowledge when assessing the linguistic understandability of health education resources for international tertiary students with adequate English skills (International English Language Testing System mean score 6.5) and high health literacy (mean 16.5 in the Short Assessment of Health Literacy-English test). Our results challenge the traditional views that lack of medical knowledge and numerical skills constituted the barriers to the understanding of health educational materials. Conclusions: Machine learning algorithms were developed to predict health information understandability for international college students aged 25-30 years. Thirteen natural language features and 5 evaluation dimensions were identified and compared in terms of their impact on the performance of the models. Health information understandability varies according to the demographic profiles of the target readers, and for international tertiary students, improving health information evidentness, relevance, and logic is critical. 
%M 33955834 %R 10.2196/28413 %U https://medinform.jmir.org/2021/5/e28413 %U https://doi.org/10.2196/28413 %U http://www.ncbi.nlm.nih.gov/pubmed/33955834 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 9 %N 5 %P e22591 %T Acute Exacerbation of a Chronic Obstructive Pulmonary Disease Prediction System Using Wearable Device Data, Machine Learning, and Deep Learning: Development and Cohort Study %A Wu,Chia-Tung %A Li,Guo-Hung %A Huang,Chun-Ta %A Cheng,Yu-Chieh %A Chen,Chi-Hsien %A Chien,Jung-Yien %A Kuo,Ping-Hung %A Kuo,Lu-Cheng %A Lai,Feipei %+ Department of Internal Medicine, National Taiwan University Hospital, College of Medicine, National Taiwan University, No 7 Chung-Shan S Road, Taipei, 100, Taiwan, 886 972651516, jychien@ntu.edu.tw %K chronic obstructive pulmonary disease %K clinical decision support systems %K health risk assessment %K wearable device %D 2021 %7 6.5.2021 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: The World Health Organization has projected that by 2030, chronic obstructive pulmonary disease (COPD) will be the third-leading cause of mortality and the seventh-leading cause of morbidity worldwide. Acute exacerbations of chronic obstructive pulmonary disease (AECOPD) are associated with an accelerated decline in lung function, diminished quality of life, and higher mortality. Accurate early detection of acute exacerbations will enable early management and reduce mortality. Objective: The aim of this study was to develop a prediction system using lifestyle data, environmental factors, and patient symptoms for the early detection of AECOPD in the upcoming 7 days. Methods: This prospective study was performed at National Taiwan University Hospital. Patients with COPD that did not have a pacemaker and were not pregnant were invited for enrollment. 
Data on lifestyle, temperature, humidity, and fine particulate matter were collected using wearable devices (Fitbit Versa), a home air quality–sensing device (EDIMAX Airbox), and a smartphone app. AECOPD episodes were evaluated via standardized questionnaires. With these input features, we evaluated the prediction performance of machine learning models, including random forest, decision trees, k-nearest neighbor, linear discriminant analysis, and adaptive boosting, and a deep neural network model. Results: The continuous real-time monitoring of lifestyle and indoor environment factors was implemented by integrating home air quality–sensing devices, a smartphone app, and wearable devices. All data from 67 COPD patients were collected prospectively during a mean 4-month follow-up period, resulting in the detection of 25 AECOPD episodes. For 7-day AECOPD prediction, the proposed AECOPD predictive model achieved an accuracy of 92.1%, sensitivity of 94%, and specificity of 90.4%. Receiver operating characteristic curve analysis showed that the area under the curve of the model in predicting AECOPD was greater than 0.9. The most important variables in the model were daily steps walked, stairs climbed, and daily distance moved. Conclusions: Using wearable devices, home air quality–sensing devices, a smartphone app, and supervised prediction algorithms, we achieved excellent power to predict whether a patient would experience AECOPD within the upcoming 7 days. The AECOPD prediction system provided an effective way to collect lifestyle and environmental data, and yielded reliable predictions of future AECOPD events. Compared with previous studies, we have comprehensively improved the performance of the AECOPD prediction model by adding objective lifestyle and environmental data. This model could yield more accurate prediction results for COPD patients than using only questionnaire data. 
%M 33955840 %R 10.2196/22591 %U https://mhealth.jmir.org/2021/5/e22591 %U https://doi.org/10.2196/22591 %U http://www.ncbi.nlm.nih.gov/pubmed/33955840 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 5 %P e15708 %T Machine Learning and Natural Language Processing in Mental Health: Systematic Review %A Le Glaz,Aziliz %A Haralambous,Yannis %A Kim-Dufor,Deok-Hee %A Lenca,Philippe %A Billot,Romain %A Ryan,Taylor C %A Marsh,Jonathan %A DeVylder,Jordan %A Walter,Michel %A Berrouiguet,Sofian %A Lemey,Christophe %+ URCI Mental Health Department, Brest Medical University Hospital, Route de Ploudalmézeau, Brest, 29200, France, 33 619211032, christophe.lemey@chu-brest.fr %K machine learning %K natural language processing %K artificial intelligence %K data mining %K mental health %K psychiatry %D 2021 %7 4.5.2021 %9 Review %J J Med Internet Res %G English %X Background: Machine learning systems are part of the field of artificial intelligence that automatically learn models from data to make better decisions. Natural language processing (NLP), by using corpora and learning approaches, provides good performance in statistical tasks, such as text classification or sentiment mining. Objective: The primary aim of this systematic review was to summarize and characterize, in methodological and technical terms, studies that used machine learning and NLP techniques for mental health. The secondary aim was to consider the potential use of these methods in mental health clinical practice. Methods: This systematic review follows the PRISMA (Preferred Reporting Items for Systematic Review and Meta-analysis) guidelines and is registered with PROSPERO (Prospective Register of Systematic Reviews; number CRD42019107376). The search was conducted using 4 medical databases (PubMed, Scopus, ScienceDirect, and PsycINFO) with the following keywords: machine learning, data mining, psychiatry, mental health, and mental disorder. 
The exclusion criteria were as follows: languages other than English, anonymization process, case studies, conference papers, and reviews. No limitations on publication dates were imposed. Results: A total of 327 articles were identified, of which 269 (82.3%) were excluded and 58 (17.7%) were included in the review. The results were organized through a qualitative perspective. Although studies had heterogeneous topics and methods, some themes emerged. Population studies could be grouped into 3 categories: patients included in medical databases, patients who came to the emergency room, and social media users. The main objectives were to extract symptoms, classify severity of illness, compare therapy effectiveness, provide psychopathological clues, and challenge the current nosography. Medical records and social media were the 2 major data sources. With regard to the methods used, preprocessing used the standard methods of NLP and unique identifier extraction dedicated to medical texts. Efficient classifiers were preferred rather than transparent functioning classifiers. Python was the most frequently used platform. Conclusions: Machine learning and NLP models have been highly topical issues in medicine in recent years and may be considered a new paradigm in medical research. However, these processes tend to confirm clinical hypotheses rather than developing entirely new information, and only one major category of the population (ie, social media users) is an imprecise cohort. Moreover, some language-specific features can improve the performance of NLP methods, and their extension to other languages should be more closely investigated. However, machine learning and NLP techniques provide useful information from unexplored data (ie, patients’ daily habits that are usually inaccessible to care providers). Before considering it as an additional tool of mental health care, ethical issues remain and should be discussed in a timely manner. 
Machine learning and NLP methods may offer multiple perspectives in mental health research but should also be considered as tools to support clinical practice. %M 33944788 %R 10.2196/15708 %U https://www.jmir.org/2021/5/e15708 %U https://doi.org/10.2196/15708 %U http://www.ncbi.nlm.nih.gov/pubmed/33944788 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e27341 %T Emotions of COVID-19: Content Analysis of Self-Reported Information Using Artificial Intelligence %A Adikari,Achini %A Nawaratne,Rashmika %A De Silva,Daswin %A Ranasinghe,Sajani %A Alahakoon,Oshadi %A Alahakoon,Damminda %+ Research Centre for Data Analytics and Cognition, La Trobe University, Kingsbury Drive, Melbourne, Australia, 61 394793109, A.Adikari@latrobe.edu.au %K COVID-19 %K pandemic %K lockdown %K human emotions %K affective computing %K human-centric artificial intelligence %K artificial intelligence %K AI %K machine learning %K natural language processing %K language modeling %K infodemiology %K infoveillance %D 2021 %7 30.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The COVID-19 pandemic has disrupted human societies around the world. This public health emergency was followed by a significant loss of human life; the ensuing social restrictions led to loss of employment, lack of interactions, and burgeoning psychological distress. As physical distancing regulations were introduced to manage outbreaks, individuals, groups, and communities engaged extensively on social media to express their thoughts and emotions. This internet-mediated communication of self-reported information encapsulates the emotional health and mental well-being of all individuals impacted by the pandemic. Objective: This research aims to investigate the human emotions related to the COVID-19 pandemic expressed on social media over time, using an artificial intelligence (AI) framework. 
Methods: Our study explores emotion classifications, intensities, transitions, and profiles, as well as alignment to key themes and topics, across the four stages of the pandemic: declaration of a global health crisis (ie, prepandemic), the first lockdown, easing of restrictions, and the second lockdown. This study employs an AI framework comprising natural language processing, word embeddings, Markov models, and the growing self-organizing map algorithm, which are collectively used to investigate social media conversations. The investigation was carried out using 73,000 public Twitter conversations posted by users in Australia from January to September 2020. Results: The outcomes of this study enabled us to analyze and visualize different emotions and related concerns that were expressed and reflected on social media during the COVID-19 pandemic, which could be used to gain insights into citizens’ mental health. First, the topic analysis showed the diverse as well as common concerns people had expressed during the four stages of the pandemic. It was noted that personal-level concerns expressed on social media had escalated to broader concerns over time. Second, the emotion intensity and emotion state transitions showed that fear and sadness emotions were more prominently expressed at first; however, emotions transitioned into anger and disgust over time. Negative emotions, except for sadness, were significantly higher (P<.05) in the second lockdown, showing increased frustration. Temporal emotion analysis was conducted by modeling the emotion state changes across the four stages of the pandemic, which demonstrated how different emotions emerged and shifted over time. Third, the concerns expressed by social media users were categorized into profiles, where differences could be seen between the first and second lockdown profiles. 
Conclusions: This study showed that the diverse emotions and concerns that were expressed and recorded on social media during the COVID-19 pandemic reflected the mental health of the general public. While this study established the use of social media to discover informed insights during a time when physical communication was impossible, the outcomes could also contribute toward postpandemic recovery and understanding psychological impact via emotion changes, and they could potentially inform health care decision making. This study exploited AI and social media to enhance our understanding of human behaviors in global emergencies, which could lead to improved planning and policy making for future crises. %M 33819167 %R 10.2196/27341 %U https://www.jmir.org/2021/4/e27341 %U https://doi.org/10.2196/27341 %U http://www.ncbi.nlm.nih.gov/pubmed/33819167 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 4 %P e27419 %T Returning to a Normal Life via COVID-19 Vaccines in the United States: A Large-scale Agent-Based Simulation Study %A Li,Junjiang %A Giabbanelli,Philippe %+ Department of Computer Science & Software Engineering, Miami University, 205 Benton Hall, Oxford, OH, 45056, United States, 1 513 529 0147, aqualonne@free.fr %K agent-based model %K cloud-based simulations %K COVID-19 %K large-scale simulations %K vaccine %K model %K simulation %K United States %K agent-based %K effective %K willingness %K capacity %K plan %K strategy %K outcome %K interaction %K intervention %K scenario %K impact %D 2021 %7 29.4.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: In 2020, COVID-19 has claimed more than 300,000 deaths in the United States alone. Although nonpharmaceutical interventions were implemented by federal and state governments in the United States, these efforts have failed to contain the virus. Following the Food and Drug Administration's approval of two COVID-19 vaccines, however, the hope for the return to normalcy has been renewed. 
This hope rests on an unprecedented nationwide vaccine campaign, which faces many logistical challenges and is also contingent on several factors whose values are currently unknown. Objective: We study the effectiveness of a nationwide vaccine campaign in response to different vaccine efficacies, the willingness of the population to be vaccinated, and the daily vaccine capacity under two different federal plans. To characterize the possible outcomes most accurately, we also account for the interactions between nonpharmaceutical interventions and vaccines through 6 scenarios that capture a range of possible impacts from nonpharmaceutical interventions. Methods: We used large-scale, cloud-based, agent-based simulations by implementing the vaccination campaign using COVASIM, an open-source agent-based model for COVID-19 that has been used in several peer-reviewed studies and accounts for individual heterogeneity and a multiplicity of contact networks. Several modifications to the parameters and simulation logic were made to better align the model with current evidence. We chose 6 nonpharmaceutical intervention scenarios and applied the vaccination intervention following both the plan proposed by Operation Warp Speed (former Trump administration) and the plan of one million vaccines per day, proposed by the Biden administration. We accounted for unknowns in vaccine efficacies and levels of population compliance by varying both parameters. For each experiment, the cumulative infection growth was fitted to a logistic growth model, and the carrying capacities and the growth rates were recorded. Results: For both vaccination plans and all nonpharmaceutical intervention scenarios, the presence of the vaccine intervention considerably lowers the total number of infections when life returns to normal, even when the population compliance to vaccines is as low as 20%. 
We noted an unintended consequence: given the vaccine availability estimates under both federal plans and the focus on vaccinating individuals by age categories, a significant reduction in nonpharmaceutical interventions results in a counterintuitive situation in which higher vaccine compliance then leads to more total infections. Conclusions: Although potent, vaccines alone cannot effectively end the pandemic given the current availability estimates and the adopted vaccination strategy. Nonpharmaceutical interventions need to continue and be enforced to ensure high compliance so that the rate of immunity established by vaccination outpaces that induced by infections. %M 33872188 %R 10.2196/27419 %U https://medinform.jmir.org/2021/4/e27419 %U https://doi.org/10.2196/27419 %U http://www.ncbi.nlm.nih.gov/pubmed/33872188 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 4 %P e21394 %T Application of Artificial Intelligence for Screening COVID-19 Patients Using Digital Images: Meta-analysis %A Poly,Tahmina Nasrin %A Islam,Md Mohaimenul %A Li,Yu-Chuan Jack %A Alsinglawi,Belal %A Hsu,Min-Huei %A Jian,Wen Shan %A Yang,Hsuan-Chia %+ Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, 15 Floor, No. 172-1, Section: 2, Keelung Road, Daan District, Taipei, 106, Taiwan, 886 (02)66382736 ext 1507, itpharmacist@gmail.com %K COVID-19 %K SARS-CoV-2 %K pneumonia %K artificial intelligence %K deep learning %D 2021 %7 29.4.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: The COVID-19 outbreak has spread rapidly and hospitals are overwhelmed with COVID-19 patients. While analysis of nasal and throat swabs from patients is the main way to detect COVID-19, analyzing chest images could offer an alternative method to hospitals, where health care personnel and testing kits are scarce. 
Deep learning (DL), in particular, has shown impressive levels of performance when analyzing medical images, including those related to COVID-19 pneumonia. Objective: The goal of this study was to perform a systematic review with a meta-analysis of relevant studies to quantify the performance of DL algorithms in the automatic stratification of COVID-19 patients using chest images. Methods: A search strategy for use in PubMed, Scopus, Google Scholar, and Web of Science was developed, where we searched for articles published between January 1 and April 25, 2020. We used the key terms “COVID-19,” or “coronavirus,” or “SARS-CoV-2,” or “novel corona,” or “2019-ncov,” and “deep learning,” or “artificial intelligence,” or “automatic detection.” Two authors independently extracted data on study characteristics, methods, risk of bias, and outcomes. Any disagreement between them was resolved by consensus. Results: A total of 16 studies were included in the meta-analysis, which included 5896 chest images from COVID-19 patients. The pooled sensitivity and specificity of the DL models in detecting COVID-19 were 0.95 (95% CI 0.94-0.95) and 0.96 (95% CI 0.96-0.97), respectively, with an area under the receiver operating characteristic curve of 0.98. The positive likelihood, negative likelihood, and diagnostic odds ratio were 19.02 (95% CI 12.83-28.19), 0.06 (95% CI 0.04-0.10), and 368.07 (95% CI 162.30-834.75), respectively. The pooled sensitivity and specificity for distinguishing other types of pneumonia from COVID-19 were 0.93 (95% CI 0.92-0.94) and 0.95 (95% CI 0.94-0.95), respectively. The performance of radiologists in detecting COVID-19 was lower than that of the DL models; however, the performance of junior radiologists was improved when they used DL-based prediction tools. 
Conclusions: Our study findings show that DL models have immense potential in accurately stratifying COVID-19 patients and in correctly differentiating them from patients with other types of pneumonia and normal patients. Implementation of DL-based tools can assist radiologists in correctly and quickly detecting COVID-19 and, consequently, in combating the COVID-19 pandemic. %M 33764884 %R 10.2196/21394 %U https://medinform.jmir.org/2021/4/e21394 %U https://doi.org/10.2196/21394 %U http://www.ncbi.nlm.nih.gov/pubmed/33764884 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e26075 %T Predictability of COVID-19 Hospitalizations, Intensive Care Unit Admissions, and Respiratory Assistance in Portugal: Longitudinal Cohort Study %A Patrício,André %A Costa,Rafael S %A Henriques,Rui %+ LAQV-REQUIMTE, NOVA School of Science and Technology, Universidade NOVA de Lisboa, Campus Caparica, 2829-516, Caparica, 2829-516, Portugal, 351 21 294 8351, rs.costa@fct.unl.pt %K COVID-19 %K machine learning %K intensive care admissions %K respiratory assistance %K predictive models %K data modeling %K clinical informatics %D 2021 %7 28.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: In the face of the current COVID-19 pandemic, the timely prediction of upcoming medical needs for infected individuals enables better and quicker care provision when necessary and management decisions within health care systems. Objective: This work aims to predict the medical needs (hospitalizations, intensive care unit admissions, and respiratory assistance) and survivability of individuals testing positive for SARS-CoV-2 infection in Portugal. Methods: A retrospective cohort of 38,545 infected individuals during 2020 was used. Predictions of medical needs were performed using state-of-the-art machine learning approaches at various stages of a patient’s cycle, namely, at testing (prehospitalization), at posthospitalization, and during postintensive care. 
A thorough optimization of state-of-the-art predictors was undertaken to assess the ability to anticipate medical needs and infection outcomes using demographic and comorbidity variables, as well as dates associated with symptom onset, testing, and hospitalization. Results: For the target cohort, 75% of hospitalization needs could be identified at the time of testing for SARS-CoV-2 infection. Over 60% of respiratory needs could be identified at the time of hospitalization. Both predictions had >50% precision. Conclusions: The conducted study pinpoints the relevance of the proposed predictive models as good candidates to support medical decisions in the Portuguese population, including both monitoring and in-hospital care decisions. A clinical decision support system is further provided to this end. %M 33835931 %R 10.2196/26075 %U https://www.jmir.org/2021/4/e26075 %U https://doi.org/10.2196/26075 %U http://www.ncbi.nlm.nih.gov/pubmed/33835931 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e27468 %T Deep Convolutional Neural Network–Based Computer-Aided Detection System for COVID-19 Using Multiple Lung Scans: Design and Implementation Study %A Ghaderzadeh,Mustafa %A Asadi,Farkhondeh %A Jafari,Ramezan %A Bashash,Davood %A Abolghasemi,Hassan %A Aria,Mehrad %+ Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Darband St, Ghods Square, Tehran, Iran, 98 9123187253, Asadifar@sbmu.ac.ir %K artificial intelligence %K classification %K computer-aided detection %K computed tomography scan %K convolutional neural network %K coronavirus %K COVID-19 %K deep learning %K machine learning %K machine vision %K model %K pandemic %D 2021 %7 26.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Owing to the COVID-19 pandemic and the imminent collapse of health care systems following the exhaustion of financial, hospital, and medicinal resources, the 
World Health Organization changed the alert level of the COVID-19 pandemic from high to very high. Meanwhile, more cost-effective and precise COVID-19 detection methods are being preferred worldwide. Objective: Machine vision–based COVID-19 detection methods, especially deep learning as a diagnostic method in the early stages of the pandemic, have been assigned great importance during the pandemic. This study aimed to design a highly efficient computer-aided detection (CAD) system for COVID-19 by using a neural search architecture network (NASNet)–based algorithm. Methods: NASNet, a state-of-the-art pretrained convolutional neural network for image feature extraction, was adopted to identify patients with COVID-19 in the early stages of the disease. A local data set, comprising 10,153 computed tomography scans of 190 patients with and 59 without COVID-19, was used. Results: After fitting on the training data set, hyperparameter tuning, and topological alterations of the classifier block, the proposed NASNet-based model was evaluated on the test data set and yielded remarkable results. The proposed model's performance achieved a detection sensitivity, specificity, and accuracy of 0.999, 0.986, and 0.996, respectively. Conclusions: The proposed model achieved acceptable results in the categorization of 2 data classes. Therefore, a CAD system was designed on the basis of this model for COVID-19 detection using multiple lung computed tomography scans. The system differentiated all COVID-19 cases from non–COVID-19 ones without any error in the application phase. Overall, the proposed deep learning–based CAD system can greatly help radiologists detect COVID-19 in its early stages. During the COVID-19 pandemic, the use of a CAD system as a screening tool would accelerate disease detection and prevent the loss of health care resources. 
%M 33848973 %R 10.2196/27468 %U https://www.jmir.org/2021/4/e27468 %U https://doi.org/10.2196/27468 %U http://www.ncbi.nlm.nih.gov/pubmed/33848973 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e26628 %T Machine Learning–Based Prediction of Growth in Confirmed COVID-19 Infection Cases in 114 Countries Using Metrics of Nonpharmaceutical Interventions and Cultural Dimensions: Model Development and Validation %A Yeung,Arnold YS %A Roewer-Despres,Francois %A Rosella,Laura %A Rudzicz,Frank %+ Department of Computer Science, University of Toronto, 27 King's College Cir, Toronto, ON, M5S 3H7, Canada, 1 416 978 2011, arnoldyeung@cs.toronto.edu %K COVID-19 %K machine learning %K nonpharmaceutical interventions %K cultural dimensions %K random forest %K AdaBoost %K forecast %K informatics %K epidemiology %K artificial intelligence %D 2021 %7 23.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: National governments worldwide have implemented nonpharmaceutical interventions to control the COVID-19 pandemic and mitigate its effects. Objective: The aim of this study was to investigate the prediction of future daily national confirmed COVID-19 infection growth—the percentage change in total cumulative cases—across 14 days for 114 countries using nonpharmaceutical intervention metrics and cultural dimension metrics, which are indicative of specific national sociocultural norms. Methods: We combined the Oxford COVID-19 Government Response Tracker data set, Hofstede cultural dimensions, and daily reported COVID-19 infection case numbers to train and evaluate five non–time series machine learning models in predicting confirmed infection growth. We used three validation methods—in-distribution, out-of-distribution, and country-based cross-validation—for the evaluation, each of which was applicable to a different use case of the models. 
Results: Our results demonstrate high R² values between the labels and predictions for the in-distribution method (0.959) and moderate R² values for the out-of-distribution and country-based cross-validation methods (0.513 and 0.574, respectively) using random forest and adaptive boosting (AdaBoost) regression. Although these models may be used to predict confirmed infection growth, the differing accuracies obtained from the three tasks suggest a strong influence of the use case. Conclusions: This work provides new considerations in using machine learning techniques with nonpharmaceutical interventions and cultural dimensions as metrics to predict the national growth of confirmed COVID-19 infections. %M 33844636 %R 10.2196/26628 %U https://www.jmir.org/2021/4/e26628 %U https://doi.org/10.2196/26628 %U http://www.ncbi.nlm.nih.gov/pubmed/33844636 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 4 %P e25181 %T Machine Learning Models for Image-Based Diagnosis and Prognosis of COVID-19: Systematic Review %A Montazeri,Mahdieh %A ZahediNasab,Roxana %A Farahani,Ali %A Mohseni,Hadis %A Ghasemian,Fahimeh %+ Computer Engineering Department, Faculty of Engineering, Shahid Bahonar University of Kerman, Pajoohesh Sq, PO Box: 76169-14111, Kerman, Iran, 98 9133924837, ghasemianfahime@uk.ac.ir %K machine learning %K diagnosis %K prognosis %K COVID-19 %D 2021 %7 23.4.2021 %9 Review %J JMIR Med Inform %G English %X Background: Accurate and timely diagnosis and effective prognosis of the disease is important to provide the best possible care for patients with COVID-19 and reduce the burden on the health care system. Machine learning methods can play a vital role in the diagnosis of COVID-19 by processing chest x-ray images. Objective: The aim of this study is to summarize information on the use of intelligent models for the diagnosis and prognosis of COVID-19 to help with early and timely diagnosis, minimize prolonged diagnosis, and improve overall health care. 
Methods: A systematic search of databases, including PubMed, Web of Science, IEEE, ProQuest, Scopus, bioRxiv, and medRxiv, was performed for COVID-19–related studies published up to May 24, 2020. This study was performed in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) guidelines. All original research articles describing the application of image processing for the prediction and diagnosis of COVID-19 were considered in the analysis. Two reviewers independently assessed the published papers to determine eligibility for inclusion in the analysis. Risk of bias was evaluated using the Prediction Model Risk of Bias Assessment Tool. Results: Of the 629 articles retrieved, 44 articles were included. We identified 4 prognosis models for calculating prediction of disease severity and estimation of confinement time for individual patients, and 40 diagnostic models for detecting COVID-19 from normal or other pneumonias. Most included studies used deep learning methods based on convolutional neural networks, which have been widely used as a classification algorithm. The most frequently reported predictors of prognosis in patients with COVID-19 included age, computed tomography data, gender, comorbidities, symptoms, and laboratory findings. Deep convolutional neural networks obtained better results compared with non–neural network–based methods. Moreover, all of the models were found to be at high risk of bias due to the lack of information about the study population, intended groups, and inappropriate reporting. Conclusions: Machine learning models used for the diagnosis and prognosis of COVID-19 showed excellent discriminative performance. However, these models were at high risk of bias, because of various reasons such as inadequate information about study participants, randomization process, and the lack of external validation, which may have resulted in the optimistic reporting of these models. 
Hence, our findings do not recommend any of the current models to be used in practice for the diagnosis and prognosis of COVID-19. %M 33735095 %R 10.2196/25181 %U https://medinform.jmir.org/2021/4/e25181 %U https://doi.org/10.2196/25181 %U http://www.ncbi.nlm.nih.gov/pubmed/33735095 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e25759 %T Role of Artificial Intelligence Applications in Real-Life Clinical Practice: Systematic Review %A Yin,Jiamin %A Ngiam,Kee Yuan %A Teo,Hock Hai %+ Department of Information Systems and Analytics, School of Computing, National University of Singapore, 13 Computing Drive, NUS, Singapore, 117417, Singapore, 65 65162979, teohh@comp.nus.edu.sg %K artificial intelligence %K machine learning %K deep learning %K system implementation %K clinical practice %K review %D 2021 %7 22.4.2021 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) applications are growing at an unprecedented pace in health care, including disease diagnosis, triage or screening, risk analysis, surgical operations, and so forth. Despite a great deal of research in the development and validation of health care AI, only few applications have been actually implemented at the frontlines of clinical practice. Objective: The objective of this study was to systematically review AI applications that have been implemented in real-life clinical practice. Methods: We conducted a literature search in PubMed, Embase, Cochrane Central, and CINAHL to identify relevant articles published between January 2010 and May 2020. We also hand searched premier computer science journals and conferences as well as registered clinical trials. Studies were included if they reported AI applications that had been implemented in real-world clinical settings. 
Results: We identified 51 relevant studies that reported the implementation and evaluation of AI applications in clinical practice, of which 13 adopted a randomized controlled trial design and 8 adopted an experimental design. The AI applications targeted various clinical tasks, such as screening or triage (n=16), disease diagnosis (n=16), risk analysis (n=14), and treatment (n=7). The most commonly addressed diseases and conditions were sepsis (n=6), breast cancer (n=5), diabetic retinopathy (n=4), and polyp and adenoma (n=4). Regarding the evaluation outcomes, we found that 26 studies examined the performance of AI applications in clinical settings, 33 studies examined the effect of AI applications on clinician outcomes, 14 studies examined the effect on patient outcomes, and 1 study examined the economic impact associated with AI implementation. Conclusions: This review indicates that research on the clinical implementation of AI applications is still at an early stage despite the great potential. More research is needed to assess the benefits and challenges associated with clinical AI applications through a more rigorous methodology. 
%M 33885365 %R 10.2196/25759 %U https://www.jmir.org/2021/4/e25759 %U https://doi.org/10.2196/25759 %U http://www.ncbi.nlm.nih.gov/pubmed/33885365 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e27060 %T Prediction and Feature Importance Analysis for Severity of COVID-19 in South Korea Using Artificial Intelligence: Model Development and Validation %A Chung,Heewon %A Ko,Hoon %A Kang,Wu Seong %A Kim,Kyung Won %A Lee,Hooseok %A Park,Chul %A Song,Hyun-Ok %A Choi,Tae-Young %A Seo,Jae Ho %A Lee,Jinseok %+ Department of Artificial Intelligence, The Catholic University of Korea, 43 Jibong-ro, Bucheon, 14662, Republic of Korea, 82 2 2164 5523, gonasago@catholic.ac.kr %K COVID-19 %K artificial intelligence %K blood samples %K mortality prediction %D 2021 %7 19.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The number of deaths from COVID-19 continues to surge worldwide. In particular, if a patient’s condition is sufficiently severe to require invasive ventilation, it is more likely to lead to death than to recovery. Objective: The goal of our study was to analyze the factors related to COVID-19 severity in patients and to develop an artificial intelligence (AI) model to predict the severity of COVID-19 at an early stage. Methods: We developed an AI model that predicts severity based on data from 5601 COVID-19 patients from all national and regional hospitals across South Korea as of April 2020. The clinical severity of COVID-19 was divided into two categories: low and high severity. The condition of patients in the low-severity group corresponded to no limit of activity, oxygen support with nasal prong or facial mask, and noninvasive ventilation. The condition of patients in the high-severity group corresponded to invasive ventilation, multi-organ failure with extracorporeal membrane oxygenation required, and death. 
For the AI model input, we used 37 variables from the medical records, including basic patient information, a physical index, initial examination findings, clinical findings, comorbid diseases, and general blood test results at an early stage. Feature importance analysis was performed with AdaBoost, random forest, and eXtreme Gradient Boosting (XGBoost); the AI model for predicting COVID-19 severity among patients was developed with a 5-layer deep neural network (DNN) with the 20 most important features, which were selected based on ranked feature importance analysis of 37 features from the comprehensive data set. The selection procedure was performed using sensitivity, specificity, accuracy, balanced accuracy, and area under the curve (AUC). Results: We found that age was the most important factor for predicting disease severity, followed by lymphocyte level, platelet count, and shortness of breath or dyspnea. Our proposed 5-layer DNN with the 20 most important features provided high sensitivity (90.2%), specificity (90.4%), accuracy (90.4%), balanced accuracy (90.3%), and AUC (0.96). Conclusions: Our proposed AI model with the selected features was able to predict the severity of COVID-19 accurately. We also made a web application so that anyone can access the model. We believe that sharing the AI model with the public will be helpful in validating and improving its performance. 
%M 33764883 %R 10.2196/27060 %U https://www.jmir.org/2021/4/e27060 %U https://doi.org/10.2196/27060 %U http://www.ncbi.nlm.nih.gov/pubmed/33764883 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e24996 %T Machine Learning–Driven Models to Predict Prognostic Outcomes in Patients Hospitalized With Heart Failure Using Electronic Health Records: Retrospective Study %A Lv,Haichen %A Yang,Xiaolei %A Wang,Bingyi %A Wang,Shaobo %A Du,Xiaoyan %A Tan,Qian %A Hao,Zhujing %A Liu,Ying %A Yan,Jun %A Xia,Yunlong %+ Department of Cardiology, The First Affiliated Hospital of Dalian Medical University, 193 Lianhe Road, Shahekou District, Dalian, 116014, China, 86 18098875555, yunlongxia01@163.com %K heart failure %K machine learning %K predictive modeling %K mortality %K positive inotropic agents %K readmission %D 2021 %7 19.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: With the prevalence of cardiovascular diseases increasing worldwide, early prediction and accurate assessment of heart failure (HF) risk are crucial to meet the clinical demand. Objective: Our study objective was to develop machine learning (ML) models based on real-world electronic health records to predict 1-year in-hospital mortality, use of positive inotropic agents, and 1-year all-cause readmission rate. Methods: For this single-center study, we recruited patients with newly diagnosed HF hospitalized between December 2010 and August 2018 at the First Affiliated Hospital of Dalian Medical University (Liaoning Province, China). The models were constructed for a population set (90:10 split of data set into training and test sets) using 79 variables during the first hospitalization. Logistic regression, support vector machine, artificial neural network, random forest, and extreme gradient boosting models were investigated for outcome predictions. 
Results: Of the 13,602 patients with HF enrolled in the study, 537 (3.95%) died within 1 year and 2779 patients (20.43%) had a history of use of positive inotropic agents. ML algorithms improved the performance of predictive models for 1-year in-hospital mortality (areas under the curve [AUCs] 0.92-1.00), use of positive inotropic medication (AUCs 0.85-0.96), and 1-year readmission rates (AUCs 0.63-0.96). A decision tree of mortality risk was created and stratified by single variables at levels of high-sensitivity cardiac troponin I (<0.068 μg/L), followed by percentage of lymphocytes (<14.688%) and neutrophil count (4.870×10⁹/L). Conclusions: ML techniques based on a large set of clinical variables can improve outcome predictions for patients with HF. The mortality decision tree may contribute to guiding better clinical risk assessment and decision making. %M 33871375 %R 10.2196/24996 %U https://www.jmir.org/2021/4/e24996 %U https://doi.org/10.2196/24996 %U http://www.ncbi.nlm.nih.gov/pubmed/33871375 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 4 %P e25347 %T Reliable Deep Learning–Based Detection of Misplaced Chest Electrodes During Electrocardiogram Recording: Algorithm Development and Validation %A Rjoob,Khaled %A Bond,Raymond %A Finlay,Dewar %A McGilligan,Victoria %A J Leslie,Stephen %A Rababah,Ali %A Iftikhar,Aleeha %A Guldenring,Daniel %A Knoery,Charles %A McShane,Anne %A Peace,Aaron %+ Faculty of Computing, Engineering & Built Environment, Ulster University, Shore Road, Jordanstown, BT37 0QB, United Kingdom, 44 07904392923, rjoob-k@ulster.ac.uk %K deep learning %K ECG interpretation %K electrode misplacement %K feature engineering %K machine learning %K ECG %K engineering %K cardiovascular disease %K myocardial infarction %K myocardial %K physicians %D 2021 %7 16.4.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: A 12-lead electrocardiogram (ECG) is the most commonly used method to diagnose patients with cardiovascular
diseases. However, there are a number of possible misinterpretations of the ECG that can be caused by several different factors, such as the misplacement of chest electrodes. Objective: The aim of this study is to build advanced algorithms to detect precordial (chest) electrode misplacement. Methods: In this study, we used traditional machine learning (ML) and deep learning (DL) to autodetect the misplacement of electrodes V1 and V2 using features from the resultant ECG. The algorithms were trained using data extracted from high-resolution body surface potential maps of patients who were diagnosed with myocardial infarction or left ventricular hypertrophy, or who had a normal ECG. Results: DL achieved the highest accuracy in this study for detecting V1 and V2 electrode misplacement, with an accuracy of 93.0% (95% CI 91.46-94.53) for misplacement in the second intercostal space. The performance of DL in the second intercostal space was benchmarked with physicians (n=11; mean age 47.3 years, SD 15.5) who were experienced in reading ECGs (mean number of ECGs read in the past year 436.54, SD 397.9). Physicians were poor at recognizing chest electrode misplacement on the ECG and achieved a mean accuracy of 60% (95% CI 56.09-63.90), which was significantly poorer than that of DL (P<.001). Conclusions: DL provides the best performance for detecting chest electrode misplacement when compared with the ability of experienced physicians. DL and ML could be used to help flag ECGs that have been incorrectly recorded and indicate that the data may be flawed, which could reduce the number of erroneous diagnoses. 
%M 33861205 %R 10.2196/25347 %U https://medinform.jmir.org/2021/4/e25347 %U https://doi.org/10.2196/25347 %U http://www.ncbi.nlm.nih.gov/pubmed/33861205 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e22796 %T Forecasting Future Asthma Hospital Encounters of Patients With Asthma in an Academic Health Care System: Predictive Model Development and Secondary Analysis Study %A Tong,Yao %A Messinger,Amanda I %A Wilcox,Adam B %A Mooney,Sean D %A Davidson,Giana H %A Suri,Pradeep %A Luo,Gang %+ Department of Biomedical Informatics and Medical Education, University of Washington, UW Medicine South Lake Union, 850 Republican Street, Building C, Box 358047, Seattle, WA, 98109, United States, 1 206 221 4596, gangluo@cs.wisc.edu %K asthma %K forecasting %K machine learning %K patient care management %K risk factors %D 2021 %7 16.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Asthma affects a large proportion of the population and leads to many hospital encounters involving both hospitalizations and emergency department visits every year. To lower the number of such encounters, many health care systems and health plans deploy predictive models to prospectively identify patients at high risk and offer them care management services for preventive care. However, the previous models do not have sufficient accuracy for serving this purpose well. Embracing the modeling strategy of examining many candidate features, we built a new machine learning model to forecast future asthma hospital encounters of patients with asthma at Intermountain Healthcare, a nonacademic health care system. This model is more accurate than the previously published models. However, it is unclear how well our modeling strategy generalizes to academic health care systems, whose patient composition differs from that of Intermountain Healthcare. 
Objective: This study aims to evaluate the generalizability of our modeling strategy to the University of Washington Medicine (UWM), an academic health care system. Methods: All adult patients with asthma who visited UWM facilities between 2011 and 2018 served as the patient cohort. We considered 234 candidate features. Through a secondary analysis of 82,888 UWM data instances from 2011 to 2018, we built a machine learning model to forecast asthma hospital encounters of patients with asthma in the subsequent 12 months. Results: Our UWM model yielded an area under the receiver operating characteristic curve (AUC) of 0.902. When placing the cutoff point for making binary classification at the top 10% (1464/14,644) of patients with asthma with the largest forecasted risk, our UWM model yielded an accuracy of 90.6% (13,268/14,644), a sensitivity of 70.2% (153/218), and a specificity of 90.91% (13,115/14,426). Conclusions: Our modeling strategy showed excellent generalizability to the UWM, leading to a model with an AUC that is higher than all of the AUCs previously reported in the literature for forecasting asthma hospital encounters. After further optimization, our model could be used to facilitate the efficient and effective allocation of asthma care management resources to improve outcomes. 
International Registered Report Identifier (IRRID): RR2-10.2196/resprot.5039 %M 33861206 %R 10.2196/22796 %U https://www.jmir.org/2021/4/e22796 %U https://doi.org/10.2196/22796 %U http://www.ncbi.nlm.nih.gov/pubmed/33861206 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e24120 %T Real-Time Clinical Decision Support Based on Recurrent Neural Networks for In-Hospital Acute Kidney Injury: External Validation and Model Interpretation %A Kim,Kipyo %A Yang,Hyeonsik %A Yi,Jinyeong %A Son,Hyung-Eun %A Ryu,Ji-Young %A Kim,Yong Chul %A Jeong,Jong Cheol %A Chin,Ho Jun %A Na,Ki Young %A Chae,Dong-Wan %A Han,Seung Seok %A Kim,Sejoong %+ Department of Internal Medicine, Seoul National University Bundang Hospital, 82 Gumi-ro 173-beon-gil Bundang-gu, Seongnam, 13620, Republic of Korea, 82 31 787 7051, sejoong@snubh.org %K acute kidney injury %K recurrent neural network %K prediction model %K external validation %K internal validation %K kidney %K neural networks %D 2021 %7 16.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Acute kidney injury (AKI) is commonly encountered in clinical practice and is associated with poor patient outcomes and increased health care costs. Despite it posing significant challenges for clinicians, effective measures for AKI prediction and prevention are lacking. Previously published AKI prediction models mostly have a simple design without external validation. Furthermore, little is known about the process of linking model output and clinical decisions due to the black-box nature of neural network models. Objective: We aimed to present an externally validated recurrent neural network (RNN)–based continuous prediction model for in-hospital AKI and show applicable model interpretations in relation to clinical decision support. 
Methods: Study populations were all patients aged 18 years or older who were hospitalized for more than 48 hours between 2013 and 2017 in 2 tertiary hospitals in Korea (Seoul National University Bundang Hospital and Seoul National University Hospital). All demographic data, laboratory values, vital signs, and clinical conditions of patients were obtained from electronic health records of each hospital. We developed 2-stage hierarchical prediction models (model 1 and model 2) using RNN algorithms. The outcome variable for model 1 was the occurrence of AKI within 7 days from the present. Model 2 predicted the future trajectory of creatinine values up to 72 hours. The performance of each developed model was evaluated using the internal and external validation data sets. For the explainability of our models, different model-agnostic interpretation methods were used, including Shapley Additive Explanations, partial dependence plots, individual conditional expectation, and accumulated local effects plots. Results: We included 69,081 patients in the training, 7675 in the internal validation, and 72,352 in the external validation cohorts for model development after excluding cases with missing data and those with an estimated glomerular filtration rate less than 15 mL/min/1.73 m² or end-stage kidney disease. Model 1 predicted any AKI development with an area under the receiver operating characteristic curve (AUC) of 0.88 (internal validation) and 0.84 (external validation), and stage 2 or higher AKI development with an AUC of 0.93 (internal validation) and 0.90 (external validation). Model 2 predicted the future creatinine values within 3 days with mean-squared errors of 0.04-0.09 for patients with higher risks of AKI and 0.03-0.08 for those with lower risks. 
Based on the developed models, we showed AKI probability according to feature values in total patients and each individual with partial dependence, accumulated local effects, and individual conditional expectation plots. We also estimated the effects of feature modifications such as nephrotoxic drug discontinuation on future creatinine levels. Conclusions: We developed and externally validated a continuous AKI prediction model using RNN algorithms. Our model could provide real-time assessment of future AKI occurrences and individualized risk factors for AKI in general inpatient cohorts; thus, we suggest approaches to support clinical decisions based on prediction models for in-hospital AKI. %M 33861200 %R 10.2196/24120 %U https://www.jmir.org/2021/4/e24120 %U https://doi.org/10.2196/24120 %U http://www.ncbi.nlm.nih.gov/pubmed/33861200 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e24153 %T Generalizability of an Automatic Explanation Method for Machine Learning Prediction Results on Asthma-Related Hospital Visits in Patients With Asthma: Quantitative Analysis %A Luo,Gang %A Nau,Claudia L %A Crawford,William W %A Schatz,Michael %A Zeiger,Robert S %A Koebnick,Corinna %+ Department of Biomedical Informatics and Medical Education, University of Washington, UW Medicine South Lake Union, 850 Republican Street,, Box 358047, Building C, Seattle, WA, 98195, United States, 1 2062214596, gangluo@cs.wisc.edu %K asthma %K forecasting %K patient care management %K machine learning %D 2021 %7 15.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Asthma exerts a substantial burden on patients and health care systems. 
To facilitate preventive care for asthma management and improve patient outcomes, we recently developed two machine learning models, one on Intermountain Healthcare data and the other on Kaiser Permanente Southern California (KPSC) data, to forecast asthma-related hospital visits, including emergency department visits and hospitalizations, in the succeeding 12 months among patients with asthma. As is typical for machine learning approaches, these two models do not explain their forecasting results. To address the interpretability issue of black-box models, we designed an automatic method to offer rule format explanations for the forecasting results of any machine learning model on imbalanced tabular data and to suggest customized interventions with no accuracy loss. Our method worked well for explaining the forecasting results of our Intermountain Healthcare model, but its generalizability to other health care systems remains unknown. Objective: The objective of this study is to evaluate the generalizability of our automatic explanation method to KPSC for forecasting asthma-related hospital visits. Methods: Through a secondary analysis of 987,506 data instances from 2012 to 2017 at KPSC, we used our method to explain the forecasting results of our KPSC model and to suggest customized interventions. The patient cohort covered a random sample of 70% of patients with asthma who had a KPSC health plan for any period between 2015 and 2018. Results: Our method explained the forecasting results for 97.57% (2204/2259) of the patients with asthma who were correctly forecasted to undergo asthma-related hospital visits in the succeeding 12 months. Conclusions: For forecasting asthma-related hospital visits, our automatic explanation method exhibited an acceptable generalizability to KPSC. 
International Registered Report Identifier (IRRID): RR2-10.2196/resprot.5039 %M 33856359 %R 10.2196/24153 %U https://www.jmir.org/2021/4/e24153 %U https://doi.org/10.2196/24153 %U http://www.ncbi.nlm.nih.gov/pubmed/33856359 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e25053 %T Establishing Machine Learning Models to Predict Curative Resection in Early Gastric Cancer with Undifferentiated Histology: Development and Usability Study %A Bang,Chang Seok %A Ahn,Ji Yong %A Kim,Jie-Hyun %A Kim,Young-Il %A Choi,Il Ju %A Shin,Woon Geon %+ Department of Internal Medicine, Hallym University College of Medicine, Sakju-ro 77, Gangwon-do, Chuncheon, 24253, Republic of Korea, 82 33 240 5821, csbang@hallym.ac.kr %K early gastric cancer %K artificial intelligence %K machine learning %K endoscopic submucosal dissection %K undifferentiated %K gastric cancer %K endoscopy %K dissection %D 2021 %7 15.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Undifferentiated type of early gastric cancer (U-EGC) is included among the expanded indications of endoscopic submucosal dissection (ESD); however, the rate of curative resection remains unsatisfactory. Endoscopists predict the probability of curative resection by considering the size and shape of the lesion and whether ulcers are present or not. The location of the lesion, indicating the likely technical difficulty, is also considered. Objective: The aim of this study was to establish machine learning (ML) models to better predict the possibility of curative resection in U-EGC prior to ESD. Methods: A nationwide cohort of 2703 U-EGCs treated by ESD or surgery were adopted for the training and internal validation cohorts. Separately, an independent data set of the Korean ESD registry (n=275) and an Asan medical center data set (n=127) treated by ESD were chosen for external validation. 
Eighteen ML classifiers were selected to establish prediction models of curative resection with the following variables: age; sex; location, size, and shape of the lesion; and whether ulcers were present or not. Results: Among the 18 models, the extreme gradient boosting classifier showed the best performance (internal validation accuracy 93.4%, 95% CI 90.4%-96.4%; precision 92.6%, 95% CI 89.5%-95.7%; recall 99.0%, 95% CI 97.8%-99.9%; and F1 score 95.7%, 95% CI 93.3%-98.1%). Attempts at external validation showed substantial accuracy (first external validation 81.5%, 95% CI 76.9%-86.1% and second external validation 89.8%, 95% CI 84.5%-95.1%). Lesion size was the most important feature in each explainable artificial intelligence analysis. Conclusions: We established an ML model capable of accurately predicting the curative resection of U-EGC before ESD by considering the morphological and ecological characteristics of the lesions. %M 33856358 %R 10.2196/25053 %U https://www.jmir.org/2021/4/e25053 %U https://doi.org/10.2196/25053 %U http://www.ncbi.nlm.nih.gov/pubmed/33856358 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e25167 %T Use of Endoscopic Images in the Prediction of Submucosal Invasion of Gastric Neoplasms: Automated Deep Learning Model Development and Usability Study %A Bang,Chang Seok %A Lim,Hyun %A Jeong,Hae Min %A Hwang,Sung Hyeon %+ Department of Internal Medicine, Hallym University College of Medicine, Sakju-ro 77, Chuncheon, 24253, Republic of Korea, 82 332405821, csbang@hallym.ac.kr %K convolutional neural network %K deep learning %K automated deep learning %K endoscopy %K gastric neoplasms %K neural network %K deep learning model %K artificial intelligence %D 2021 %7 15.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: In a previous study, we examined the use of deep learning models to classify the invasion depth (mucosa-confined versus submucosa-invaded) of gastric neoplasms using endoscopic images. 
The external test accuracy reached 77.3%. However, model establishment is labor-intensive, requiring high performance. Automated deep learning (AutoDL) models, which enable fast searching of optimal neural architectures and hyperparameters without complex coding, have been developed. Objective: The objective of this study was to establish AutoDL models to classify the invasion depth of gastric neoplasms. Additionally, endoscopist–artificial intelligence interactions were explored. Methods: The same 2899 endoscopic images that were employed to establish the previous model were used. A prospective multicenter validation using 206 and 1597 novel images was conducted. The primary outcome was external test accuracy. Neuro-T, Create ML Image Classifier, and AutoML Vision were used in establishing the models. Three doctors with different levels of endoscopy expertise were asked to classify the invasion depth of gastric neoplasms for each image without AutoDL support, with faulty AutoDL support, and with best performance AutoDL support in sequence. Results: The Neuro-T–based model reached 89.3% (95% CI 85.1%-93.5%) external test accuracy. For the model establishment time, Create ML Image Classifier showed the fastest time of 13 minutes while reaching 82.0% (95% CI 76.8%-87.2%) external test accuracy. While the expert endoscopist's decisions were not influenced by AutoDL, the faulty AutoDL misled the endoscopy trainee and the general physician. However, this was corrected by the support of the best performance AutoDL model. The trainee gained the most benefit from the AutoDL support. Conclusions: AutoDL is deemed useful for the on-site establishment of customized deep learning models. An inexperienced endoscopist with at least a certain level of expertise can benefit from AutoDL support. 
%M 33856356 %R 10.2196/25167 %U https://www.jmir.org/2021/4/e25167 %U https://doi.org/10.2196/25167 %U http://www.ncbi.nlm.nih.gov/pubmed/33856356 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e26211 %T Machine Learning Applied to Clinical Laboratory Data in Spain for COVID-19 Outcome Prediction: Model Development and Validation %A Domínguez-Olmedo,Juan L %A Gragera-Martínez,Álvaro %A Mata,Jacinto %A Pachón Álvarez,Victoria %+ Higher Technical School of Engineering, University of Huelva, Fuerzas Armadas Ave, Huelva, 21007, Spain, 34 959217371, juan.dominguez@dti.uhu.es %K COVID-19 %K electronic health record %K machine learning %K mortality %K prediction %D 2021 %7 14.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The COVID-19 pandemic is probably the greatest health catastrophe of the modern era. Spain’s health care system has been exposed to uncontrollable numbers of patients over a short period, causing the system to collapse. Given that diagnosis is not immediate, and there is no effective treatment for COVID-19, other tools have had to be developed to identify patients at the risk of severe disease complications and thus optimize material and human resources in health care. There are no tools to identify patients who have a worse prognosis than others. Objective: This study aimed to process a sample of electronic health records of patients with COVID-19 in order to develop a machine learning model to predict the severity of infection and mortality from among clinical laboratory parameters. Early patient classification can help optimize material and human resources, and analysis of the most important features of the model could provide more detailed insights into the disease. Methods: After an initial performance evaluation based on a comparison with several other well-known methods, the extreme gradient boosting algorithm was selected as the predictive method for this study. 
In addition, Shapley Additive Explanations was used to analyze the importance of the features of the resulting model. Results: After data preprocessing, 1823 confirmed patients with COVID-19 and 32 predictor features were selected. On bootstrap validation, the extreme gradient boosting classifier yielded a value of 0.97 (95% CI 0.96-0.98) for the area under the receiver operating characteristic curve, 0.86 (95% CI 0.80-0.91) for the area under the precision-recall curve, 0.94 (95% CI 0.92-0.95) for accuracy, 0.77 (95% CI 0.72-0.83) for the F-score, 0.93 (95% CI 0.89-0.98) for sensitivity, and 0.91 (95% CI 0.86-0.96) for specificity. The 4 most relevant features for model prediction were lactate dehydrogenase activity, C-reactive protein levels, neutrophil counts, and urea levels. Conclusions: Our predictive model yielded excellent results in differentiating patients who died of COVID-19, primarily from laboratory parameter values. Analysis of the resulting model identified a set of features with the most significant impact on the prediction, thus relating them to a higher risk of mortality. 
%M 33793407 %R 10.2196/26211 %U https://www.jmir.org/2021/4/e26211 %U https://doi.org/10.2196/26211 %U http://www.ncbi.nlm.nih.gov/pubmed/33793407 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 4 %P e17503 %T A User-Centered Chatbot (Wakamola) to Collect Linked Data in Population Networks to Support Studies of Overweight and Obesity Causes: Design and Pilot Study %A Asensio-Cuesta,Sabina %A Blanes-Selva,Vicent %A Conejero,J Alberto %A Frigola,Ana %A Portolés,Manuel G %A Merino-Torres,Juan Francisco %A Rubio Almanza,Matilde %A Syed-Abdul,Shabbir %A Li,Yu-Chuan (Jack) %A Vilar-Mateo,Ruth %A Fernandez-Luque,Luis %A García-Gómez,Juan M %+ Instituto de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia, 46022, Spain, 34 96 387 70 07 ext 71846, sasensio@dpi.upv.es %K mHealth %K obesity %K overweight %K chatbot %K assessment %K public health %K Telegram %K user-centered design %K Social Network Analysis %D 2021 %7 14.4.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Obesity and overweight are a serious health problem worldwide with multiple and connected causes. Simultaneously, chatbots are becoming increasingly popular as a way to interact with users in mobile health apps. Objective: This study reports the user-centered design and feasibility study of a chatbot to collect linked data to support the study of individual and social overweight and obesity causes in populations. Methods: We first studied the users’ needs and gathered users’ graphical preferences through an open survey on 52 wireframes designed by 150 design students; it also included questions about sociodemographics, diet and activity habits, the need for overweight and obesity apps, and desired functionality. We also interviewed an expert panel. We then designed and developed a chatbot. Finally, we conducted a pilot study to test feasibility. 
Results: We collected 452 answers to the survey and interviewed 4 specialists. Based on this research, we developed a Telegram chatbot named Wakamola structured in six sections: personal, diet, physical activity, social network, user's status score, and project information. We defined a user's status score as a normalized sum (0-100) of scores about diet (frequency of eating 50 foods), physical activity, BMI, and social network. We performed a pilot to evaluate the chatbot implementation among 85 healthy volunteers. Of 74 participants who completed all sections, we found 8 underweight people (11%), 5 overweight people (7%), and no obesity cases. The mean BMI was 21.4 kg/m² (normal weight). The most consumed foods were olive oil, milk and derivatives, cereals, vegetables, and fruits. People walked 10 minutes on 5.8 days per week, slept 7.02 hours per day, and sat for 30.57 hours per week. Moreover, we were able to create a social network with 74 users, 178 relations, and 12 communities. Conclusions: The Telegram chatbot Wakamola is a feasible tool to collect data from a population about sociodemographics, diet patterns, physical activity, BMI, and specific diseases. In addition, the chatbot allows the connection of users in a social network to study overweight and obesity causes from both individual and social perspectives. 
%M 33851934 %R 10.2196/17503 %U https://medinform.jmir.org/2021/4/e17503 %U https://doi.org/10.2196/17503 %U http://www.ncbi.nlm.nih.gov/pubmed/33851934 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e27275 %T Impact of Big Data Analytics on People’s Health: Overview of Systematic Reviews and Recommendations for Future Studies %A Borges do Nascimento,Israel Júnior %A Marcolino,Milena Soriano %A Abdulazeem,Hebatullah Mohamed %A Weerasekara,Ishanka %A Azzopardi-Muscat,Natasha %A Gonçalves,Marcos André %A Novillo-Ortiz,David %+ Division of Country Health Policies and Systems, World Health Organization, Regional Office for Europe, Marmorej 51, Copenhagen, 2100, Denmark, 45 61614868, dnovillo@who.int %K public health %K big data %K health status %K evidence-based medicine %K big data analytics %K secondary data analysis %K machine learning %K systematic review %K overview %K World Health Organization %D 2021 %7 13.4.2021 %9 Review %J J Med Internet Res %G English %X Background: Although the potential of big data analytics for health care is well recognized, evidence is lacking on its effects on public health. Objective: The aim of this study was to assess the impact of the use of big data analytics on people’s health based on the health indicators and core priorities in the World Health Organization (WHO) General Programme of Work 2019/2023 and the European Programme of Work (EPW), approved and adopted by its Member States, in addition to SARS-CoV-2–related studies. Furthermore, we sought to identify the most relevant challenges and opportunities of these tools with respect to people’s health. Methods: Six databases (MEDLINE, Embase, Cochrane Database of Systematic Reviews via Cochrane Library, Web of Science, Scopus, and Epistemonikos) were searched from the inception date to September 21, 2020. Systematic reviews assessing the effects of big data analytics on health indicators were included. 
Two authors independently performed screening, selection, data extraction, and quality assessment using the AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews 2) checklist. Results: The literature search initially yielded 185 records, 35 of which met the inclusion criteria, involving more than 5,000,000 patients. Most of the included studies used patient data collected from electronic health records, hospital information systems, private patient databases, and imaging datasets, and involved the use of big data analytics for noncommunicable diseases. “Probability of dying from any of cardiovascular, cancer, diabetes or chronic renal disease” and “suicide mortality rate” were the most commonly assessed health indicators and core priorities within the WHO General Programme of Work 2019/2023 and the EPW 2020/2025. Big data analytics have shown moderate to high accuracy for the diagnosis and prediction of complications of diabetes mellitus as well as for the diagnosis and classification of mental disorders; prediction of suicide attempts and behaviors; and the diagnosis, treatment, and prediction of important clinical outcomes of several chronic diseases. Confidence in the results was rated as “critically low” for 25 reviews, as “low” for 7 reviews, and as “moderate” for 3 reviews. The most frequently identified challenges were establishment of a well-designed and structured data source, and a secure, transparent, and standardized database for patient data. Conclusions: Although the overall quality of included studies was limited, big data analytics has shown moderate to high accuracy for the diagnosis of certain diseases, improvement in managing chronic diseases, and support for prompt and real-time analyses of large sets of varied input data to diagnose and predict disease outcomes. 
Trial Registration: International Prospective Register of Systematic Reviews (PROSPERO) CRD42020214048; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=214048 %M 33847586 %R 10.2196/27275 %U https://www.jmir.org/2021/4/e27275 %U https://doi.org/10.2196/27275 %U http://www.ncbi.nlm.nih.gov/pubmed/33847586 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 4 %P e25884 %T Machine Learning Approach to Predicting COVID-19 Disease Severity Based on Clinical Blood Test Data: Statistical Analysis and Model Development %A Aktar,Sakifa %A Ahamad,Md Martuza %A Rashed-Al-Mahfuz,Md %A Azad,AKM %A Uddin,Shahadat %A Kamal,AHM %A Alyami,Salem A %A Lin,Ping-I %A Islam,Sheikh Mohammed Shariful %A Quinn,Julian MW %A Eapen,Valsamma %A Moni,Mohammad Ali %+ WHO Collaborating Centre on eHealth, UNSW Digital Health, School of Public Health and Community Medicine, Faculty of Medicine, University of New South Wales, Kensington, Sydney, NSW 2052, Australia, 61 414701759, m.moni@unsw.edu.au %K COVID-19 %K blood samples %K machine learning %K statistical analysis %K prediction %K severity %K mortality %K morbidity %K risk %K blood %K testing %K outcome %K data set %D 2021 %7 13.4.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Accurate prediction of the disease severity of patients with COVID-19 would greatly improve care delivery and resource allocation and thereby reduce mortality risks, especially in less developed countries. Many patient-related factors, such as pre-existing comorbidities, affect disease severity and can be used to aid this prediction. Objective: Because rapid automated profiling of peripheral blood samples is widely available, we aimed to investigate how data from the peripheral blood of patients with COVID-19 can be used to predict clinical outcomes. 
Methods: We investigated clinical data sets of patients with COVID-19 with known outcomes by combining statistical comparison and correlation methods with machine learning algorithms; the latter included decision tree, random forest, variants of gradient boosting machine, support vector machine, k-nearest neighbor, and deep learning methods. Results: Our work revealed that several clinical parameters that are measurable in blood samples are factors that can discriminate between healthy people and COVID-19–positive patients, and we showed the value of these parameters in predicting later severity of COVID-19 symptoms. We developed a number of analytical methods that showed accuracy and precision scores >90% for disease severity prediction. Conclusions: We developed methodologies to analyze routine patient clinical data that enable more accurate prediction of COVID-19 patient outcomes. With this approach, data from standard hospital laboratory analyses of patient blood could be used to identify patients with COVID-19 who are at high risk of mortality, thus enabling optimization of hospital facilities for COVID-19 treatment. 
%M 33779565 %R 10.2196/25884 %U https://medinform.jmir.org/2021/4/e25884 %U https://doi.org/10.2196/25884 %U http://www.ncbi.nlm.nih.gov/pubmed/33779565 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 4 %P e23238 %T A Patient Journey Map to Improve the Home Isolation Experience of Persons With Mild COVID-19: Design Research for Service Touchpoints of Artificial Intelligence in eHealth %A He,Qian %A Du,Fei %A Simonse,Lianne W L %+ Department of Design Organisation & Strategy, Faculty of Industrial Design Engineering, Delft University of Technology, Landbergstraat 15, Delft, 2628CE, Netherlands, 31 15 27 ext 89054, L.W.L.Simonse@tudelft.nl %K COVID-19 %K design %K eHealth %K artificial intelligence %K service design %K patient journey map %K user-centered design %K digital service solutions in health %K home isolation %K AI %K touchpoint %D 2021 %7 12.4.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: In the context of the COVID-19 outbreak, 80% of the persons who are infected have mild symptoms and are required to self-recover at home. They have a strong demand for remote health care that, despite the great potential of artificial intelligence (AI), is not met by the current services of eHealth. Understanding the real needs of these persons is lacking. Objective: The aim of this paper is to contribute a fine-grained understanding of the home isolation experience of persons with mild COVID-19 symptoms to enhance AI in eHealth services. Methods: A design research method with a qualitative approach was used to map the patient journey. Data on the home isolation experiences of persons with mild COVID-19 symptoms was collected from the top-viewed personal video stories on YouTube and their comment threads. For the analysis, this data was transcribed, coded, and mapped into the patient journey map. 
Results: The key findings on the home isolation experience of persons with mild COVID-19 symptoms concerned (1) an awareness period before testing positive, (2) less typical and more personal symptoms, (3) a negative mood experience curve, (4) inadequate home health care service support for patients, and (5) benefits and drawbacks of social media support. Conclusions: The design of the patient journey map and underlying insights on the home isolation experience of persons with mild COVID-19 symptoms serve health and information technology professionals in more effectively applying AI technology in eHealth services, for which three main service concepts are proposed: (1) trustworthy public health information to relieve stress, (2) personal COVID-19 health monitoring, and (3) community support. %M 33444156 %R 10.2196/23238 %U https://medinform.jmir.org/2021/4/e23238 %U https://doi.org/10.2196/23238 %U http://www.ncbi.nlm.nih.gov/pubmed/33444156 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e25312 %T Voice-Controlled Intelligent Personal Assistants in Health Care: International Delphi Study %A Ermolina,Alena %A Tiberius,Victor %+ Faculty of Economics and Social Sciences, University of Potsdam, August-Bebel-Str 89, Potsdam, 14882, Germany, 49 331 977 ext 3593, tiberius@uni-potsdam.de %K Delphi study %K medical informatics %K voice-controlled intelligent personal assistants %K internet of things %K smart devices %D 2021 %7 9.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Voice-controlled intelligent personal assistants (VIPAs), such as Amazon Echo and Google Home, involve artificial intelligence–powered algorithms designed to simulate humans. Their hands-free interface and growing capabilities have a wide range of applications in health care, covering off-clinic education, health monitoring, and communication. 
However, conflicting factors, such as patient safety and privacy concerns, make it difficult to foresee the further development of VIPAs in health care. Objective: This study aimed to develop a plausible scenario for the further development of VIPAs in health care to support decision making regarding the procurement of VIPAs in health care organizations. Methods: We conducted a two-stage Delphi study with an internationally recruited panel consisting of voice assistant experts, medical professionals, and representatives of academia, governmental health authorities, and nonprofit health associations having expertise with voice technology. Twenty projections were formulated and evaluated by the panelists. Descriptive statistics were used to derive the desired scenario. Results: The panelists expect VIPAs to be able to provide solid medical advice based on patients’ personal health information and to have human-like conversations. However, in the short term, voice assistants might neither provide frustration-free user experience nor outperform or replace humans in health care. With a high level of consensus, the experts agreed with the potential of VIPAs to support elderly people and be widely used as anamnesis, informational, self-therapy, and communication tools by patients and health care professionals. Although users’ and governments’ privacy concerns are not expected to decrease in the near future, the panelists believe that strict regulations capable of preventing VIPAs from providing medical help services will not be imposed. Conclusions: According to the surveyed experts, VIPAs will show notable technological development and gain more user trust in the near future, resulting in widespread application in health care. However, voice assistants are expected to solely support health care professionals in their daily operations and will not be able to outperform or replace medical staff. 
%M 33835032 %R 10.2196/25312 %U https://www.jmir.org/2021/4/e25312 %U https://doi.org/10.2196/25312 %U http://www.ncbi.nlm.nih.gov/pubmed/33835032 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 4 %P e24073 %T A Natural Language Processing–Based Virtual Patient Simulator and Intelligent Tutoring System for the Clinical Diagnostic Process: Simulator Development and Case Study %A Furlan,Raffaello %A Gatti,Mauro %A Menè,Roberto %A Shiffer,Dana %A Marchiori,Chiara %A Giaj Levra,Alessandro %A Saturnino,Vincenzo %A Brunetta,Enrico %A Dipaola,Franca %+ Department of Biomedical Sciences, Humanitas University, Via R Levi Montalcini, 4, Pieve Emanuele, Milan, 20090, Italy, 39 0282247228, raffaello.furlan@hunimed.eu %K COVID-19 %K intelligent tutoring system %K virtual patient simulator %K natural language processing %K artificial intelligence %K clinical diagnostic reasoning %D 2021 %7 9.4.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Shortage of human resources, increasing educational costs, and the need to keep social distances in response to the COVID-19 worldwide outbreak have prompted the necessity of clinical training methods designed for distance learning. Virtual patient simulators (VPSs) may partially meet these needs. Natural language processing (NLP) and intelligent tutoring systems (ITSs) may further enhance the educational impact of these simulators. Objective: The goal of this study was to develop a VPS for clinical diagnostic reasoning that integrates interaction in natural language and an ITS. We also aimed to provide preliminary results of a short-term learning test administered on undergraduate students after use of the simulator. Methods: We trained a Siamese long short-term memory network for anamnesis and NLP algorithms combined with Systematized Nomenclature of Medicine (SNOMED) ontology for diagnostic hypothesis generation. The ITS was structured on the concepts of knowledge, assessment, and learner models. 
To assess short-term learning changes, 15 undergraduate medical students underwent two identical tests, composed of multiple-choice questions, before and after performing a simulation by the virtual simulator. The test was made up of 22 questions; 11 of these were core questions that were specifically designed to evaluate clinical knowledge related to the simulated case. Results: We developed a VPS called Hepius that allows students to gather clinical information from the patient’s medical history, physical exam, and investigations and allows them to formulate a differential diagnosis by using natural language. Hepius is also an ITS that provides real-time step-by-step feedback to the student and suggests specific topics the student has to review to fill in potential knowledge gaps. Results from the short-term learning test showed an increase in both mean test score (P<.001) and mean score for core questions (P<.001) when comparing presimulation and postsimulation performance. Conclusions: By combining ITS and NLP technologies, Hepius may provide medical undergraduate students with a learning tool for training them in diagnostic reasoning. This may be particularly useful in a setting where students have restricted access to clinical wards, as is happening during the COVID-19 pandemic in many countries worldwide. %M 33720840 %R 10.2196/24073 %U https://medinform.jmir.org/2021/4/e24073 %U https://doi.org/10.2196/24073 %U http://www.ncbi.nlm.nih.gov/pubmed/33720840 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e27293 %T Classification Models for COVID-19 Test Prioritization in Brazil: Machine Learning Approach %A Viana dos Santos Santana,Íris %A CM da Silveira,Andressa %A Sobrinho,Álvaro %A Chaves e Silva,Lenardo %A Dias da Silva,Leandro %A Santos,Danilo F S %A Gurjão,Edmar C %A Perkusich,Angelo %+ Federal University of the Agreste of Pernambuco, Av. 
Bom Pastor, s/n - Boa Vista, Garanhuns, 55292-270, Brazil, 55 87981493955, alvaro.alvares@ufape.edu.br %K COVID-19 %K test prioritization %K classification models %K medical diagnosis %D 2021 %7 8.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Controlling the COVID-19 outbreak in Brazil is a challenge due to the population’s size and urban density, inefficient maintenance of social distancing and testing strategies, and limited availability of testing resources. Objective: The purpose of this study is to effectively prioritize patients who are symptomatic for testing to assist early COVID-19 detection in Brazil, addressing problems related to inefficient testing and control strategies. Methods: Raw data from 55,676 Brazilians were preprocessed, and the chi-square test was used to confirm the relevance of the following features: gender, health professional, fever, sore throat, dyspnea, olfactory disorders, cough, coryza, taste disorders, and headache. Classification models were implemented relying on preprocessed data sets; supervised learning; and the algorithms multilayer perceptron (MLP), gradient boosting machine (GBM), decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), k-nearest neighbors (KNN), support vector machine (SVM), and logistic regression (LR). The models’ performances were analyzed using 10-fold cross-validation, classification metrics, and the Friedman and Nemenyi statistical tests. The permutation feature importance method was applied for ranking the features used by the classification models with the highest performances. Results: Gender, fever, and dyspnea were among the highest-ranked features used by the classification models. The comparative analysis presents MLP, GBM, DT, RF, XGBoost, and SVM as the highest performance models with similar results. KNN and LR were outperformed by the other algorithms. 
When ease of interpretation was applied as an additional comparison criterion, the DT was considered the most suitable model. Conclusions: The DT classification model can effectively (with a mean accuracy≥89.12%) assist COVID-19 test prioritization in Brazil. The model can be applied to recommend prioritizing a symptomatic patient for COVID-19 testing. %M 33750734 %R 10.2196/27293 %U https://www.jmir.org/2021/4/e27293 %U https://doi.org/10.2196/27293 %U http://www.ncbi.nlm.nih.gov/pubmed/33750734 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e23948 %T A Multimodality Machine Learning Approach to Differentiate Severe and Nonsevere COVID-19: Model Development and Validation %A Chen,Yuanfang %A Ouyang,Liu %A Bao,Forrest S %A Li,Qian %A Han,Lei %A Zhang,Hengdong %A Zhu,Baoli %A Ge,Yaorong %A Robinson,Patrick %A Xu,Ming %A Liu,Jie %A Chen,Shi %+ Department of Occupational Disease Prevention, Jiangsu Provincial Center for Disease Control and Prevention, 172 Jiangsu Road, Nanjing, 210009, China, 86 85393210, sosolou@126.com %K COVID-19 %K clinical type %K multimodality %K classification %K machine learning %K diagnosis %K prediction %K reliable %K decision support %D 2021 %7 7.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Effectively and efficiently diagnosing patients who have COVID-19 with the accurate clinical type of the disease is essential to achieve optimal outcomes for the patients as well as to reduce the risk of overloading the health care system. Currently, severe and nonsevere COVID-19 types are differentiated by only a few features, which do not comprehensively characterize the complicated pathological, physiological, and immunological responses to SARS-CoV-2 infection in the different disease types. In addition, these type-defining features may not be readily testable at the time of diagnosis. 
Objective: In this study, we aimed to use a machine learning approach to understand COVID-19 more comprehensively, accurately differentiate severe and nonsevere COVID-19 clinical types based on multiple medical features, and provide reliable predictions of the clinical type of the disease. Methods: For this study, we recruited 214 confirmed patients with nonsevere COVID-19 and 148 patients with severe COVID-19. The clinical characteristics (26 features) and laboratory test results (26 features) upon admission were acquired as two input modalities. Exploratory analyses demonstrated that these features differed substantially between two clinical types. Machine learning random forest models based on all the features in each modality as well as on the top 5 features in each modality combined were developed and validated to differentiate COVID-19 clinical types. Results: Using clinical and laboratory results independently as input, the random forest models achieved >90% and >95% predictive accuracy, respectively. The importance scores of the input features were further evaluated, and the top 5 features from each modality were identified (age, hypertension, cardiovascular disease, gender, and diabetes for the clinical features modality, and dimerized plasmin fragment D, high sensitivity troponin I, absolute neutrophil count, interleukin 6, and lactate dehydrogenase for the laboratory testing modality, in descending order). Using these top 10 multimodal features as the only input instead of all 52 features combined, the random forest model was able to achieve 97% predictive accuracy. Conclusions: Our findings shed light on how the human body reacts to SARS-CoV-2 infection as a unit and provide insights on effectively evaluating the disease severity of patients with COVID-19 based on more common medical features when gold standard features are not available. 
We suggest that clinical information can be used as an initial screening tool for self-evaluation and triage, while laboratory test results should be applied when accuracy is the priority. %M 33714935 %R 10.2196/23948 %U https://www.jmir.org/2021/4/e23948 %U https://doi.org/10.2196/23948 %U http://www.ncbi.nlm.nih.gov/pubmed/33714935 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 9 %N 4 %P e23718 %T Leprosy Screening Based on Artificial Intelligence: Development of a Cross-Platform App %A De Souza,Márcio Luís Moreira %A Lopes,Gabriel Ayres %A Branco,Alexandre Castelo %A Fairley,Jessica K %A Fraga,Lucia Alves De Oliveira %+ Multicentre Biochemistry and Molecular Biology Program, Federal University of Juiz de Fora, R São Paulo 745 Centro, Governador Valadares-MG, Brazil, 55 33 33011000, artigoacm@gmail.com %K leprosy %K artificial intelligence %K random forest %K Python %K R %K apps %K mHealth %K shinyApp %D 2021 %7 7.4.2021 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: According to the World Health Organization, achieving targets for control of leprosy by 2030 will require disease elimination and interruption of transmission at the national or regional level. India and Brazil have reported the highest leprosy burden in the last few decades, revealing the need for strategies and tools to help health professionals correctly manage and control the disease. Objective: The main objective of this study was to develop a cross-platform app for leprosy screening based on artificial intelligence (AI) with the goal of increasing accessibility of an accurate method of classifying leprosy treatment for health professionals, especially for communities further away from major diagnostic centers. Toward this end, we analyzed the quality of leprosy data in Brazil on the National Notifiable Diseases Information System (SINAN). 
Methods: Leprosy data were extracted from the SINAN database, carefully cleaned, and used to build AI decision models based on the random forest algorithm to predict operational classification in paucibacillary or multibacillary leprosy. We used Python programming language to extract and clean the data, and R programming language to train and test the AI model via cross-validation. To allow broad access, we deployed the final random forest classification model in a web app via shinyApp using data available from the Brazilian Institute of Geography and Statistics and the Department of Informatics of the Unified Health System. Results: We mapped the dispersion of leprosy incidence in Brazil from 2014 to 2018, and found a particularly high number of cases in central Brazil in 2014 that further increased in 2018 in the state of Mato Grosso. For some municipalities, up to 80% of cases showed some data discrepancy. Of a total of 21,047 discrepancies detected, the most common was “operational classification does not match the clinical form.” After data processing, we identified a total of 77,628 cases with missing data. The sensitivity and specificity of the AI model applied for the operational classification of leprosy was 93.97% and 87.09%, respectively. Conclusions: The proposed app was able to recognize patterns in leprosy cases registered in the SINAN database and to classify new patients with paucibacillary or multibacillary leprosy, thereby reducing the probability of incorrect assignment by health centers. The collection and notification of data on leprosy in Brazil seem to lack specific validation to increase the quality of the data for implementations via AI. The AI models implemented in this work had satisfactory accuracy across Brazilian states and could be a complementary diagnosis tool, especially in remote areas with few specialist physicians. 
%M 33825685 %R 10.2196/23718 %U https://mhealth.jmir.org/2021/4/e23718 %U https://doi.org/10.2196/23718 %U http://www.ncbi.nlm.nih.gov/pubmed/33825685 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 4 %P e24754 %T Retracted: Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants in Autism Spectrum Disorder: Genotype-Based Deep Learning %A Wang,Haishuai %A Avillach,Paul %+ Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck Street, Boston, MA, 02115, United States, 1 617 432 2144, Paul_Avillach@hms.harvard.edu %K deep learning %K autism spectrum disorder %K common genetic variants, diagnostic classification %D 2021 %7 7.4.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: In the United States, about 3 million people have autism spectrum disorder (ASD), and around 1 out of 59 children are diagnosed with ASD. People with ASD have characteristic social communication deficits and repetitive behaviors. The causes of this disorder remain unknown; however, in up to 25% of cases, a genetic cause can be identified. Detecting ASD as early as possible is desirable because early detection of ASD enables timely interventions in children with ASD. Identification of ASD based on objective pathogenic mutation screening is the major first step toward early intervention and effective treatment of affected children. Objective: Recent investigation interrogated genomics data for detecting and treating autism disorders, in addition to the conventional clinical interview as a diagnostic test. Since deep neural networks perform better than shallow machine learning models on complex and high-dimensional data, in this study, we sought to apply deep learning to genetic data obtained across thousands of simplex families at risk for ASD to identify contributory mutations and to create an advanced diagnostic classifier for autism screening. 
Methods: After preprocessing the genomics data from the Simons Simplex Collection, we extracted top ranking common variants that may be protective or pathogenic for autism based on a chi-square test. A convolutional neural network–based diagnostic classifier was then designed using the identified significant common variants to predict autism. The performance was then compared with shallow machine learning–based classifiers and randomly selected common variants. Results: The selected contributory common variants were significantly enriched in chromosome X while chromosome Y was also discriminatory in determining the identification of autistic individuals from nonautistic individuals. The ARSD, MAGEB16, and MXRA5 genes had the largest effect in the contributory variants. Thus, screening algorithms were adapted to include these common variants. The deep learning model yielded an area under the receiver operating characteristic curve of 0.955 and an accuracy of 88% for identifying autistic individuals from nonautistic individuals. Our classifier demonstrated a considerable improvement of ~13% in terms of classification accuracy compared to standard autism screening tools. Conclusions: Common variants are informative for autism identification. Our findings also suggest that the deep learning process is a reliable method for distinguishing the diseased group from the control group based on the common variants of autism. 
%M 33714937 %R 10.2196/24754 %U https://medinform.jmir.org/2021/4/e24754 %U https://doi.org/10.2196/24754 %U http://www.ncbi.nlm.nih.gov/pubmed/33714937 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e26627 %T Artificial Intelligence–Enabled Analysis of Public Attitudes on Facebook and Twitter Toward COVID-19 Vaccines in the United Kingdom and the United States: Observational Study %A Hussain,Amir %A Tahir,Ahsen %A Hussain,Zain %A Sheikh,Zakariya %A Gogate,Mandar %A Dashtipour,Kia %A Ali,Azhar %A Sheikh,Aziz %+ School of Computing, Edinburgh Napier University, 10 Colinton Road, Edinburgh, EH10 5DT, United Kingdom, 44 0845 260 6040, a.hussain@napier.ac.uk %K artificial intelligence %K COVID-19 %K deep learning %K Facebook %K health informatics %K natural language processing %K public health %K sentiment analysis %K social media %K Twitter %K infodemiology %K vaccination %D 2021 %7 5.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Global efforts toward the development and deployment of a vaccine for COVID-19 are rapidly advancing. To achieve herd immunity, widespread administration of vaccines is required, which necessitates significant cooperation from the general public. As such, it is crucial that governments and public health agencies understand public sentiments toward vaccines, which can help guide educational campaigns and other targeted policy interventions. Objective: The aim of this study was to develop and apply an artificial intelligence–based approach to analyze public sentiments on social media in the United Kingdom and the United States toward COVID-19 vaccines to better understand the public attitude and concerns regarding COVID-19 vaccines. 
Methods: Over 300,000 social media posts related to COVID-19 vaccines were extracted, including 23,571 Facebook posts from the United Kingdom and 144,864 from the United States, along with 40,268 tweets from the United Kingdom and 98,385 from the United States from March 1 to November 22, 2020. We used natural language processing and deep learning–based techniques to predict average sentiments, sentiment trends, and topics of discussion. These factors were analyzed longitudinally and geospatially, and manual reading of randomly selected posts on points of interest helped identify underlying themes and validated insights from the analysis. Results: Overall averaged positive, negative, and neutral sentiments were at 58%, 22%, and 17% in the United Kingdom, compared to 56%, 24%, and 18% in the United States, respectively. Public optimism over vaccine development, effectiveness, and trials as well as concerns over their safety, economic viability, and corporation control were identified. We compared our findings to those of nationwide surveys in both countries and found them to correlate broadly. Conclusions: Artificial intelligence–enabled social media analysis should be considered for adoption by institutions and governments alongside surveys and other conventional methods of assessing public attitude. Such analyses could enable real-time assessment, at scale, of public confidence and trust in COVID-19 vaccines, help address the concerns of vaccine sceptics, and help develop more effective policies and communication strategies to maximize uptake. 
%M 33724919 %R 10.2196/26627 %U https://www.jmir.org/2021/4/e26627 %U https://doi.org/10.2196/26627 %U http://www.ncbi.nlm.nih.gov/pubmed/33724919 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e22394 %T Radiomic and Genomic Machine Learning Method Performance for Prostate Cancer Diagnosis: Systematic Literature Review %A Castaldo,Rossana %A Cavaliere,Carlo %A Soricelli,Andrea %A Salvatore,Marco %A Pecchia,Leandro %A Franzese,Monica %+ IRCCS SDN, 113 Via E Gianturco, Naples, 80143, Italy, 39 3470563424, carlo.cavaliere@synlab.it %K prostate cancer %K machine learning %K systematic review %K meta-analysis %K diagnosis %K imaging %K radiomics %K genomics %K clinical %K biomarkers %D 2021 %7 1.4.2021 %9 Review %J J Med Internet Res %G English %X Background: Machine learning algorithms have been drawing attention at the joining of pathology and radiology in prostate cancer research. However, due to their algorithmic learning complexity and the variability of their architecture, there is an ongoing need to analyze their performance. Objective: This study assesses the source of heterogeneity and the performance of machine learning applied to radiomic, genomic, and clinical biomarkers for the diagnosis of prostate cancer. One research focus of this study was on clearly identifying problems and issues related to the implementation of machine learning in clinical studies. Methods: Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) protocol, 816 titles were identified from the PubMed, Scopus, and OvidSP databases. Studies that used machine learning to detect prostate cancer and provided performance measures were included in our analysis. The quality of the eligible studies was assessed using the QUADAS-2 (quality assessment of diagnostic accuracy studies–version 2) tool. The hierarchical multivariate model was applied to the pooled data in a meta-analysis. 
To investigate the heterogeneity among studies, I2 statistics were performed along with visual evaluation of coupled forest plots. Due to the internal heterogeneity among machine learning algorithms, subgroup analysis was carried out to investigate the diagnostic capability of machine learning systems in clinical practice. Results: In the final analysis, 37 studies were included, of which 29 entered the meta-analysis pooling. The analysis of machine learning methods to detect prostate cancer reveals the limited usage of the methods and the lack of standards that hinder the implementation of machine learning in clinical applications. Conclusions: The performance of machine learning for diagnosis of prostate cancer was considered satisfactory for several studies investigating the multiparametric magnetic resonance imaging and urine biomarkers; however, given the limitations indicated in our study, further studies are warranted to extend the potential use of machine learning to clinical settings. Recommendations on the use of machine learning techniques were also provided to help researchers to design robust studies to facilitate evidence generation from the use of radiomic and genomic biomarkers. 
%M 33792552 %R 10.2196/22394 %U https://www.jmir.org/2021/4/e22394 %U https://doi.org/10.2196/22394 %U http://www.ncbi.nlm.nih.gov/pubmed/33792552 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 4 %P e25000 %T Mortality Prediction of Patients With Cardiovascular Disease Using Medical Claims Data Under Artificial Intelligence Architectures: Validation Study %A Tran,Linh %A Chi,Lianhua %A Bonti,Alessio %A Abdelrazek,Mohamed %A Chen,Yi-Ping Phoebe %+ Department of Computer Science and Information Technology, La Trobe University, Beth Gleeson Bldg, 2rd Fl, #242, La Trobe University, Bundoora, 3086, Australia, 61 94792454, l.chi@latrobe.edu.au %K mortality %K cardiovascular %K medical claims data %K imbalanced data %K machine learning %K deep learning %D 2021 %7 1.4.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Cardiovascular disease (CVD) is the greatest health problem in Australia, which kills more people than any other disease and incurs enormous costs for the health care system. In this study, we present a benchmark comparison of various artificial intelligence (AI) architectures for predicting the mortality rate of patients with CVD using structured medical claims data. Compared with other research in the clinical literature, our models are more efficient because we use a smaller number of features, and this study could help health professionals accurately choose AI models to predict mortality among patients with CVD using only claims data before a clinic visit. Objective: This study aims to support health clinicians in accurately predicting mortality among patients with CVD using only claims data before a clinic visit. Methods: The data set was obtained from the Medicare Benefits Scheme and Pharmaceutical Benefits Scheme service information in the period between 2004 and 2014, released by the Department of Health Australia in 2016. It included 346,201 records, corresponding to 346,201 patients. 
A total of five AI algorithms, including four classical machine learning algorithms (logistic regression [LR], random forest [RF], extra trees [ET], and gradient boosting trees [GBT]) and a deep learning algorithm, which is a densely connected neural network (DNN), were developed and compared in this study. In addition, because of the minority of deceased patients in the data set, a separate experiment using the Synthetic Minority Oversampling Technique (SMOTE) was conducted to enrich the data. Results: Regarding model performance, in terms of discrimination, GBT and RF were the models with the highest area under the receiver operating characteristic curve (97.8% and 97.7%, respectively), followed by ET (96.8%) and LR (96.4%), whereas DNN was the least discriminative (95.3%). In terms of reliability, LR predictions were the least calibrated compared with the other four algorithms. In this study, despite increasing the training time, SMOTE was proven to further improve the model performance of LR, whereas other algorithms, especially GBT and DNN, worked well with class imbalanced data. Conclusions: Compared with other research in the clinical literature involving AI models using claims data to predict patient health outcomes, our models are more efficient because we use a smaller number of features but still achieve high performance. This study could help health professionals accurately choose AI models to predict mortality among patients with CVD using only claims data before a clinic visit. 
%M 33792549 %R 10.2196/25000 %U https://medinform.jmir.org/2021/4/e25000 %U https://doi.org/10.2196/25000 %U http://www.ncbi.nlm.nih.gov/pubmed/33792549 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 9 %N 3 %P e22183 %T Interpretable Conditional Recurrent Neural Network for Weight Change Prediction: Algorithm Development and Validation Study %A Kim,Ho Heon %A Kim,Youngin %A Park,Yu Rang %+ Department of Biomedical Systems Informatics, College of Medicine, Yonsei University, 50-1 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea, 82 0222282493, yurangpark@yuhs.ac %K explainable AI %K interpretable AI %K mHealth %K obesity %K behavior modification %K artificial intelligence %K development %K validation %K weight %K intervention %D 2021 %7 29.3.2021 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: In recent years, mobile-based interventions have received more attention as an alternative to on-site obesity management. Despite increased mobile interventions for obesity, there are lost opportunities to achieve better outcomes due to the lack of a predictive model using current existing longitudinal and cross-sectional health data. Noom (Noom Inc) is a mobile app that provides various lifestyle-related logs including food logging, exercise logging, and weight logging. Objective: The aim of this study was to develop a weight change predictive model using an interpretable artificial intelligence algorithm for mobile-based interventions and to explore contributing factors to weight loss. Methods: Lifelog mobile app (Noom) user data of individuals who used the weight loss program for 16 weeks in the United States were used to develop an interpretable recurrent neural network algorithm for weight prediction that considers both time-variant and time-fixed variables. 
From a total of 93,696 users in the coaching program, we excluded users who did not take part in the 16-week weight loss program, who were not overweight or obese, or who had not entered weight or meal records for the entire 16-week program. This interpretable model was trained and validated with 5-fold cross-validation (training set: 70%; testing set: 30%) using the lifelog data. Mean absolute percentage error between actual and predicted weight was used to measure model performance. To better understand the behavior factors contributing to weight loss or gain, we calculated contribution coefficients in test sets. Results: A total of 17,867 users’ data were included in the analysis. The overall mean absolute percentage error of the model was 3.50%, and the error of the model declined from 3.78% to 3.45% by the end of the program. The time-level attention weighting was nearly uniform at 0.0625 per week but decreased slightly (from 0.0626 to 0.0624) as the program approached week 16. Factors such as usage pattern, weight input frequency, meal input adherence, exercise, and sharp decreases in weight trajectories had negative contribution coefficients of –0.021, –0.032, –0.015, and –0.066, respectively. For time-fixed variables, being male had a contribution coefficient of –0.091. Conclusions: An interpretable algorithm, with both time-variant and time-fixed data, was used to precisely predict weight loss while preserving model transparency. This week-to-week prediction model is expected to improve weight loss and provide a global explanation of contributing factors, leading to better outcomes. 
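The mean absolute percentage error metric used to score the weight model can be sketched in a few lines (the weight values below are hypothetical, not the study's data):

```python
def mape(actual, predicted):
    """Mean absolute percentage error between actual and predicted values."""
    return 100 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# hypothetical weekly weights (kg): actual vs model predictions
actual = [92.0, 90.5, 89.1, 88.0]
pred = [91.0, 91.5, 88.6, 87.2]
print(round(mape(actual, pred), 2))  # → 0.92
```

MAPE expresses the error relative to each true value, so a 3.50% MAPE means predictions were off by about 3.5% of body weight on average.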
%M 33779574 %R 10.2196/22183 %U https://mhealth.jmir.org/2021/3/e22183 %U https://doi.org/10.2196/22183 %U http://www.ncbi.nlm.nih.gov/pubmed/33779574 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 8 %N 3 %P e26811 %T Digital Mental Health Challenges and the Horizon Ahead for Solutions %A Balcombe,Luke %A De Leo,Diego %+ Australian Institute for Suicide Research and Prevention, Griffith University, Messines Ridge Rd, Brisbane, 4122, Australia, 61 0447505709, lukebalcombe@gmail.com %K challenges %K COVID-19 %K digital mental health implementation %K explainable artificial intelligence %K hybrid model of care %K human-computer interaction %K resilience %K technology %D 2021 %7 29.3.2021 %9 Commentary %J JMIR Ment Health %G English %X The demand outstripping supply of mental health resources during the COVID-19 pandemic presents opportunities for digital technology tools to fill this new gap and, in the process, demonstrate capabilities to increase their effectiveness and efficiency. However, technology-enabled services have faced challenges in being sustainably implemented despite showing promising outcomes in efficacy trials since the early 2000s. The ongoing failure of these implementations has been addressed in reconceptualized models and frameworks, along with various efforts to branch out among disparate developers and clinical researchers to provide them with a key for furthering evaluative research. However, the limitations of traditional research methods in dealing with the complexities of mental health care warrant a diversified approach. The crux of the challenges of digital mental health implementation is the efficacy and evaluation of existing studies. Web-based interventions are increasingly used during the pandemic, allowing for affordable access to psychological therapies. However, a lagging infrastructure and skill base has limited the application of digital solutions in mental health care. 
Methodologies need to converge, because the rapid development of digital technologies has outpaced the rigorous evaluation of digital mental health interventions and strategies to prevent mental illness. The functions and implications of human-computer interaction require a better understanding to overcome engagement barriers, especially with predictive technologies. Explainable artificial intelligence is being incorporated into digital mental health implementation to obtain positive and responsible outcomes. Investment in digital platforms and associated apps for real-time screening, tracking, and treatment offers the promise of cost-effectiveness in vulnerable populations. Although machine learning has been limited by study conduct and reporting methods, the increasing use of unstructured data has strengthened its potential. Early evidence suggests that the advantages of implementing such technology outweigh the disadvantages. The limitations of an evidence-based approach require better integration of decision support tools to guide policymakers in digital mental health implementation. A complex range of issues with effectiveness, equity, access, and ethics (eg, privacy, confidentiality, fairness, transparency, reproducibility, and accountability) warrants resolution. Evidence-informed policies, development of high-quality digital products and services, and skills to use and maintain these solutions are required. Studies need to focus on developing digital platforms with explainable artificial intelligence–based apps to enhance resilience and guide the treatment decisions of mental health practitioners. Investments in digital mental health should ensure safety and workability. End users should advocate innovative methods that push developers to evaluate their products and services effectively and to render them a worthwhile investment. 
Technology-enabled services in a hybrid model of care are most likely to be effective (eg, specialists using these services among vulnerable, at-risk populations but not severe cases of mental ill health). %M 33779570 %R 10.2196/26811 %U https://mental.jmir.org/2021/3/e26811 %U https://doi.org/10.2196/26811 %U http://www.ncbi.nlm.nih.gov/pubmed/33779570 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 3 %P e27767 %T Accuracy of an Artificial Intelligence System for Cancer Clinical Trial Eligibility Screening: Retrospective Pilot Study %A Haddad,Tufia %A Helgeson,Jane M %A Pomerleau,Katharine E %A Preininger,Anita M %A Roebuck,M Christopher %A Dankwa-Mullan,Irene %A Jackson,Gretchen Purcell %A Goetz,Matthew P %+ Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, United States, 1 507 284 3731, haddad.tufia@mayo.edu %K clinical trial matching %K clinical decision support system %K machine learning %K artificial intelligence %K screening %K clinical trials %K eligibility %K breast cancer %D 2021 %7 26.3.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Screening patients for eligibility for clinical trials is labor intensive. It requires abstraction of data elements from multiple components of the longitudinal health record and matching them to inclusion and exclusion criteria for each trial. Artificial intelligence (AI) systems have been developed to improve the efficiency and accuracy of this process. Objective: This study aims to evaluate the ability of an AI clinical decision support system (CDSS) to identify eligible patients for a set of clinical trials. Methods: This study included the deidentified data from a cohort of patients with breast cancer seen at the medical oncology clinic of an academic medical center between May and July 2017 and assessed patient eligibility for 4 breast cancer clinical trials. CDSS eligibility screening performance was validated against manual screening. 
Accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for eligibility determinations were calculated. Disagreements between manual screeners and the CDSS were examined to identify sources of discrepancies. Interrater reliability between manual reviewers was analyzed using Cohen (pairwise) and Fleiss (three-way) κ, and the significance of differences was determined by Wilcoxon signed-rank test. Results: In total, 318 patients with breast cancer were included. Interrater reliability for manual screening ranged from 0.60 to 0.77, indicating substantial agreement. The overall accuracy of breast cancer trial eligibility determinations by the CDSS was 87.6%. CDSS sensitivity was 81.1% and specificity was 89%. Conclusions: The AI CDSS in this study demonstrated accuracy, sensitivity, and specificity of greater than 80% in determining the eligibility of patients for breast cancer clinical trials. CDSSs can accurately exclude ineligible patients from clinical trials and offer the potential to increase screening efficiency and accuracy. Additional research is needed to explore whether increased efficiency in screening and trial matching translates to improvements in trial enrollment, accruals, feasibility assessments, and cost. 
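The screening metrics and pairwise Cohen κ described in this abstract can be computed from simple counts; a minimal sketch (the counts and rater calls below are invented, not the study's data):

```python
def screening_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV, and NPV from a 2x2 table."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def cohen_kappa(a, b):
    """Pairwise interrater agreement between two screeners' 0/1 eligibility calls."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    p_yes = (sum(a) / n) * (sum(b) / n)               # chance both say eligible
    p_no = (1 - sum(a) / n) * (1 - sum(b) / n)        # chance both say ineligible
    pe = p_yes + p_no                                  # total chance agreement
    return (po - pe) / (1 - pe)

m = screening_metrics(tp=30, fp=8, tn=70, fn=7)  # illustrative counts only
print(round(m["sensitivity"], 3), round(m["specificity"], 3))  # → 0.811 0.897
```

κ of 1.0 indicates perfect agreement beyond chance; values of 0.60-0.77, as reported above, are conventionally read as substantial agreement.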
%M 33769304 %R 10.2196/27767 %U https://medinform.jmir.org/2021/3/e27767 %U https://doi.org/10.2196/27767 %U http://www.ncbi.nlm.nih.gov/pubmed/33769304 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 3 %P e21695 %T Reducing the Impact of Confounding Factors on Skin Cancer Classification via Image Segmentation: Technical Model Study %A Maron,Roman C %A Hekler,Achim %A Krieghoff-Henning,Eva %A Schmitt,Max %A Schlager,Justin G %A Utikal,Jochen S %A Brinker,Titus J %+ Digital Biomarkers for Oncology Group, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg, 69120, Germany, 49 6221 3219304, titus.brinker@dkfz.de %K dermatology %K diagnosis %K artificial intelligence %K neural networks %K image segmentation %K confounding factors %K artifacts %K melanoma %K nevus %K deep learning %D 2021 %7 25.3.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Studies have shown that artificial intelligence achieves similar or better performance than dermatologists in specific dermoscopic image classification tasks. However, artificial intelligence is susceptible to the influence of confounding factors within images (eg, skin markings), which can lead to false diagnoses of cancerous skin lesions. Image segmentation can remove lesion-adjacent confounding factors but greatly change the image representation. Objective: The aim of this study was to compare the performance of 2 image classification workflows where images were either segmented or left unprocessed before the subsequent training and evaluation of a binary skin lesion classifier. Methods: Separate binary skin lesion classifiers (nevus vs melanoma) were trained and evaluated on segmented and unsegmented dermoscopic images. For a more informative result, separate classifiers were trained on 2 distinct training data sets (human against machine [HAM] and International Skin Imaging Collaboration [ISIC]). 
Each training run was repeated 5 times. The mean performance of the 5 runs was evaluated on a multi-source test set (n=688) consisting of a holdout and an external component. Results: Our findings indicated that when trained on HAM, the segmented classifiers achieved a higher overall balanced accuracy (75.6% [SD 1.1%]) than the unsegmented classifiers (66.7% [SD 3.2%]), which was significant in 4 out of 5 runs (P<.001). The overall balanced accuracy was numerically higher for the unsegmented ISIC classifiers (78.3% [SD 1.8%]) than for the segmented ISIC classifiers (77.4% [SD 1.5%]), which was significantly different in 1 out of 5 runs (P=.004). Conclusions: Image segmentation does not decrease overall performance, and it beneficially removes lesion-adjacent confounding factors. Thus, it is a viable option to address the negative impact that confounding factors have on deep learning models in dermatology. However, the segmentation step might introduce new pitfalls, which require further investigation. 
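The overall balanced accuracy used to compare segmented and unsegmented classifiers is the mean of per-class recalls, which keeps a majority class (eg, nevi) from dominating the score; a minimal sketch with invented labels:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls; robust when one class outnumbers the other."""
    classes = set(y_true)
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# hypothetical labels: 0 = nevus (majority), 1 = melanoma (minority)
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 7 + [1] + [1, 0]  # one nevus and one melanoma misclassified
print(balanced_accuracy(y_true, y_pred))  # → 0.6875
```

Here plain accuracy would be 0.8, but balanced accuracy (mean of 7/8 and 1/2) reveals the weaker melanoma recall.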
%M 33764307 %R 10.2196/21695 %U https://www.jmir.org/2021/3/e21695 %U https://doi.org/10.2196/21695 %U http://www.ncbi.nlm.nih.gov/pubmed/33764307 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 3 %P e23888 %T Noninvasive Real-Time Mortality Prediction in Intensive Care Units Based on Gradient Boosting Method: Model Development and Validation Study %A Jiang,Huizhen %A Su,Longxiang %A Wang,Hao %A Li,Dongkai %A Zhao,Congpu %A Hong,Na %A Long,Yun %A Zhu,Weiguo %+ Department of Information Center, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, 1 Shuaifuyuan, Dongcheng District, Beijing, 100730, China, 86 01069154149, Zhuwg@pumch.cn %K real time %K mortality prediction %K intensive care unit %K noninvasive %D 2021 %7 25.3.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Monitoring critically ill patients in intensive care units (ICUs) in real time is vitally important. Although scoring systems are most often used in risk prediction of mortality, they are usually not highly precise, and the clinical data are often simply weighted. This method is inefficient and time-consuming in the clinical setting. Objective: The objective of this study was to integrate all medical data and noninvasively predict the real-time mortality of ICU patients using a gradient boosting method. Specifically, our goal was to predict mortality using a noninvasive method to minimize the discomfort to patients. Methods: In this study, we established five models to predict mortality in real time based on different features. According to the monitoring, laboratory, and scoring data, we constructed the feature engineering. 
The five real-time mortality prediction models were RMM (based on monitoring features), RMA (based on monitoring features and the Acute Physiology and Chronic Health Evaluation [APACHE]), RMS (based on monitoring features and Sequential Organ Failure Assessment [SOFA]), RMML (based on monitoring and laboratory features), and RM (based on all monitoring, laboratory, and scoring features). All models were built using LightGBM and tested with XGBoost. We then compared the performance of all models, with particular focus on the noninvasive method, the RMM model. Results: After extensive experiments, the area under the curve of the RMM model was 0.8264, which was superior to that of the RMA and RMS models. Therefore, predicting mortality using the noninvasive method was both efficient and practical, as it eliminated the need for extra physical interventions on patients, such as the drawing of blood. In addition, we explored the top nine features relevant to real-time mortality prediction: invasive mean blood pressure, heart rate, invasive systolic blood pressure, oxygen concentration, oxygen saturation, balance of input and output, total input, invasive diastolic blood pressure, and noninvasive mean blood pressure. These nine features should be given more focus in routine clinical practice. Conclusions: The results of this study may be helpful for real-time mortality prediction in ICU patients, especially the noninvasive method, which is efficient, favorable to patients, and of strong practical significance. 
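The area under the receiver operating characteristic curve reported for the RMM model can be computed directly from risk scores via the Mann-Whitney formulation; a minimal sketch with invented scores (not the study's data):

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a positive case (eg, a deceased patient)
    receives a higher risk score than a negative case (a survivor)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1
            elif sp == sn:
                wins += 0.5  # ties count half
    return wins / (len(scores_pos) * len(scores_neg))

# hypothetical risk scores from an RMM-like model
died = [0.9, 0.8, 0.75]
survived = [0.7, 0.4, 0.3, 0.85]
print(round(auc(died, survived), 3))  # → 0.833
```

An AUC of 0.8264, as reported above, means the model ranks a randomly chosen deceased patient above a randomly chosen survivor about 83% of the time.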
%M 33764311 %R 10.2196/23888 %U https://medinform.jmir.org/2021/3/e23888 %U https://doi.org/10.2196/23888 %U http://www.ncbi.nlm.nih.gov/pubmed/33764311 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 9 %N 3 %P e25406 %T Digital Health Integration Assessment and Maturity of the United States Biopharmaceutical Industry: Forces Driving the Next Generation of Connected Autoinjectable Devices %A Rafiei,Ramin %A Williams,Chelsea %A Jiang,Jeannette %A Aungst,Timothy Dy %A Durrer,Matthias %A Tran,Dao %A Howald,Ralph %+ SHL Medical, Gubelstrasse 22, 6300, Zug, Switzerland, 1 5617135654, chelsea.williams@shl-medical.com %K digital health %K artificial intelligence %K drug delivery %K biopharma %K autoinjector %K injectable devices %K disease management %K autoimmune %K oncology %K rare diseases %D 2021 %7 18.3.2021 %9 Viewpoint %J JMIR Mhealth Uhealth %G English %X Autoinjectable devices continue to provide real-life benefits for patients with chronic conditions since their widespread adoption 30 years ago with the rise of macromolecules. Nonetheless, issues surrounding adherence, patient administration techniques, disease self-management, and data outcomes at scale persist despite product design innovation. The interface of drug device combination products and digital health technologies formulates a value proposition for next-generation autoinjectable devices to power the delivery of precision care at home and achieve the full potential of biologics. Success will largely be dependent on biopharma’s digital health maturity to implement this framework. This viewpoint measures the digital health maturity of the top 15 biopharmaceutical companies in the US biologics autoinjector market and establishes the framework for next-generation autoinjectable devices powering home-based precision care and the need for formal digital health training. 
%M 33621188 %R 10.2196/25406 %U https://mhealth.jmir.org/2021/3/e25406 %U https://doi.org/10.2196/25406 %U http://www.ncbi.nlm.nih.gov/pubmed/33621188 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 3 %P e23328 %T Realistic High-Resolution Body Computed Tomography Image Synthesis by Using Progressive Growing Generative Adversarial Network: Visual Turing Test %A Park,Ho Young %A Bae,Hyun-Jin %A Hong,Gil-Sun %A Kim,Minjee %A Yun,JiHye %A Park,Sungwon %A Chung,Won Jung %A Kim,NamKug %+ Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine & Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul, 05505, Republic of Korea, 82 2 3010 1548, hgs2013@gmail.com %K generative adversarial network %K unsupervised deep learning %K computed tomography %K synthetic body images %K visual Turing test %D 2021 %7 17.3.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Generative adversarial network (GAN)–based synthetic images can be viable solutions to current supervised deep learning challenges. However, generating highly realistic images is a prerequisite for these approaches. Objective: The aim of this study was to investigate and validate the unsupervised synthesis of highly realistic body computed tomography (CT) images by using a progressive growing GAN (PGGAN) trained to learn the probability distribution of normal data. Methods: We trained the PGGAN by using 11,755 body CT scans. Ten radiologists (4 radiologists with <5 years of experience [Group I], 4 radiologists with 5-10 years of experience [Group II], and 2 radiologists with >10 years of experience [Group III]) evaluated the results in a binary approach by using an independent validation set of 300 images (150 real and 150 synthetic) to judge the authenticity of each image. Results: The mean accuracy of the 10 readers in the entire image set was higher than random guessing (1781/3000, 59.4% vs 1500/3000, 50.0%, respectively; P<.001). 
However, in terms of identifying synthetic images as fake, there was no significant difference in the specificity between the visual Turing test and random guessing (779/1500, 51.9% vs 750/1500, 50.0%, respectively; P=.29). The accuracy between the 3 reader groups with different experience levels was not significantly different (Group I, 696/1200, 58.0%; Group II, 726/1200, 60.5%; and Group III, 359/600, 59.8%; P=.36). Interreader agreements were poor (κ=0.11) for the entire image set. In subgroup analysis, the discrepancies between real and synthetic CT images occurred mainly in the thoracoabdominal junction and in the anatomical details. Conclusions: The GAN can synthesize highly realistic high-resolution body CT images that are indistinguishable from real images; however, it has limitations in generating body images of the thoracoabdominal junction and lacks accuracy in the anatomical details. %M 33609339 %R 10.2196/23328 %U https://medinform.jmir.org/2021/3/e23328 %U https://doi.org/10.2196/23328 %U http://www.ncbi.nlm.nih.gov/pubmed/33609339 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 3 %P e27079 %T Emotional Attitudes of Chinese Citizens on Social Distancing During the COVID-19 Outbreak: Analysis of Social Media Data %A Shen,Lining %A Yao,Rui %A Zhang,Wenli %A Evans,Richard %A Cao,Guang %A Zhang,Zhiguo %+ School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science & Technology, No 13 Hangkong Road, Wuhan, 430030, China, 86 02783692730, sln2008@hust.edu.cn %K COVID-19 %K Sina Weibo %K social distancing measures %K emotional analysis %K machine learning %K moderating effects %K deep learning %K social media %K emotion %K attitude %K infodemiology %K infoveillance %D 2021 %7 16.3.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Wuhan, China, the epicenter of the COVID-19 pandemic, imposed citywide lockdown measures on January 23, 2020. 
Neighboring cities in Hubei Province followed suit, with the government enforcing social distancing measures to restrict the spread of the disease throughout the province. Few studies have examined the emotional attitudes of citizens as expressed on social media toward the imposed social distancing measures and the factors that affected their emotions. Objective: The aim of this study was twofold. First, we aimed to detect the emotional attitudes of different groups of users on Sina Weibo toward the social distancing measures imposed by the People’s Government of Hubei Province. Second, the influencing factors of their emotions, as well as the impact of the imposed measures on users’ emotions, were studied. Methods: Sina Weibo, one of China’s largest social media platforms, was chosen as the primary data source. The time span of selected data was from January 21, 2020, to March 24, 2020, while analysis was completed in late June 2020. Bi-directional long short-term memory (Bi-LSTM) was used to analyze users’ emotions, while logistic regression analysis was employed to explore the influence of explanatory variables, such as age and spatial location, on users’ emotions. Further, the moderating effects of social distancing measures on the relationship between user characteristics and users’ emotions were assessed by observing the interaction effects between the measures and explanatory variables. Results: Based on the 63,169 comments obtained, we identified six topics of discussion: (1) delaying the resumption of work and school, (2) travel restrictions, (3) traffic restrictions, (4) extending the Lunar New Year holiday, (5) closing public spaces, and (6) community containment. There was no multicollinearity in the data during statistical analysis; the Hosmer-Lemeshow goodness-of-fit P value was .24 (χ²₈=10.34). The main emotions shown by citizens were negative, including anger and fear. 
Users located in Hubei Province showed the highest amount of negative emotions in Mainland China. There were statistically significant differences in the distribution of emotional polarity between social distancing measures (χ²₂₀=19,084.73, P<.001), as well as emotional polarity between genders (χ²₄=1784.59, P<.001) and emotional polarity between spatial locations (χ²₄=1659.67, P<.001). Compared with other types of social distancing measures, the measures of delaying the resumption of work and school or travel restrictions mainly had a positive moderating effect on public emotion, while traffic restrictions or community containment had a negative moderating effect on public emotion. Conclusions: Findings provide a reference point for the adoption of epidemic prevention and control measures, and are considered helpful for government agencies to take timely actions to alleviate negative emotions during public health emergencies. %M 33724200 %R 10.2196/27079 %U https://medinform.jmir.org/2021/3/e27079 %U https://doi.org/10.2196/27079 %U http://www.ncbi.nlm.nih.gov/pubmed/33724200 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 3 %P e17934 %T Hybrid Deep Learning for Medication-Related Information Extraction From Clinical Texts in French: MedExt Algorithm Development Study %A Jouffroy,Jordan %A Feldman,Sarah F %A Lerner,Ivan %A Rance,Bastien %A Burgun,Anita %A Neuraz,Antoine %+ Department of Biomedical Informatics, Necker-Enfants malades Hospital, Assistance Publique–Hôpitaux de Paris, Bâtiment Imagine - Bureau 145, 149 rue de Sèvres, Paris, , France, 33 171396585, antoine.neuraz@aphp.fr %K medication information %K natural language processing %K electronic health records %K deep learning %K rule-based system %K recurrent neural network %K hybrid system %D 2021 %7 16.3.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Information related to patient medication is crucial for health care; however, up to 80% of the information resides solely in unstructured text. Manual extraction is difficult and time-consuming, and little natural language processing research has addressed the extraction of medical information from unstructured French-language corpora. Objective: We aimed to develop a system to extract medication-related information from clinical text written in French. Methods: We developed a hybrid system combining an expert rule–based system, contextual word embedding (embedding for language model) trained on clinical notes, and a deep recurrent neural network (bidirectional long short-term memory–conditional random field). The task consisted of extracting drug mentions and their related information (eg, dosage, frequency, duration, route, condition). We manually annotated 320 clinical notes from a French clinical data warehouse to train and evaluate the model. We compared the performance of our approach to that of standard approaches: rule-based or machine learning only and classic word embeddings. We evaluated the models using token-level recall, precision, and F-measure. Results: The overall F-measure was 89.9% (precision 90.8%; recall 89.2%) when combining expert rules and contextualized embeddings, compared to 88.1% (precision 89.5%; recall 87.2%) without expert rules or contextualized embeddings. The F-measures for each category were 95.3% for medication name, 64.4% for drug class mentions, 95.3% for dosage, 92.2% for frequency, 78.8% for duration, and 62.2% for condition of the intake. Conclusions: Associating expert rules, deep contextualized embedding, and deep neural networks improved medication information extraction. Our results revealed a synergy when associating expert knowledge and latent knowledge. 
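The token-level precision, recall, and F-measure used to evaluate this kind of extraction system can be sketched as follows (the gold and predicted annotations below are invented for illustration; real labels would follow the study's annotation scheme):

```python
def token_prf(gold, pred):
    """Token-level precision, recall, and F-measure over (position, label) pairs."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)          # tokens labeled correctly
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

# hypothetical annotations: (token index, entity label)
gold = [(0, "DRUG"), (1, "DOSE"), (2, "DOSE"), (5, "FREQ")]
pred = [(0, "DRUG"), (1, "DOSE"), (5, "DUR")]  # one miss, one wrong label
p, r, f = token_prf(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # → 0.67 0.5 0.57
```

Scoring at the token level, as done above, credits partial matches of multi-token entities, unlike strict span-level scoring.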
%M 33724196 %R 10.2196/17934 %U https://medinform.jmir.org/2021/3/e17934 %U https://doi.org/10.2196/17934 %U http://www.ncbi.nlm.nih.gov/pubmed/33724196 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 3 %P e23415 %T A Novel Convolutional Neural Network for the Diagnosis and Classification of Rosacea: Usability Study %A Zhao,Zhixiang %A Wu,Che-Ming %A Zhang,Shuping %A He,Fanping %A Liu,Fangfen %A Wang,Ben %A Huang,Yingxue %A Shi,Wei %A Jian,Dan %A Xie,Hongfu %A Yeh,Chao-Yuan %A Li,Ji %+ Department of Dermatology, Xiangya Hospital of Central South University, 87 Xiangya Rd., Changsha, 410008, China, 86 073189753406, liji_xy@csu.edu.cn %K rosacea %K artificial intelligence %K convolutional neural networks %D 2021 %7 15.3.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Rosacea is a chronic inflammatory disease with variable clinical presentations, including transient flushing, fixed erythema, papules, pustules, and phymatous changes on the central face. Owing to the diversity in the clinical manifestations of rosacea, the lack of objective biochemical examinations, and nonspecificity in histopathological findings, accurate identification of rosacea is a big challenge. Artificial intelligence has emerged as a potential tool in the identification and evaluation of some skin diseases such as melanoma, basal cell carcinoma, and psoriasis. Objective: The objective of our study was to utilize a convolutional neural network (CNN) to differentiate the clinical photos of patients with rosacea (taken from 3 different angles) from those of patients with other skin diseases such as acne, seborrheic dermatitis, and eczema that could be easily confused with rosacea. Methods: In this study, 24,736 photos comprising of 18,647 photos of patients with rosacea and 6089 photos of patients with other skin diseases such as acne, facial seborrheic dermatitis, and eczema were included and analyzed by our CNN model based on ResNet-50. 
Results: The CNN in our study achieved an overall accuracy and precision of 0.914 and 0.898, with an area under the receiver operating characteristic curve of 0.972 for the detection of rosacea. The accuracy of classifying 3 subtypes of rosacea, that is, erythematotelangiectatic rosacea, papulopustular rosacea, and phymatous rosacea was 83.9%, 74.3%, and 80.0%, respectively. Moreover, the accuracy and precision of our CNN to distinguish rosacea from acne reached 0.931 and 0.893, respectively. For the differentiation between rosacea, seborrheic dermatitis, and eczema, the overall accuracy of our CNN was 0.757 and the precision was 0.667. Finally, by comparing the CNN diagnosis with the diagnoses by dermatologists of different expertise levels, we found that our CNN system is capable of identifying rosacea with a performance superior to that of resident doctors or attending physicians and comparable to that of experienced dermatologists. Conclusions: The findings of our study showed that by assessing clinical images, the CNN system in our study could identify rosacea with accuracy and precision comparable to that of an experienced dermatologist. 
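The subtype-level accuracies reported above correspond to per-class recalls, which can be read off a confusion matrix; a minimal sketch with invented counts (not the study's data):

```python
def per_class_accuracy(cm, labels):
    """Per-class recall from a confusion matrix, where cm[i][j] is the
    count of class-i images predicted as class j."""
    return {lab: cm[i][i] / sum(cm[i]) for i, lab in enumerate(labels)}

# hypothetical counts for the three rosacea subtypes
labels = ["erythematotelangiectatic", "papulopustular", "phymatous"]
cm = [[42, 6, 2],    # true erythematotelangiectatic
      [7, 26, 2],    # true papulopustular
      [1, 3, 16]]    # true phymatous
acc = per_class_accuracy(cm, labels)
print(round(acc["phymatous"], 2))  # → 0.8
```

Reporting accuracy per subtype, as the abstract does, exposes which clinical presentations the network confuses most often.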
%M 33720027 %R 10.2196/23415 %U https://medinform.jmir.org/2021/3/e23415 %U https://doi.org/10.2196/23415 %U http://www.ncbi.nlm.nih.gov/pubmed/33720027 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 3 %P e19461 %T Parents’ Perspectives on Using Artificial Intelligence to Reduce Technology Interference During Early Childhood: Cross-sectional Online Survey %A Glassman,Jill %A Humphreys,Kathryn %A Yeung,Serena %A Smith,Michelle %A Jauregui,Adam %A Milstein,Arnold %A Sanders,Lee %+ Clinical Excellence Research Center, School of Medicine, Stanford University, 365 Lasuen Street, #308, Stanford, CA, 94305, United States, 1 8314195302, jill.r.glassman@stanford.edu %K parenting %K digital technology %K mobile phone %K child development %K artificial intelligence %D 2021 %7 15.3.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Parents’ use of mobile technologies may interfere with important parent-child interactions that are critical to healthy child development. This phenomenon is known as technoference. However, little is known about the population-wide awareness of this problem and the acceptability of artificial intelligence (AI)–based tools that help with mitigating technoference. Objective: This study aims to assess parents’ awareness of technoference and its harms, the acceptability of AI tools for mitigating technoference, and how each of these constructs vary across sociodemographic factors. Methods: We administered a web-based survey to a nationally representative sample of parents of children aged ≤5 years. Parents’ perceptions that their own technology use had risen to potentially problematic levels in general, their perceptions of their own parenting technoference, and the degree to which they found AI tools for mitigating technoference acceptable were assessed by using adaptations of previously validated scales. 
Multiple regression and mediation analyses were used to assess the relationships between these scales and each of the 6 sociodemographic factors (parent age, sex, language, ethnicity, educational attainment, and family income). Results: Of the 305 respondents, 280 provided data that met the established standards for analysis. Parents reported that a mean of 3.03 devices (SD 2.07) interfered daily in their interactions with their child. Almost two-thirds of the parents agreed with the statements “I am worried about the impact of my mobile electronic device use on my child” and “Using a computer-assisted coach while caring for my child would help me notice more quickly when my device use is interfering with my caregiving” (187/281, 66.5% and 184/282, 65.1%, respectively). Younger age, Hispanic ethnicity, and Spanish language spoken at home were associated with increased technoference awareness. Compared to parents’ perceived technoference and sociodemographic factors, parents’ perception of their own problematic technology use was the factor most associated with the acceptance of AI tools. Conclusions: Parents reported high levels of mobile device use and technoference around their youngest children. Most parents across a wide sociodemographic spectrum, especially younger parents, found the use of AI tools to help mitigate technoference during parent-child daily interaction acceptable and useful. 
%M 33720026 %R 10.2196/19461 %U https://www.jmir.org/2021/3/e19461 %U https://doi.org/10.2196/19461 %U http://www.ncbi.nlm.nih.gov/pubmed/33720026 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 3 %P e22453 %T Artificial Intelligence–Aided Precision Medicine for COVID-19: Strategic Areas of Research and Development %A Santus,Enrico %A Marino,Nicola %A Cirillo,Davide %A Chersoni,Emmanuele %A Montagud,Arnau %A Santuccione Chadha,Antonella %A Valencia,Alfonso %A Hughes,Kevin %A Lindvall,Charlotta %+ Barcelona Supercomputing Center, c/Jordi Girona, 29, Barcelona, Spain, 34 934137971, davide.cirillo@bsc.es %K COVID-19 %K SARS-CoV-2 %K artificial intelligence %K personalized medicine %K precision medicine %K prevention %K monitoring %K epidemic %K literature %K public health %K pandemic %D 2021 %7 12.3.2021 %9 Viewpoint %J J Med Internet Res %G English %X Artificial intelligence (AI) technologies can play a key role in preventing, detecting, and monitoring epidemics. In this paper, we provide an overview of the recently published literature on the COVID-19 pandemic in four strategic areas: (1) triage, diagnosis, and risk prediction; (2) drug repurposing and development; (3) pharmacogenomics and vaccines; and (4) mining of the medical literature. We highlight how AI-powered health care can enable public health systems to efficiently handle future outbreaks and improve patient outcomes. 
%M 33560998 %R 10.2196/22453 %U https://www.jmir.org/2021/3/e22453 %U https://doi.org/10.2196/22453 %U http://www.ncbi.nlm.nih.gov/pubmed/33560998 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 3 %P e25704 %T Using Machine Learning Technologies in Pressure Injury Management: Systematic Review %A Jiang,Mengyao %A Ma,Yuxia %A Guo,Siyi %A Jin,Liuqi %A Lv,Lin %A Han,Lin %A An,Ning %+ Department of Nursing, Gansu Provincial Hospital, No 160, Donggang West Road, Chengguan District, Lanzhou, 730000, China, 86 0931 8281971, LZU-hanlin@hotmail.com %K pressure injuries %K pressure ulcer %K pressure sore %K pressure damage %K decubitus ulcer %K decubitus sore %K bedsore %K artificial intelligence %K machine learning %K neural network %K support vector machine %K natural language processing %K Naive Bayes %K bayesian learning %K support vector %K random forest %K boosting %K deep learning %K machine intelligence %K computational intelligence %K computer reasoning %K management %K systematic review %D 2021 %7 10.3.2021 %9 Review %J JMIR Med Inform %G English %X Background: Pressure injury (PI) is a common and preventable problem, yet it is a challenge for at least two reasons. First, the nurse shortage is a worldwide phenomenon. Second, the majority of nurses have insufficient PI-related knowledge. Machine learning (ML) technologies can contribute to lessening the burden on medical staff by improving the prognosis and diagnostic accuracy of PI. To the best of our knowledge, there is no existing systematic review that evaluates how the current ML technologies are being used in PI management. Objective: The objective of this review was to synthesize and evaluate the literature regarding the use of ML technologies in PI management, and identify their strengths and weaknesses, as well as to identify improvement opportunities for future research and practice. 
Methods: We conducted an extensive search on PubMed, EMBASE, Web of Science, Cumulative Index to Nursing and Allied Health Literature (CINAHL), Cochrane Library, China National Knowledge Infrastructure (CNKI), the Wanfang database, the VIP database, and the China Biomedical Literature Database (CBM) to identify relevant articles. Searches were performed in June 2020. Two independent investigators conducted study selection, data extraction, and quality appraisal. Risk of bias was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). Results: A total of 32 articles met the inclusion criteria. Twelve of those articles (38%) reported using ML technologies to develop predictive models to identify risk factors, 11 (34%) reported using them in posture detection and recognition, and 9 (28%) reported using them in image analysis for tissue classification and measurement of PI wounds. These articles presented various algorithms and measured outcomes. The overall risk of bias was judged as high. Conclusions: There is an array of emerging ML technologies being used in PI management, and their results in the laboratory show great promise. Future research should apply these technologies on a large scale with clinical data to further verify and improve their effectiveness, as well as to improve the methodological quality. 
%M 33688846 %R 10.2196/25704 %U https://medinform.jmir.org/2021/3/e25704 %U https://doi.org/10.2196/25704 %U http://www.ncbi.nlm.nih.gov/pubmed/33688846 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 7 %N 1 %P e26911 %T Development and Early Feasibility of Chatbots for Educating Patients With Lung Cancer and Their Caregivers in Japan: Mixed Methods Study %A Kataoka,Yuki %A Takemura,Tomoyasu %A Sasajima,Munehiko %A Katoh,Naoki %+ Department of Respiratory Medicine, Hyogo Prefectural Amagasaki General Medical Center, Higashi-Naniwa-Cho 2-17-77, Amagasaki, 660-8550, Japan, 81 6480 7000, youkiti@gmail.com %K cancer %K caregivers %K chatbot %K lung cancer %K mixed methods approach %K online health %K patients %K symptom management education %K web-based platform %D 2021 %7 10.3.2021 %9 Original Paper %J JMIR Cancer %G English %X Background: Chatbots are artificial intelligence–driven programs that interact with people. The applications of this technology include the collection and delivery of information, generation of and responding to inquiries, collection of end user feedback, and the delivery of personalized health and medical information to patients through cellphone- and web-based platforms. However, no chatbots have been developed for patients with lung cancer and their caregivers. Objective: This study aimed to develop and evaluate the early feasibility of a chatbot designed to improve the knowledge of symptom management among patients with lung cancer in Japan and their caregivers. Methods: We conducted a sequential mixed methods study that included a web-based anonymized questionnaire survey administered to physicians and paramedics from June to July 2019 (phase 1). Two physicians conducted a content analysis of the questionnaire to curate frequently asked questions (FAQs; phase 2). Based on these FAQs, we developed and integrated a chatbot into a social network service (phase 3). 
The physicians and paramedics involved in phase 1 then tested this chatbot (α test; phase 4). Thereafter, patients with lung cancer and their caregivers tested this chatbot (β test; phase 5). Results: We obtained 246 questions from 15 health care providers in phase 1. We curated 91 FAQs and their corresponding responses in phase 2. In total, 11 patients and 1 caregiver participated in the β test in phase 5. The participants asked 60 questions, 8 (13%) of which did not match the appropriate categories. After the β test, 7 (64%) participants responded to the postexperimental questionnaire. The mean satisfaction score was 2.7 (SD 0.5) points out of 5. Conclusions: Medical staff providing care to patients with lung cancer can use the categories specified in this chatbot to educate patients on how they can manage their symptoms. Further studies are required to improve chatbots in terms of interaction with patients. %M 33688839 %R 10.2196/26911 %U https://cancer.jmir.org/2021/1/e26911 %U https://doi.org/10.2196/26911 %U http://www.ncbi.nlm.nih.gov/pubmed/33688839 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 3 %P e21331 %T Detection of Bulbar Involvement in Patients With Amyotrophic Lateral Sclerosis by Machine Learning Voice Analysis: Diagnostic Decision Support Development Study %A Tena,Alberto %A Claria,Francesc %A Solsona,Francesc %A Meister,Einar %A Povedano,Monica %+ Department of Computer Science, Universitat de Lleida, Jaume II, 69, Lleida, Spain, 34 973702735, francesc.solsona@udl.cat %K amyotrophic lateral sclerosis %K bulbar involvement %K voice %K diagnosis %K machine learning %D 2021 %7 10.3.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Bulbar involvement is a term used in amyotrophic lateral sclerosis (ALS) that refers to motor neuron impairment in the corticobulbar area of the brainstem, which produces a dysfunction of speech and swallowing. 
One of the earliest symptoms of bulbar involvement is voice deterioration characterized by grossly defective articulation; extremely slow, laborious speech; marked hypernasality; and severe harshness. Bulbar involvement requires well-timed and carefully coordinated interventions. Therefore, early detection is crucial to improving the quality of life and lengthening the life expectancy of patients with ALS who present with this dysfunction. Recent research efforts have focused on voice analysis to capture bulbar involvement. Objective: The main objectives of this paper were (1) to design a methodology for diagnosing bulbar involvement efficiently through the acoustic parameters of uttered vowels in Spanish, and (2) to demonstrate that the performance of the automated diagnosis of bulbar involvement is superior to human diagnosis. Methods: The study focused on the extraction of features from the phonatory subsystem—jitter, shimmer, harmonics-to-noise ratio, and pitch—from the utterance of the five Spanish vowels. Then, we used various supervised classification algorithms, preceded by principal component analysis of the features obtained. Results: To date, support vector machines have performed better (accuracy 95.8%) than the models analyzed in the related work. We also show how the model can improve human diagnosis, which can often misdiagnose bulbar involvement. Conclusions: The results obtained are very encouraging and demonstrate the efficiency and applicability of the automated model presented in this paper. It may be an appropriate tool to help in the diagnosis of ALS by multidisciplinary clinical teams, in particular to improve the diagnosis of bulbar involvement. 
%M 33688838 %R 10.2196/21331 %U https://medinform.jmir.org/2021/3/e21331 %U https://doi.org/10.2196/21331 %U http://www.ncbi.nlm.nih.gov/pubmed/33688838 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 3 %P e24870 %T Machine Learning for Mental Health in Social Media: Bibliometric Study %A Kim,Jina %A Lee,Daeun %A Park,Eunil %+ Department of Applied Artificial Intelligence, Sungkyunkwan University, 312 International Hall, Sungkyunkwan-ro 25-2, Seoul, 03063, Republic of Korea, 82 2 740 1864, eunilpark@skku.edu %K bibliometric analysis %K machine learning %K mental health %K social media %D 2021 %7 8.3.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Social media platforms provide an easily accessible and time-saving communication approach for individuals with mental disorders compared to face-to-face meetings with medical providers. Recently, machine learning (ML)-based mental health exploration using large-scale social media data has attracted significant attention. Objective: We aimed to provide a bibliometric analysis and discussion on research trends of ML for mental health in social media. Methods: Publications addressing social media and ML in the field of mental health were retrieved from the Scopus and Web of Science databases. We analyzed the publication distribution to measure productivity on sources, countries, institutions, authors, and research subjects, and visualized the trends in this field using a keyword co-occurrence network. The research methodologies of previous studies with high citations are also thoroughly described. Results: We obtained a total of 565 relevant papers published from 2015 to 2020. In the last 5 years, the number of publications has demonstrated continuous growth with Lecture Notes in Computer Science and Journal of Medical Internet Research as the two most productive sources based on Scopus and Web of Science records. 
In addition, notable methodological approaches with data resources presented in high-ranking publications were investigated. Conclusions: The results of this study highlight continuous growth in this research area. Moreover, we retrieved three main discussion points from a comprehensive overview of highly cited publications that provide new in-depth directions for both researchers and practitioners. %M 33683209 %R 10.2196/24870 %U https://www.jmir.org/2021/3/e24870 %U https://doi.org/10.2196/24870 %U http://www.ncbi.nlm.nih.gov/pubmed/33683209 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 3 %P e22951 %T Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation %A Zhao,Yiqing %A Fu,Sunyang %A Bielinski,Suzette J %A Decker,Paul A %A Chamberlain,Alanna M %A Roger,Veronique L %A Liu,Hongfang %A Larson,Nicholas B %+ Department of Health Sciences Research, Mayo Clinic, 205 3rd Ave SW, Rochester, MN, 55905, United States, 1 507 293 1700, Larson.Nicholas@mayo.edu %K stroke %K natural language processing %K electronic health records %K machine learning %D 2021 %7 8.3.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Stroke is an important clinical outcome in cardiovascular research. However, the ascertainment of incident stroke is typically accomplished via time-consuming manual chart abstraction. Current phenotyping efforts using electronic health records for stroke focus on case ascertainment rather than incident disease, which requires knowledge of the temporal sequence of events. Objective: The aim of this study was to develop a machine learning–based phenotyping algorithm for incident stroke ascertainment based on diagnosis codes, procedure codes, and clinical concepts extracted from clinical notes using natural language processing. 
Methods: The algorithm was trained and validated using an existing epidemiology cohort consisting of 4914 patients with atrial fibrillation (AF) with manually curated incident stroke events. Various combinations of feature sets and machine learning classifiers were compared. Using a heuristic rule based on the composition of concepts and codes, we further detected the stroke subtype (ischemic stroke/transient ischemic attack or hemorrhagic stroke) of each identified stroke. The algorithm was further validated using a cohort (n=150) stratified sampled from a population in Olmsted County, Minnesota (N=74,314). Results: Among the 4914 patients with AF, 740 had validated incident stroke events. The best-performing stroke phenotyping algorithm used clinical concepts, diagnosis codes, and procedure codes as features in a random forest classifier. Among patients with stroke codes in the general population sample, the best-performing model achieved a positive predictive value of 86% (43/50; 95% CI 0.74-0.93) and a negative predictive value of 96% (96/100). For subtype identification, we achieved an accuracy of 83% in the AF cohort and 80% in the general population sample. Conclusions: We developed and validated a machine learning–based algorithm that performed well for identifying incident stroke and for determining type of stroke. The algorithm also performed well on a sample from a general population, further demonstrating its generalizability and potential for adoption by other institutions. 
%M 33683212 %R 10.2196/22951 %U https://www.jmir.org/2021/3/e22951 %U https://doi.org/10.2196/22951 %U http://www.ncbi.nlm.nih.gov/pubmed/33683212 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 3 %P e25121 %T Predictive Modeling of 30-Day Emergency Hospital Transport of German Patients Using a Personal Emergency Response: Retrospective Study and Comparison with the United States %A op den Buijs,Jorn %A Pijl,Marten %A Landgraf,Andreas %+ Philips Research, High Tech Campus 34, Eindhoven, 5656 AE, Netherlands, 31 631926890, jorn.op.den.buijs@philips.com %K emergency hospital transport %K predictive modeling %K personal emergency response system %K population health management %K emergency transport %K emergency response system %K emergency response %K health management %D 2021 %7 8.3.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Predictive analytics based on data from remote monitoring of elderly via a personal emergency response system (PERS) in the United States can identify subscribers at high risk for emergency hospital transport. These risk predictions can subsequently be used to proactively target interventions and prevent avoidable, costly health care use. It is, however, unknown if PERS-based risk prediction with targeted interventions could also be applied in the German health care setting. Objective: The objectives were to develop and validate a predictive model of 30-day emergency hospital transport based on data from a German PERS provider and compare the model with our previously published predictive model developed on data from a US PERS provider. Methods: Retrospective data of 5805 subscribers to a German PERS service were used to develop and validate an extreme gradient boosting predictive model of 30-day hospital transport, including predictors derived from subscriber demographics, self-reported medical conditions, and a 2-year history of case data. 
Models were trained on 80% (4644/5805) of the data, and performance was evaluated on an independent test set of 20% (1161/5805). Results were compared with our previously published prediction model developed on a data set of PERS users in the United States. Results: German PERS subscribers were on average aged 83.6 years; 64.0% (743/1161) were female, and 65.4% (759/1161) reported 3 or more chronic conditions. A total of 1.4% (350/24,847) of subscribers had one or more emergency transports in 30 days in the test set, which was significantly lower compared with the US data set (2455/109,966, 2.2%). Performance of the predictive model of emergency hospital transport, as evaluated by area under the receiver operating characteristic curve (AUC), was 0.749 (95% CI 0.721-0.777), which was similar to the US prediction model (AUC=0.778 [95% CI 0.769-0.788]). The top 1% (12/1161) of predicted high-risk patients were 10.7 times more likely to experience an emergency hospital transport in 30 days than the overall German PERS population. This lift was comparable to a model lift of 11.9 obtained by the US predictive model. Conclusions: Despite differences in emergency care use, subscriber data collected via PERS can be used to predict use outcomes in different international settings. These predictive analytic tools can be used by health care organizations to extend population health management into the home by identifying and delivering timelier targeted interventions to high-risk patients. This could lead to overall improved patient experience, higher quality of care, and more efficient resource use. 
%M 33682679 %R 10.2196/25121 %U https://medinform.jmir.org/2021/3/e25121 %U https://doi.org/10.2196/25121 %U http://www.ncbi.nlm.nih.gov/pubmed/33682679 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 3 %P e26646 %T Future Medical Artificial Intelligence Application Requirements and Expectations of Physicians in German University Hospitals: Web-Based Survey %A Maassen,Oliver %A Fritsch,Sebastian %A Palm,Julia %A Deffge,Saskia %A Kunze,Julian %A Marx,Gernot %A Riedel,Morris %A Schuppert,Andreas %A Bickenbach,Johannes %+ Department of Intensive Care Medicine, University Hospital RWTH Aachen, Pauwelsstraße 30, Aachen, 52074, Germany, 49 2418080444, oliver.maassen@rwth-aachen.de %K artificial intelligence %K AI %K machine learning %K algorithms %K clinical decision support %K physician %K requirement %K expectation %K hospital care %D 2021 %7 5.3.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The increasing development of artificial intelligence (AI) systems in medicine driven by researchers and entrepreneurs goes along with enormous expectations for medical care advancement. AI might change the clinical practice of physicians from almost all medical disciplines and in most areas of health care. While expectations for AI in medicine are high, practical implementations of AI for clinical practice are still scarce in Germany. Moreover, physicians’ requirements and expectations of AI in medicine and their opinion on the usage of anonymized patient data for clinical and biomedical research have not been investigated widely in German university hospitals. Objective: This study aimed to evaluate physicians’ requirements and expectations of AI in medicine and their opinion on the secondary usage of patient data for (bio)medical research (eg, for the development of machine learning algorithms) in university hospitals in Germany. 
Methods: A web-based survey was conducted addressing physicians of all medical disciplines in 8 German university hospitals. Answers were given using Likert scales and general demographic responses. Physicians were asked to participate locally via email in the respective hospitals. Results: The online survey was completed by 303 physicians (female: 121/303, 39.9%; male: 173/303, 57.1%; no response: 9/303, 3.0%) from a wide range of medical disciplines and work experience levels. Most respondents either had a positive (130/303, 42.9%) or a very positive attitude (82/303, 27.1%) towards AI in medicine. There was a significant association between the personal rating of AI in medicine and the self-reported technical affinity level (H4=48.3, P<.001). A vast majority of physicians expected the future of medicine to be a mix of human and artificial intelligence (273/303, 90.1%) but also requested a scientific evaluation before the routine implementation of AI-based systems (276/303, 91.1%). Physicians were most optimistic that AI applications would identify drug interactions (280/303, 92.4%) to improve patient care substantially but were quite reserved regarding AI-supported diagnosis of psychiatric diseases (62/303, 20.5%). Of the respondents, 82.5% (250/303) agreed that there should be open access to anonymized patient databases for medical and biomedical research. Conclusions: Physicians in stationary patient care in German university hospitals show a generally positive attitude towards using most AI applications in medicine. Along with this optimism comes several expectations and hopes that AI will assist physicians in clinical decision making. Especially in fields of medicine where huge amounts of data are processed (eg, imaging procedures in radiology and pathology) or data are collected continuously (eg, cardiology and intensive care medicine), physicians’ expectations of AI to substantially improve future patient care are high. 
In the study, the greatest potential was seen in the application of AI for the identification of drug interactions, presumably due to the rising complexity of drug administration to polymorbid, polypharmacy patients. However, for the practical use of AI in health care, regulatory and organizational challenges still have to be mastered. %M 33666563 %R 10.2196/26646 %U https://www.jmir.org/2021/3/e26646 %U https://doi.org/10.2196/26646 %U http://www.ncbi.nlm.nih.gov/pubmed/33666563 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 3 %P e23483 %T Artificial Intelligence Techniques That May Be Applied to Primary Care Data to Facilitate Earlier Diagnosis of Cancer: Systematic Review %A Jones,Owain T %A Calanzani,Natalia %A Saji,Smiji %A Duffy,Stephen W %A Emery,Jon %A Hamilton,Willie %A Singh,Hardeep %A de Wit,Niek J %A Walter,Fiona M %+ Primary Care Unit, Department of Public Health & Primary Care, University of Cambridge, 2 Wort's Causeway, Cambridge, CB1 8RN, United Kingdom, 44 1223762554, otj24@medschl.cam.ac.uk %K artificial intelligence %K machine learning %K electronic health records %K primary health care %K early detection of cancer %D 2021 %7 3.3.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: More than 17 million people worldwide, including 360,000 people in the United Kingdom, were diagnosed with cancer in 2018. Cancer prognosis and disease burden are highly dependent on the disease stage at diagnosis. Most people diagnosed with cancer first present in primary care settings, where improved assessment of the (often vague) presenting symptoms of cancer could lead to earlier detection and improved outcomes for patients. There is accumulating evidence that artificial intelligence (AI) can assist clinicians in making better clinical decisions in some areas of health care. 
Objective: This study aimed to systematically review AI techniques that may facilitate earlier diagnosis of cancer and could be applied to primary care electronic health record (EHR) data. The quality of the evidence, the phase of development the AI techniques have reached, the gaps that exist in the evidence, and the potential for use in primary care were evaluated. Methods: We searched MEDLINE, Embase, SCOPUS, and Web of Science databases from January 01, 2000, to June 11, 2019, and included all studies providing evidence for the accuracy or effectiveness of applying AI techniques for the early detection of cancer, which may be applicable to primary care EHRs. We included all study designs in all settings and languages. These searches were extended through a scoping review of AI-based commercial technologies. The main outcomes assessed were measures of diagnostic accuracy for cancer. Results: We identified 10,456 studies; 16 studies met the inclusion criteria, representing the data of 3,862,910 patients. A total of 13 studies described the initial development and testing of AI algorithms, and 3 studies described the validation of an AI algorithm in independent data sets. One study was based on prospectively collected data; only 3 studies were based on primary care data. We found no data on implementation barriers or cost-effectiveness. Risk of bias assessment highlighted a wide range of study quality. The additional scoping review of commercial AI technologies identified 21 technologies, only 1 meeting our inclusion criteria. Meta-analysis was not undertaken because of the heterogeneity of AI modalities, data set characteristics, and outcome measures. Conclusions: AI techniques have been applied to EHR-type data to facilitate early diagnosis of cancer, but their use in primary care settings is still at an early stage of maturity. 
Further evidence is needed on their performance using primary care data, implementation barriers, and cost-effectiveness before widespread adoption into routine primary care clinical practice can be recommended. %M 33656443 %R 10.2196/23483 %U https://www.jmir.org/2021/3/e23483 %U https://doi.org/10.2196/23483 %U http://www.ncbi.nlm.nih.gov/pubmed/33656443 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 3 %P e18607 %T A Chatbot for Perinatal Women’s and Partners’ Obstetric and Mental Health Care: Development and Usability Evaluation Study %A Chung,Kyungmi %A Cho,Hee Young %A Park,Jin Young %+ Department of Psychiatry, Yonsei University College of Medicine, Yongin Severance Hospital, Yonsei University Health System, 363, Dongbaekjukjeon-daero, Giheung-gu, Yongin-si, Republic of Korea, 82 31 5189 8148, empathy@yuhs.ac %K chatbot %K mobile phone %K instant messaging %K mobile health %K perinatal care %K usability %K user experience %K usability testing %D 2021 %7 3.3.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: To motivate people to adopt medical chatbots, the establishment of a specialized medical knowledge database that fits their personal interests is of great importance in developing a chatbot for perinatal care, particularly with the help of health professionals. Objective: The objectives of this study are to develop and evaluate a user-friendly question-and-answer (Q&A) knowledge database–based chatbot (Dr. Joy) for perinatal women’s and their partners’ obstetric and mental health care by applying a text-mining technique and implementing contextual usability testing (UT), respectively, thus determining whether this medical chatbot built on mobile instant messenger (KakaoTalk) can provide its male and female users with good user experience. Methods: Two men aged 38 and 40 years and 13 women aged 27 to 43 years in pregnancy preparation or different pregnancy stages were enrolled. 
All participants completed the 7-day-long UT, during which they were given the daily tasks of asking Dr. Joy at least 3 questions at any time and place and then giving the chatbot either positive or negative feedback with emoji, using at least one feature of the chatbot, and finally, sending a facilitator all screenshots for the history of the day’s use via KakaoTalk before midnight. One day after the UT completion, all participants were asked to fill out a questionnaire on the evaluation of usability, perceived benefits and risks, intention to seek and share health information on the chatbot, and strengths and weaknesses of its use, as well as demographic characteristics. Results: Despite the relatively higher score of ease of learning (EOL), the results of the Spearman correlation indicated that EOL was not significantly associated with usefulness (ρ=0.26; P=.36), ease of use (ρ=0.19; P=.51), satisfaction (ρ=0.21; P=.46), or total usability scores (ρ=0.32; P=.24). Unlike EOL, all 3 subfactors and the total usability had significant positive associations with each other (all ρ>0.80; P<.001). Furthermore, perceived risks exhibited no significant negative associations with perceived benefits (ρ=−0.29; P=.30) or intention to seek (SEE; ρ=−0.28; P=.32) or share (SHA; ρ=−0.24; P=.40) health information on the chatbot via KakaoTalk, whereas perceived benefits exhibited significant positive associations with both SEE and SHA. Perceived benefits were more strongly associated with SEE (ρ=0.94; P<.001) than with SHA (ρ=0.70; P=.004). Conclusions: This study provides the potential for the uptake of this newly developed Q&A knowledge database–based KakaoTalk chatbot for obstetric and mental health care. As Dr. Joy had quality contents with both utilitarian and hedonic value, its male and female users could be encouraged to use medical chatbots in a convenient, easy-to-use, and enjoyable manner. To boost their continued usage intention for Dr. 
Joy, its Q&A sets need to be periodically updated to satisfy user intent by monitoring both male and female user utterances. %M 33656442 %R 10.2196/18607 %U https://medinform.jmir.org/2021/3/e18607 %U https://doi.org/10.2196/18607 %U http://www.ncbi.nlm.nih.gov/pubmed/33656442 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 3 %P e26997 %T Preferences for Artificial Intelligence Clinicians Before and During the COVID-19 Pandemic: Discrete Choice Experiment and Propensity Score Matching Study %A Liu,Taoran %A Tsang,Winghei %A Xie,Yifei %A Tian,Kang %A Huang,Fengqiu %A Chen,Yanhui %A Lau,Oiying %A Feng,Guanrui %A Du,Jianhao %A Chu,Bojia %A Shi,Tingyu %A Zhao,Junjie %A Cai,Yiming %A Hu,Xueyan %A Akinwunmi,Babatunde %A Huang,Jian %A Zhang,Casper J P %A Ming,Wai-Kit %+ Department of Public Health and Preventive Medicine, School of Medicine, Jinan University, 601 Huangpu W Ave, Tianhe District, Guangzhou, 510632, China, 86 85228852, wkming@connect.hku.hk %K propensity score matching %K discrete latent traits %K patients’ preferences %K artificial intelligence %K COVID-19 %K preference %K discrete choice %K choice %K traditional medicine %K public health %K resource %K patient %K diagnosis %K accuracy %D 2021 %7 2.3.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) methods can potentially be used to relieve the pressure that the COVID-19 pandemic has exerted on public health. In cases of medical resource shortages caused by the pandemic, changes in people’s preferences for AI clinicians and traditional clinicians are worth exploring. Objective: We aimed to quantify and compare people’s preferences for AI clinicians and traditional clinicians before and during the COVID-19 pandemic, and to assess whether people’s preferences were affected by the pressure of pandemic. Methods: We used the propensity score matching method to match two different groups of respondents with similar demographic characteristics. 
Respondents were recruited in 2017 and 2020. A total of 2048 respondents (2017: n=1520; 2020: n=528) completed the questionnaire and were included in the analysis. Multinomial logit models and latent class models were used to assess people’s preferences for different diagnosis methods. Results: In total, 84.7% (1115/1317) of respondents in the 2017 group and 91.3% (482/528) of respondents in the 2020 group were confident that AI diagnosis methods would outperform human clinician diagnosis methods in the future. Both groups of matched respondents believed that the most important attribute of diagnosis was accuracy, and they preferred to receive combined diagnoses from both AI and human clinicians (2017: odds ratio [OR] 1.645, 95% CI 1.535-1.763; P<.001; 2020: OR 1.513, 95% CI 1.413-1.621; P<.001; reference: clinician diagnoses). The latent class model identified three classes with different attribute priorities. In class 1, preferences for combined diagnoses and accuracy remained constant in 2017 and 2020, and high accuracy (eg, 100% accuracy in 2017: OR 1.357, 95% CI 1.164-1.581) was preferred. In class 2, the matched data from 2017 were similar to those from 2020; combined diagnoses from both AI and human clinicians (2017: OR 1.204, 95% CI 1.039-1.394; P=.011; 2020: OR 2.009, 95% CI 1.826-2.211; P<.001; reference: clinician diagnoses) and an outpatient waiting time of 20 minutes (2017: OR 1.349, 95% CI 1.065-1.708; P<.001; 2020: OR 1.488, 95% CI 1.287-1.721; P<.001; reference: 0 minutes) were consistently preferred. In class 3, the respondents in the 2017 and 2020 groups preferred different diagnosis methods; respondents in the 2017 group preferred clinician diagnoses, whereas respondents in the 2020 group preferred AI diagnoses. In the latent class, which was stratified according to sex, all male and female respondents in the 2017 and 2020 groups believed that accuracy was the most important attribute of diagnosis. 
Conclusions: Individuals’ preferences for receiving clinical diagnoses from AI and human clinicians were generally unaffected by the pandemic. Respondents believed that accuracy and expense were the most important attributes of diagnosis. These findings can be used to guide policies that are relevant to the development of AI-based health care. %M 33556034 %R 10.2196/26997 %U https://www.jmir.org/2021/3/e26997 %U https://doi.org/10.2196/26997 %U http://www.ncbi.nlm.nih.gov/pubmed/33556034 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 3 %P e24134 %T A Novel Mobile App (Heali) for Disease Treatment in Participants With Irritable Bowel Syndrome: Randomized Controlled Pilot Trial %A Rafferty,Aaron J %A Hall,Rick %A Johnston,Carol S %+ College of Health Solutions, Arizona State University, HLTHN 532 Phoenix Downtown Campus, Phoenix, AZ, 85004, United States, 1 602 496 2539, Carol.johnston@asu.edu %K irritable bowel syndrome %K artificial intelligence %K mobile app %K low FODMAP diet %K randomized controlled trial %D 2021 %7 2.3.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: A diet high in fermentable, oligo-, di-, monosaccharides and polyols (FODMAPs) has been shown to exacerbate symptoms of irritable bowel syndrome (IBS). Previous literature reports significant improvement in IBS symptoms with initiation of a low FODMAP diet (LFD) and monitored reintroduction. However, dietary adherence to the LFD is difficult, with patients stating that the information given by health care providers is often generalized and nonspecific, requiring them to search for supplementary information to fit their needs. Objective: The aim of our study was to determine whether Heali, a novel artificial intelligence dietary mobile app can improve adherence to the LFD, IBS symptom severity, and quality of life outcomes in adults with IBS or IBS-like symptoms over a 4-week period. 
Methods: Participants were randomized into 2 groups: the control group (CON), in which participants received educational materials, and the experimental group (APP), in which participants received access to the mobile app and educational materials. Over the course of this unblinded online trial, all participants completed a battery of 5 questionnaires at baseline and at the end of the trial to document IBS symptoms, quality of life, LFD knowledge, and LFD adherence. Results: We enrolled 58 participants in the study (29 in each group), and 25 participants completed the study in its entirety (11 and 14 for the CON and APP groups, respectively). Final, per-protocol analyses showed greater improvement in quality of life score for the APP group compared to the CON group (31.1 and 11.8, respectively; P=.04). Reduction in total IBS symptom severity score was 24% greater for the APP group versus the CON group. Although this did not achieve significance (–170 vs –138 respectively; P=.37), the reduction in the subscore for bowel habit dissatisfaction was 2-fold greater for the APP group than for the CON group (P=.05). Conclusions: This initial study provides preliminary evidence that Heali may provide therapeutic benefit to its users, specifically improvements in quality of life and bowel habits. Although this study was underpowered, findings from this study warrant further research in a larger sample of participants to test the efficacy of Heali app use to improve outcomes for patients with IBS. 
Trial Registration: ClinicalTrials.gov NCT04256551; https://clinicaltrials.gov/ct2/show/NCT04256551 %M 33650977 %R 10.2196/24134 %U https://www.jmir.org/2021/3/e24134 %U https://doi.org/10.2196/24134 %U http://www.ncbi.nlm.nih.gov/pubmed/33650977 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 3 %P e25635 %T Machine Learning Approach to Predict the Probability of Recurrence of Renal Cell Carcinoma After Surgery: Prediction Model Development Study %A Kim,HyungMin %A Lee,Sun Jung %A Park,So Jin %A Choi,In Young %A Hong,Sung-Hoo %+ Department of Urology, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University, 222, Banpo-daero, Seocho-gu, Seoul, Republic of Korea, 82 2 2258 6228, toomey@catholic.ac.kr %K renal cell carcinoma %K recurrence %K machine learning %K naïve Bayes %K algorithm %K cancer %K surgery %K web-based %K database %K prediction %K probability %K carcinoma %K kidney %K model %K development %D 2021 %7 1.3.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Renal cell carcinoma (RCC) has a high recurrence rate of 20% to 30% after nephrectomy for clinically localized disease, and more than 40% of patients eventually die of the disease, making regular monitoring and constant management of utmost importance. Objective: The objective of this study was to develop an algorithm that predicts the probability of recurrence of RCC within 5 and 10 years of surgery. Methods: Data from 6849 Korean patients with RCC were collected from eight tertiary care hospitals listed in the KOrean Renal Cell Carcinoma (KORCC) web-based database. To predict RCC recurrence, analytical data from 2814 patients were extracted from the database. Eight machine learning algorithms were used to predict the probability of RCC recurrence, and the results were compared. Results: Within 5 years of surgery, the highest area under the receiver operating characteristic curve (AUROC) was obtained from the naïve Bayes (NB) model, with a value of 0.836. 
Within 10 years of surgery, the highest AUROC was obtained from the NB model, with a value of 0.784. Conclusions: An algorithm was developed that predicts the probability of RCC recurrence within 5 and 10 years using the KORCC database, a large-scale RCC cohort in Korea. It is expected that the developed algorithm will help clinicians manage prognosis and establish customized treatment strategies for patients with RCC after surgery. %M 33646127 %R 10.2196/25635 %U https://medinform.jmir.org/2021/3/e25635 %U https://doi.org/10.2196/25635 %U http://www.ncbi.nlm.nih.gov/pubmed/33646127 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 2 %P e23458 %T Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study %A Ikemura,Kenji %A Bellin,Eran %A Yagi,Yukako %A Billett,Henny %A Saada,Mahmoud %A Simone,Katelyn %A Stahl,Lindsay %A Szymanski,James %A Goldstein,D Y %A Reyes Gil,Morayma %+ Department of Pathology, Albert Einstein College of Medicine, Montefiore Medical Center, 111 E 210th St, The Bronx, NY, 10467, United States, 1 9493703777, kikemura@montefiore.org %K automated machine learning %K COVID-19 %K biomarker %K ranking %K decision support tool %K machine learning %K decision support %K Shapley additive explanation %K partial dependence plot %K dimensionality reduction %D 2021 %7 26.2.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: During a pandemic, it is important for clinicians to stratify patients and decide who receives limited medical resources. Machine learning models have been proposed to accurately predict COVID-19 disease severity. Previous studies have typically tested only one machine learning algorithm and limited performance evaluation to area under the curve analysis. To obtain the best results possible, it may be important to test different machine learning algorithms to find the best prediction model. 
Objective: In this study, we aimed to use automated machine learning (autoML) to train various machine learning algorithms. We selected the model that best predicted patients’ chances of surviving a SARS-CoV-2 infection. In addition, we identified which variables (ie, vital signs, biomarkers, comorbidities, etc) were the most influential in generating an accurate model. Methods: Data were retrospectively collected from all patients who tested positive for COVID-19 at our institution between March 1 and July 3, 2020. We collected 48 variables from each patient within 36 hours before or after the index time (ie, real-time polymerase chain reaction positivity). Patients were followed for 30 days or until death. Patients’ data were used to build 20 machine learning models with various algorithms via autoML. The performance of machine learning models was measured by analyzing the area under the precision-recall curve (AUPRC). Subsequently, we established model interpretability via Shapley additive explanation and partial dependence plots to identify and rank variables that drove model predictions. Afterward, we conducted dimensionality reduction to extract the 10 most influential variables. AutoML models were retrained by only using these 10 variables, and the output models were evaluated against the model that used 48 variables. Results: Data from 4313 patients were used to develop the models. The best model that was generated by using autoML and 48 variables was the stacked ensemble model (AUPRC=0.807). The two best independent models were the gradient boost machine and extreme gradient boost models, which had an AUPRC of 0.803 and 0.793, respectively. The deep learning model (AUPRC=0.73) was substantially inferior to the other models. 
The 10 most influential variables for generating high-performing models were systolic and diastolic blood pressure, age, pulse oximetry level, blood urea nitrogen level, lactate dehydrogenase level, D-dimer level, troponin level, respiratory rate, and Charlson comorbidity score. After the autoML models were retrained with these 10 variables, the stacked ensemble model still had the best performance (AUPRC=0.791). Conclusions: We used autoML to develop high-performing models that predicted the survival of patients with COVID-19. In addition, we identified important variables that correlated with mortality. This is proof of concept that autoML is an efficient, effective, and informative method for generating machine learning–based clinical decision support tools. %M 33539308 %R 10.2196/23458 %U https://www.jmir.org/2021/2/e23458 %U https://doi.org/10.2196/23458 %U http://www.ncbi.nlm.nih.gov/pubmed/33539308 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 2 %P e22976 %T Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory %A Rosado,Eduardo %A Garcia-Remesal,Miguel %A Paraiso-Medina,Sergio %A Pazos,Alejandro %A Maojo,Victor %+ Biomedical Informatics Group, School of Computer Science, Universidad Politecnica de Madrid, Campus de Montegancedo, s/n, Madrid, 28660, Spain, 34 699059254, vmaojo@fi.upm.es %K biomedical databases %K natural language processing %K deep learning %K internet %K biomedical knowledge %D 2021 %7 25.2.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Currently, existing biomedical literature repositories do not commonly provide users with specific means to locate and remotely access biomedical databases. Objective: To address this issue, we developed the Biomedical Database Inventory (BiDI), a repository linking to biomedical databases automatically extracted from the scientific literature. 
BiDI provides an index of data resources and a path to access them seamlessly. Methods: We designed an ensemble of deep learning methods to extract database mentions. To train the system, we annotated a set of 1242 articles that included mentions of database publications. Such a data set was used along with transfer learning techniques to train an ensemble of deep learning natural language processing models targeted at database publication detection. Results: The system obtained an F1 score of 0.929 on database detection, showing high precision and recall values. When applying this model to the PubMed and PubMed Central databases, we identified over 10,000 unique databases. The ensemble model also extracted the weblinks to the reported databases and discarded irrelevant links. For the extraction of weblinks, the model achieved a cross-validated F1 score of 0.908. We show two use cases: one related to “omics” and the other related to the COVID-19 pandemic. Conclusions: BiDI enables access to biomedical resources over the internet and facilitates data-driven research and other scientific initiatives. The repository is openly available online and will be regularly updated with an automatic text processing pipeline. The approach can be reused to create repositories of different types (ie, biomedical and others). 
%M 33629960 %R 10.2196/22976 %U https://medinform.jmir.org/2021/2/e22976 %U https://doi.org/10.2196/22976 %U http://www.ncbi.nlm.nih.gov/pubmed/33629960 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 2 %P e20298 %T A Risk Prediction Model Based on Machine Learning for Cognitive Impairment Among Chinese Community-Dwelling Elderly People With Normal Cognition: Development and Validation Study %A Hu,Mingyue %A Shu,Xinhui %A Yu,Gang %A Wu,Xinyin %A Välimäki,Maritta %A Feng,Hui %+ Xiangya Nursing School, Central South University, Yuelu District, 172 Tongzipo Road, Changsha , China, 86 15173121969, feng.hui@csu.edu.cn %K prediction model %K cognitive impairment %K machine learning %K nomogram %D 2021 %7 24.2.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Identifying cognitive impairment early enough could support timely intervention that may hinder or delay the trajectory of cognitive impairment, thus increasing the chances for successful cognitive aging. Objective: We aimed to build a prediction model based on machine learning for cognitive impairment among Chinese community-dwelling elderly people with normal cognition. Methods: A prospective cohort of 6718 older people from the Chinese Longitudinal Healthy Longevity Survey (CLHLS) register, followed between 2008 and 2011, was used to develop and validate the prediction model. Participants were included if they were aged 60 years or above, were community-dwelling elderly people, and had a cognitive Mini-Mental State Examination (MMSE) score ≥18. They were excluded if they were diagnosed with a severe disease (eg, cancer and dementia) or were living in institutions. Cognitive impairment was identified using the Chinese version of the MMSE. Several machine learning algorithms (random forest, XGBoost, naïve Bayes, and logistic regression) were used to assess the 3-year risk of developing cognitive impairment. 
Optimal cutoffs and adjusted parameters were explored in validation data, and the model was further evaluated in test data. A nomogram was established to vividly present the prediction model. Results: The mean age of the participants was 80.4 years (SD 10.3 years), and 50.85% (3416/6718) were female. During a 3-year follow-up, 991 (14.8%) participants were identified with cognitive impairment. Among 45 features, the following four features were finally selected to develop the model: age, instrumental activities of daily living, marital status, and baseline cognitive function. The concordance index of the model constructed by logistic regression was 0.814 (95% CI 0.781-0.846). Older people with normal cognitive functioning having a nomogram score of less than 170 were considered to have a low 3-year risk of cognitive impairment, and those with a score of 170 or greater were considered to have a high 3-year risk of cognitive impairment. Conclusions: This simple and feasible cognitive impairment prediction model could identify community-dwelling elderly people at the greatest 3-year risk for cognitive impairment, which could help community nurses in the early identification of dementia. 
%M 33625369 %R 10.2196/20298 %U https://www.jmir.org/2021/2/e20298 %U https://doi.org/10.2196/20298 %U http://www.ncbi.nlm.nih.gov/pubmed/33625369 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 2 %P e22841 %T Patients’ Preferences for Artificial Intelligence Applications Versus Clinicians in Disease Diagnosis During the SARS-CoV-2 Pandemic in China: Discrete Choice Experiment %A Liu,Taoran %A Tsang,Winghei %A Huang,Fengqiu %A Lau,Oi Ying %A Chen,Yanhui %A Sheng,Jie %A Guo,Yiwei %A Akinwunmi,Babatunde %A Zhang,Casper JP %A Ming,Wai-Kit %+ Department of Public Health and Preventive Medicine, School of Medicine, Jinan University, West Huangpu Road 601, Guangzhou, 510000, China, 86 14715485116, wkming@connect.hku.hk %K discrete choice experiment %K artificial intelligence %K patient preference %K multinomial logit analysis %K questionnaire %K latent-class conditional logit %K app %K human clinicians %K diagnosis %K COVID-19 %K China %D 2021 %7 23.2.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Misdiagnosis, arbitrary charges, annoying queues, and clinic waiting times among others are long-standing phenomena in the medical industry across the world. These factors can contribute to patient anxiety about misdiagnosis by clinicians. However, with the increasing growth in use of big data in biomedical and health care communities, the performance of artificial intelligence (AI) techniques of diagnosis is improving and can help avoid medical practice errors, including under the current circumstance of COVID-19. Objective: This study aims to visualize and measure patients’ heterogeneous preferences from various angles of AI diagnosis versus clinicians in the context of the COVID-19 epidemic in China. 
We also aim to illustrate the different decision-making factors of the latent class of a discrete choice experiment (DCE) and prospects for the application of AI techniques in judgment and management during the pandemic of SARS-CoV-2 and in the future. Methods: A DCE approach was the main analysis method applied in this paper. Attributes from different dimensions were hypothesized: diagnostic method, outpatient waiting time, diagnosis time, accuracy, follow-up after diagnosis, and diagnostic expense. A questionnaire was then formed. With the collected data from the DCE questionnaire, we applied Sawtooth software to construct a generalized multinomial logit (GMNL) model, mixed logit model, and latent class model with the data sets. Moreover, we calculated the variables’ coefficients, standard error, P value, and odds ratio (OR) and formed a utility report to present the importance and weighted percentage of attributes. Results: A total of 55.8% of the respondents (428 out of 767) opted for AI diagnosis regardless of the description of the clinicians. In the GMNL model, we found that people prefer the 100% accuracy level the most (OR 4.548, 95% CI 4.048-5.110, P<.001). For the latent class model, the most acceptable model consists of 3 latent classes of respondents. The attributes with the most substantial effects and highest percentage weights are the accuracy (39.29% in general) and expense of diagnosis (21.69% in general), especially the preferences for the diagnosis “accuracy” attribute, which is constant across classes. For class 1 and class 3, people prefer the AI + clinicians method (class 1: OR 1.247, 95% CI 1.036-1.463, P<.001; class 3: OR 1.958, 95% CI 1.769-2.167, P<.001). For class 2, people prefer the AI method (OR 1.546, 95% CI 0.883-2.707, P=.37). The OR of levels of attributes increases with the increase of accuracy across all classes. 
Conclusions: Latent class analysis was prominent and useful in quantifying preferences for attributes of diagnosis choice. People’s preferences for the “accuracy” and “diagnostic expenses” attributes are palpable. AI will have a potential market. However, accuracy and diagnosis expenses need to be taken into consideration. %M 33493130 %R 10.2196/22841 %U https://www.jmir.org/2021/2/e22841 %U https://doi.org/10.2196/22841 %U http://www.ncbi.nlm.nih.gov/pubmed/33493130 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 2 %P e23026 %T Learning From Past Respiratory Infections to Predict COVID-19 Outcomes: Retrospective Study %A Sang,Shengtian %A Sun,Ran %A Coquet,Jean %A Carmichael,Harris %A Seto,Tina %A Hernandez-Boussard,Tina %+ Department of Medicine, Biomedical Informatics, Stanford University, 1265 Welch Rd, 245, Stanford, CA, 94305-5479, United States, 1 650 725 5507, boussard@stanford.edu %K COVID-19 %K invasive mechanical ventilation %K all-cause mortality %K machine learning %K artificial intelligence %K respiratory %K infection %K outcome %K data %K feasibility %K framework %D 2021 %7 22.2.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: For the clinical care of patients with well-established diseases, randomized trials, literature, and research are supplemented with clinical judgment to understand disease prognosis and inform treatment choices. In the void created by a lack of clinical experience with COVID-19, artificial intelligence (AI) may be an important tool to bolster clinical judgment and decision making. However, a lack of clinical data restricts the design and development of such AI tools, particularly in preparation for an impending crisis or pandemic. Objective: This study aimed to develop and test the feasibility of a “patients-like-me” framework to predict the deterioration of patients with COVID-19 using a retrospective cohort of patients with similar respiratory diseases. 
Methods: Our framework used COVID-19–like cohorts to design and train AI models that were then validated on the COVID-19 population. The COVID-19–like cohorts included patients diagnosed with bacterial pneumonia, viral pneumonia, unspecified pneumonia, influenza, and acute respiratory distress syndrome (ARDS) at an academic medical center from 2008 to 2019. In total, 15 training cohorts were created using different combinations of the COVID-19–like cohorts with the ARDS cohort for exploratory purposes. In this study, two machine learning models were developed: one to predict invasive mechanical ventilation (IMV) within 48 hours for each hospitalized day, and one to predict all-cause mortality at the time of admission. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value, and negative predictive value. We established model interpretability by calculating SHapley Additive exPlanations (SHAP) scores to identify important features. Results: Compared to the COVID-19–like cohorts (n=16,509), the patients hospitalized with COVID-19 (n=159) were significantly younger, with a higher proportion of patients of Hispanic ethnicity, a lower proportion of patients with smoking history, and fewer patients with comorbidities (P<.001). Patients with COVID-19 had a lower IMV rate (15.1 versus 23.2, P=.02) and shorter time to IMV (2.9 versus 4.1 days, P<.001) compared to the COVID-19–like patients. In the COVID-19–like training data, the top models achieved excellent performance (AUROC>0.90). Validating in the COVID-19 cohort, the top-performing model for predicting IMV was the XGBoost model (AUROC=0.826) trained on the viral pneumonia cohort. Similarly, the XGBoost model trained on all 4 COVID-19–like cohorts without ARDS achieved the best performance (AUROC=0.928) in predicting mortality. 
Important predictors included demographic information (age), vital signs (oxygen saturation), and laboratory values (white blood cell count, cardiac troponin, albumin, etc). Our models had class imbalance, which resulted in high negative predictive values and low positive predictive values. Conclusions: We provided a feasible framework for modeling patient deterioration using existing data and AI technology to address data limitations during the onset of a novel, rapidly changing pandemic. %M 33534724 %R 10.2196/23026 %U https://www.jmir.org/2021/2/e23026 %U https://doi.org/10.2196/23026 %U http://www.ncbi.nlm.nih.gov/pubmed/33534724 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 2 %P e21037 %T Automated Computer Vision Assessment of Hypomimia in Parkinson Disease: Proof-of-Principle Pilot Study %A Abrami,Avner %A Gunzler,Steven %A Kilbane,Camilla %A Ostrand,Rachel %A Ho,Bryan %A Cecchi,Guillermo %+ IBM Research – Computational Biology Center, 1101 Kitchawan Rd, Yorktown Heights, NY, 10598, United States, 1 1 914 945 1815, gcecchi@us.ibm.com %K Parkinson disease %K hypomimia %K computer vision %K telemedicine %D 2021 %7 22.2.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Facial expressions require the complex coordination of 43 different facial muscles. Parkinson disease (PD) affects facial musculature leading to “hypomimia” or “masked facies.” Objective: We aimed to determine whether modern computer vision techniques can be applied to detect masked facies and quantify drug states in PD. Methods: We trained a convolutional neural network on images extracted from videos of 107 self-identified people with PD, along with 1595 videos of controls, in order to detect PD hypomimia cues. This trained model was applied to clinical interviews of 35 PD patients in their on and off drug motor states, and seven journalist interviews of the actor Alan Alda obtained before and after he was diagnosed with PD. 
Results: The algorithm achieved a test set area under the receiver operating characteristic curve of 0.71 on 54 subjects to detect PD hypomimia, compared to a value of 0.75 for trained neurologists using the Unified Parkinson Disease Rating Scale-III Facial Expression score. Additionally, the model accuracy to classify the on and off drug states in the clinical samples was 63% (22/35), in contrast to an accuracy of 46% (16/35) when using clinical rater scores. Finally, each of Alan Alda’s seven interviews was successfully classified as occurring before (versus after) his diagnosis, with 100% accuracy (7/7). Conclusions: This proof-of-principle pilot study demonstrated that computer vision holds promise as a valuable tool for PD hypomimia and for monitoring a patient’s motor state in an objective and noninvasive way, particularly given the increasing importance of telemedicine. %M 33616535 %R 10.2196/21037 %U https://www.jmir.org/2021/2/e21037 %U https://doi.org/10.2196/21037 %U http://www.ncbi.nlm.nih.gov/pubmed/33616535 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 2 %P e26552 %T Investigating the Ethical and Data Governance Issues of Artificial Intelligence in Surgery: Protocol for a Delphi Study %A Lam,Kyle %A Iqbal,Fahad M %A Purkayastha,Sanjay %A Kinross,James M %+ Imperial College London, 10th Floor QEQM Building, Praed St, London, , United Kingdom, 44 796 490 4213, k.lam@imperial.ac.uk %K artificial intelligence %K digital surgery %K Delphi %K ethics %K data governance %K digital technology %K operating room %K surgery %D 2021 %7 22.2.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: The rapid uptake of digital technology into the operating room has the potential to improve patient outcomes, increase efficiency of the use of operating rooms, and allow surgeons to progress quickly up learning curves. These technologies are, however, dependent on huge amounts of data, and the consequences of their mismanagement are significant. 
While the field of artificial intelligence ethics is able to provide a broad framework for those designing and implementing these technologies into the operating room, there is a need to determine and address the ethical and data governance challenges of using digital technology in this unique environment. Objective: The objectives of this study are to define the term digital surgery and gain expert consensus on the key ethical and data governance issues, barriers, and future research goals of the use of artificial intelligence in surgery. Methods: Experts from the fields of surgery, ethics and law, policy, artificial intelligence, and industry will be invited to participate in a 4-round consensus Delphi exercise. In the first round, participants will supply free-text responses across 4 key domains: ethics, data governance, barriers, and future research goals. They will also be asked to provide their understanding of the term digital surgery. In subsequent rounds, statements will be grouped, and participants will be asked to rate the importance of each issue on a 9-point Likert scale ranging from 1 (not at all important) to 9 (critically important). Consensus is defined a priori as a score of 7 to 9 by 70% of respondents and 1 to 3 by less than 30% of respondents. A final online meeting round will be held to discuss inclusion of statements and draft a consensus document. Results: Full ethical approval has been obtained for the study by the local research ethics committee at Imperial College, London (20IC6136). We anticipate round 1 to commence in January 2021. Conclusions: The results of this study will define the term digital surgery, identify the key issues and barriers, and shape future research in this area. 
International Registered Report Identifier (IRRID): PRR1-10.2196/26552 %M 33616543 %R 10.2196/26552 %U https://www.researchprotocols.org/2021/2/e26552 %U https://doi.org/10.2196/26552 %U http://www.ncbi.nlm.nih.gov/pubmed/33616543 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 2 %P e24221 %T Use and Control of Artificial Intelligence in Patients Across the Medical Workflow: Single-Center Questionnaire Study of Patient Perspectives %A Lennartz,Simon %A Dratsch,Thomas %A Zopfs,David %A Persigehl,Thorsten %A Maintz,David %A Große Hokamp,Nils %A Pinto dos Santos,Daniel %+ Institute for Diagnostic and Interventional Radiology, Faculty of Medicine and University Hospital Cologne, University of Cologne, Kerpener Straße 62, Cologne, 50937, Germany, 49 22147896063, daniel.pinto-dos-santos@uk-koeln.de %K artificial intelligence %K clinical implementation %K questionnaire %K survey %D 2021 %7 17.2.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) is gaining increasing importance in many medical specialties, yet data on patients’ opinions on the use of AI in medicine are scarce. Objective: This study aimed to investigate patients’ opinions on the use of AI in different aspects of the medical workflow and the level of control and supervision under which they would deem the application of AI in medicine acceptable. Methods: Patients scheduled for computed tomography or magnetic resonance imaging voluntarily participated in an anonymized questionnaire between February 10, 2020, and May 24, 2020. Patient information, confidence in physicians vs AI in different clinical tasks, opinions on the control of AI, preference in cases of disagreement between AI and physicians, and acceptance of the use of AI for diagnosing and treating diseases of different severity were recorded. Results: In total, 229 patients participated. 
Patients favored physicians over AI for all clinical tasks except for treatment planning based on current scientific evidence. In case of disagreement between physicians and AI regarding diagnosis and treatment planning, most patients preferred the physician’s opinion to AI (96.2% [153/159] vs 3.8% [6/159] and 94.8% [146/154] vs 5.2% [8/154], respectively; P=.001). AI supervised by a physician was considered more acceptable than AI without physician supervision at diagnosis (confidence rating 3.90 [SD 1.20] vs 1.64 [SD 1.03], respectively; P=.001) and therapy (3.77 [SD 1.18] vs 1.57 [SD 0.96], respectively; P=.001). Conclusions: Patients favored physicians over AI in most clinical tasks and strongly preferred an application of AI with physician supervision. However, patients acknowledged that AI could help physicians integrate the most recent scientific evidence into medical care. Application of AI in medicine should be disclosed and controlled to protect patient interests and meet ethical standards. 
%M 33595451 %R 10.2196/24221 %U http://www.jmir.org/2021/2/e24221/ %U https://doi.org/10.2196/24221 %U http://www.ncbi.nlm.nih.gov/pubmed/33595451 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 2 %P e24572 %T Development and Validation of a Machine Learning Approach for Automated Severity Assessment of COVID-19 Based on Clinical and Imaging Data: Retrospective Study %A Quiroz,Juan Carlos %A Feng,You-Zhen %A Cheng,Zhong-Yuan %A Rezazadegan,Dana %A Chen,Ping-Kang %A Lin,Qi-Ting %A Qian,Long %A Liu,Xiao-Fang %A Berkovsky,Shlomo %A Coiera,Enrico %A Song,Lei %A Qiu,Xiaoming %A Liu,Sidong %A Cai,Xiang-Ran %+ Centre for Health Informatics, Australian Institute of Health Innovation, Faculty of Medicine, Health and Human Sciences, Macquarie University, 75 Talvera Road, Macquarie Park, 2113, Australia, 61 29852729, sidong.liu@mq.edu.au %K algorithm %K clinical data %K clinical features %K COVID-19 %K CT scans %K development %K imaging %K imbalanced data %K machine learning %K oversampling %K severity assessment %K validation %D 2021 %7 11.2.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: COVID-19 has overwhelmed health systems worldwide. It is important to identify severe cases as early as possible, such that resources can be mobilized and treatment can be escalated. Objective: This study aims to develop a machine learning approach for automated severity assessment of COVID-19 based on clinical and imaging data. Methods: Clinical data—including demographics, signs, symptoms, comorbidities, and blood test results—and chest computed tomography scans of 346 patients from 2 hospitals in the Hubei Province, China, were used to develop machine learning models for automated severity assessment in diagnosed COVID-19 cases. We compared the predictive power of the clinical and imaging data from multiple machine learning models and further explored the use of four oversampling methods to address the imbalanced classification issue. 
Features with the highest predictive power were identified using the Shapley Additive Explanations framework. Results: Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with those reported previously. Although oversampling yielded mixed results, it achieved the best model performance in our study. Logistic regression models differentiating between mild and severe cases achieved the best performance for clinical features (area under the curve [AUC] 0.848; sensitivity 0.455; specificity 0.906), imaging features (AUC 0.926; sensitivity 0.818; specificity 0.901), and a combination of clinical and imaging features (AUC 0.950; sensitivity 0.764; specificity 0.919). The synthetic minority oversampling method further improved the performance of the model using combined features (AUC 0.960; sensitivity 0.845; specificity 0.929). Conclusions: Clinical and imaging features can be used for automated severity assessment of COVID-19 and can potentially help triage patients with COVID-19 and prioritize care delivery to those at a higher risk of severe disease. 
%M 33534723 %R 10.2196/24572 %U http://medinform.jmir.org/2021/2/e24572/ %U https://doi.org/10.2196/24572 %U http://www.ncbi.nlm.nih.gov/pubmed/33534723 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 2 %P e24246 %T A Machine Learning Prediction Model of Respiratory Failure Within 48 Hours of Patient Admission for COVID-19: Model Development and Validation %A Bolourani,Siavash %A Brenner,Max %A Wang,Ping %A McGinn,Thomas %A Hirsch,Jamie S %A Barnaby,Douglas %A Zanos,Theodoros P %A , %+ Feinstein Institutes for Medical Research, Northwell Health, 350 Community Dr, Room 1257, Manhasset, NY, 11030, United States, 1 5165620484, tzanos@northwell.edu %K artificial intelligence %K prognostic %K model %K pandemic %K severe acute respiratory syndrome coronavirus 2 %K modeling %K development %K validation %K COVID-19 %K machine learning %D 2021 %7 10.2.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Predicting early respiratory failure due to COVID-19 can help triage patients to higher levels of care, allocate scarce resources, and reduce morbidity and mortality by appropriately monitoring and treating the patients at greatest risk for deterioration. Given the complexity of COVID-19, machine learning approaches may support clinical decision making for patients with this disease. Objective: Our objective is to derive a machine learning model that predicts respiratory failure within 48 hours of admission based on data from the emergency department. Methods: Data were collected from patients with COVID-19 who were admitted to Northwell Health acute care hospitals and were discharged, died, or spent a minimum of 48 hours in the hospital between March 1 and May 11, 2020. Of 11,525 patients, 933 (8.1%) were placed on invasive mechanical ventilation within 48 hours of admission. Variables used by the models included clinical and laboratory data commonly collected in the emergency department. 
We trained and validated three predictive models (two based on XGBoost and one that used logistic regression) using cross-hospital validation. We compared model performance among all three models as well as an established early warning score (Modified Early Warning Score) using receiver operating characteristic curves, precision-recall curves, and other metrics. Results: The XGBoost model had the highest mean accuracy (0.919; area under the curve=0.77), outperforming the other two models as well as the Modified Early Warning Score. Important predictor variables included the type of oxygen delivery used in the emergency department, patient age, Emergency Severity Index level, respiratory rate, serum lactate, and demographic characteristics. Conclusions: The XGBoost model had high predictive accuracy, outperforming other early warning scores. The clinical plausibility and predictive ability of XGBoost suggest that the model could be used to predict 48-hour respiratory failure in admitted patients with COVID-19. %M 33476281 %R 10.2196/24246 %U http://www.jmir.org/2021/2/e24246/ %U https://doi.org/10.2196/24246 %U http://www.ncbi.nlm.nih.gov/pubmed/33476281 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 2 %P e23693 %T Fast and Accurate Detection of COVID-19 Along With 14 Other Chest Pathologies Using a Multi-Level Classification: Algorithm Development and Validation Study %A Albahli,Saleh %A Yar,Ghulam Nabi Ahmad Hassan %+ Department of Information Technology, College of Computer, Qassim University, Buraydah, 51452, Saudi Arabia, 966 163012604, salbahli@qu.edu.sa %K COVID-19 %K chest x-ray %K convolutional neural network %K data augmentation %K biomedical imaging %K automatic detection %D 2021 %7 10.2.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: COVID-19 has spread very rapidly, and it is important to build a system that can detect it in order to help an overwhelmed health care system. 
Many research studies on chest diseases rely on the strengths of deep learning techniques. Although some of these studies used state-of-the-art techniques and were able to deliver promising results, these techniques are not very useful if they can detect only one type of disease without detecting the others. Objective: The main objective of this study was to achieve a fast and more accurate diagnosis of COVID-19. This study proposes a diagnostic technique that classifies COVID-19 x-ray images from normal x-ray images and those specific to 14 other chest diseases. Methods: In this paper, we propose a novel, multilevel pipeline, based on deep learning models, to detect COVID-19 along with other chest diseases based on x-ray images. This pipeline reduces the burden of a single network to classify a large number of classes. The deep learning models used in this study were pretrained on the ImageNet dataset, and transfer learning was used for fast training. The lungs and heart were segmented from the whole x-ray images and passed onto the first classifier that checks whether the x-ray is normal, COVID-19 affected, or characteristic of another chest disease. If it is neither a COVID-19 x-ray image nor a normal one, then the second classifier comes into action and classifies the image as one of the other 14 diseases. Results: We show how our model uses state-of-the-art deep neural networks to achieve classification accuracy for COVID-19 along with 14 other chest diseases and normal cases based on x-ray images, which is competitive with currently used state-of-the-art models. Due to the lack of data in some classes such as COVID-19, we applied 10-fold cross-validation through the ResNet50 model. Our classification technique thus achieved an average training accuracy of 96.04% and test accuracy of 92.52% for the first level of classification (ie, 3 classes). 
For the second level of classification (ie, 14 classes), our technique achieved a maximum training accuracy of 88.52% and test accuracy of 66.634% by using ResNet50. We also found that when all the 16 classes were classified at once, the overall accuracy for COVID-19 detection decreased, which in the case of ResNet50 was 88.92% for training data and 71.905% for test data. Conclusions: Our proposed pipeline can detect COVID-19 with a higher accuracy along with detecting 14 other chest diseases based on x-ray images. This is achieved by dividing the classification task into multiple steps rather than classifying them collectively. %M 33529154 %R 10.2196/23693 %U http://www.jmir.org/2021/2/e23693/ %U https://doi.org/10.2196/23693 %U http://www.ncbi.nlm.nih.gov/pubmed/33529154 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 2 %P e22320 %T The Need for Ethnoracial Equity in Artificial Intelligence for Diabetes Management: Review and Recommendations %A Pham,Quynh %A Gamble,Anissa %A Hearn,Jason %A Cafazzo,Joseph A %+ Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Health Sciences Building, 155 College Street, Toronto, ON, M5T 1P8, Canada, 1 4163404800 ext 4765, q.pham@uhn.ca %K diabetes %K artificial intelligence %K digital health %K ethnoracial equity %K ethnicity %K race %D 2021 %7 10.2.2021 %9 Viewpoint %J J Med Internet Res %G English %X There is clear evidence to suggest that diabetes does not affect all populations equally. Among adults living with diabetes, those from ethnoracial minority communities—foreign-born, immigrant, refugee, and culturally marginalized—are at increased risk of poor health outcomes. Artificial intelligence (AI) is actively being researched as a means of improving diabetes management and care; however, several factors may predispose AI to ethnoracial bias. 
To better understand whether diabetes AI interventions are being designed in an ethnoracially equitable manner, we conducted a secondary analysis of 141 articles included in a 2018 review by Contreras and Vehi entitled “Artificial Intelligence for Diabetes Management and Decision Support: Literature Review.” Two members of our research team independently reviewed each article and selected those reporting ethnoracial data for further analysis. Only 10 articles (7.1%) were ultimately selected for secondary analysis in our case study. Of the 131 excluded articles, 118 (90.1%) failed to mention participants’ ethnic or racial backgrounds. The included articles reported ethnoracial data under various categories, including race (n=6), ethnicity (n=2), race/ethnicity (n=3), and percentage of Caucasian participants (n=1). Among articles specifically reporting race, the average distribution was 69.5% White, 17.1% Black, and 3.7% Asian. Only 2 articles reported inclusion of Native American participants. Given the clear ethnic and racial differences in diabetes biomarkers, prevalence, and outcomes, the inclusion of ethnoracial training data is likely to improve the accuracy of predictive models. Such considerations are imperative in AI-based tools, which are predisposed to negative biases due to their black-box nature and proneness to distributional shift. Based on our findings, we propose a short questionnaire to assess ethnoracial equity in research describing AI-based diabetes interventions. At this unprecedented time in history, AI can either mitigate or exacerbate disparities in health care. Future accounts of the infancy of diabetes AI must reflect our early and decisive action to confront ethnoracial inequities before they are coded into our systems and perpetuate the very biases we aim to eliminate. 
If we take deliberate and meaningful steps now toward training our algorithms to be ethnoracially inclusive, we can architect innovations in diabetes care that are bound by the diverse fabric of our society. %M 33565982 %R 10.2196/22320 %U http://www.jmir.org/2021/2/e22320/ %U https://doi.org/10.2196/22320 %U http://www.ncbi.nlm.nih.gov/pubmed/33565982 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 2 %P e22164 %T Identifying Myocardial Infarction Using Hierarchical Template Matching–Based Myocardial Strain: Algorithm Development and Usability Study %A Bhalodiya,Jayendra Maganbhai %A Palit,Arnab %A Giblin,Gerard %A Tiwari,Manoj Kumar %A Prasad,Sanjay K %A Bhudia,Sunil K %A Arvanitis,Theodoros N %A Williams,Mark A %+ Institute of Digital Healthcare, Warwick Manufacturing Group, University of Warwick, Gibbet Hill Rd, Coventry, CV4 7AL, United Kingdom, 44 7448404975, jayendra.bhalodiya@warwick.ac.uk %K left ventricle %K myocardial infarction %K myocardium %K strain %D 2021 %7 10.2.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Myocardial infarction (MI; location and extent of infarction) can be determined by late enhancement cardiac magnetic resonance (CMR) imaging, which requires the injection of a potentially harmful gadolinium-based contrast agent (GBCA). Alternatively, emerging research in the area of myocardial strain has shown potential to identify MI using strain values. Objective: This study aims to identify the location of MI by developing an applied algorithmic method of circumferential strain (CS) values, which are derived through a novel hierarchical template matching (HTM) method. Methods: HTM-based CS H-spread from end-diastole to end-systole was used to develop an applied method. Grid-tagging magnetic resonance imaging was used to calculate strain values in the left ventricular (LV) myocardium, followed by the 16-segment American Heart Association model. 
The data set was used with k-fold cross-validation to estimate the percentage reduction of H-spread among infarcted and noninfarcted LV segments. A total of 43 participants (38 MI and 5 healthy) who underwent CMR imaging were retrospectively selected. Infarcted segments detected by using this method were validated by comparison with late enhancement CMR, and the diagnostic performance of the applied algorithmic method was evaluated with a receiver operating characteristic curve test. Results: The H-spread of the CS was reduced in infarcted segments compared with noninfarcted segments of the LV. The reductions were 30% in basal segments, 30% in midventricular segments, and 20% in apical LV segments. The diagnostic accuracy of detection, using the reported method, was represented by area under the curve values, which were 0.85, 0.82, and 0.87 for basal, midventricular, and apical slices, respectively, demonstrating good agreement with the late-gadolinium enhancement–based detections. Conclusions: The proposed applied algorithmic method has the potential to accurately identify the location of infarcted LV segments without the administration of late-gadolinium enhancement. Such an approach adds the potential to safely identify MI, potentially reduce patient scanning time, and extend the utility of CMR in patients who are contraindicated for the use of GBCA. 
%M 33565992 %R 10.2196/22164 %U https://medinform.jmir.org/2021/2/e22164 %U https://doi.org/10.2196/22164 %U http://www.ncbi.nlm.nih.gov/pubmed/33565992 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 7 %N 2 %P e25935 %T Collaborating in the Time of COVID-19: The Scope and Scale of Innovative Responses to a Global Pandemic %A Bernardo,Theresa %A Sobkowich,Kurtis Edward %A Forrest,Russell Othmer %A Stewart,Luke Silva %A D'Agostino,Marcelo %A Perez Gutierrez,Enrique %A Gillis,Daniel %+ Department of Population Medicine, University of Guelph, 50 Stone Rd E, Guelph, ON, N1G 2W1, Canada, 1 519 824 4120 ext 54184, theresabernardo@gmail.com %K crowdsourcing %K artificial intelligence %K collaboration %K personal protective equipment %K big data %K AI %K COVID-19 %K innovation %K information sharing %K communication %K teamwork %K knowledge %K dissemination %D 2021 %7 9.2.2021 %9 Viewpoint %J JMIR Public Health Surveill %G English %X The emergence of COVID-19 spurred the formation of myriad teams to tackle every conceivable aspect of the virus and thwart its spread. Enabled by global digital connectedness, collaboration has become a constant theme throughout the pandemic, resulting in the expedition of the scientific process (including vaccine development), rapid consolidation of global outbreak data and statistics, and experimentation with novel partnerships. To document the evolution of these collaborative efforts, the authors collected illustrative examples as the pandemic unfolded, supplemented with publications from the JMIR COVID-19 Special Issue. Over 60 projects rooted in collaboration are categorized into five main themes: knowledge dissemination, data propagation, crowdsourcing, artificial intelligence, and hardware design and development. They highlight the numerous ways that citizens, industry professionals, researchers, and academics have come together worldwide to consolidate information and produce products to combat the COVID-19 pandemic. 
Initially, researchers and citizen scientists scrambled to access quality data within an overwhelming quantity of information. As global curated data sets emerged, derivative works such as visualizations or models were developed that depended on consistent data and would fail when there were unanticipated changes. Crowdsourcing was used to collect and analyze data, aid in contact tracing, and produce personal protective equipment by sharing open designs for 3D printing. An international consortium of entrepreneurs and researchers created a ventilator based on an open-source design. A coalition of nongovernmental organizations and governmental organizations, led by the White House Office of Science and Technology Policy, created a shared open resource of over 200,000 research publications about COVID-19 and subsequently offered cash prizes for the best solutions to 17 key questions involving artificial intelligence. A thread of collaboration weaved throughout the pandemic response, which will shape future efforts. Novel partnerships will cross boundaries to create better processes, products, and solutions to consequential societal challenges. 
%M 33503001 %R 10.2196/25935 %U http://publichealth.jmir.org/2021/2/e25935/ %U https://doi.org/10.2196/25935 %U http://www.ncbi.nlm.nih.gov/pubmed/33503001 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 2 %P e25184 %T Preliminary Screening for Hereditary Breast and Ovarian Cancer Using a Chatbot Augmented Intelligence Genetic Counselor: Development and Feasibility Study %A Sato,Ann %A Haneda,Eri %A Suganuma,Nobuyasu %A Narimatsu,Hiroto %+ Department of Genetic Medicine, Kanagawa Cancer Center, 2-3-2 Nakao, Asahi-ku, Yokohama, Kanagawa, 241-8515, Japan, 81 045 520 2222, hiroto-narimatsu@umin.org %K artificial intelligence %K augmented intelligence %K hereditary cancer %K familial cancer %K IBM Watson %K preliminary screening %K cancer %K genetics %K chatbot %K screening %K feasibility %D 2021 %7 5.2.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: Breast cancer is the most common form of cancer in Japan; genetic background and hereditary breast and ovarian cancer (HBOC) are implicated. The key to HBOC diagnosis involves screening to identify high-risk individuals. However, genetic medicine is still developing; thus, many patients who may potentially benefit from genetic medicine have not yet been identified. Objective: This study’s objective is to develop a chatbot system that uses augmented intelligence for HBOC screening to determine whether patients meet the National Comprehensive Cancer Network (NCCN) BRCA1/2 testing criteria. Methods: The system was evaluated by a doctor specializing in genetic medicine and certified genetic counselors. We prepared 3 scenarios and created a conversation with the chatbot to reflect each one. Then we evaluated chatbot feasibility, the required time, the medical accuracy of conversations and family history, and the final result. Results: The times required for the conversation were 7 minutes for scenario 1, 15 minutes for scenario 2, and 16 minutes for scenario 3. 
Scenarios 1 and 2 met the BRCA1/2 testing criteria, but scenario 3 did not, and this result was consistent with the findings of 3 experts who retrospectively reviewed conversations with the chatbot according to the 3 scenarios. A family history comparison ascertained by the chatbot with the actual scenarios revealed that each result was consistent with each scenario. From a genetic medicine perspective, no errors were noted by the 3 experts. Conclusions: This study demonstrated that chatbot systems could be applied to preliminary genetic medicine screening for HBOC. %M 33544084 %R 10.2196/25184 %U https://formative.jmir.org/2021/2/e25184 %U https://doi.org/10.2196/25184 %U http://www.ncbi.nlm.nih.gov/pubmed/33544084 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 2 %P e25187 %T Machine Learning–Based Early Warning Systems for Clinical Deterioration: Systematic Scoping Review %A Muralitharan,Sankavi %A Nelson,Walter %A Di,Shuang %A McGillion,Michael %A Devereaux,PJ %A Barr,Neil Grant %A Petch,Jeremy %+ Centre for Data Science and Digital Health, Hamilton Health Sciences, 293 Wellington St. N, Hamilton, ON, L8L 8E7, Canada, 1 2897882965, sankavi_22@hotmail.com %K machine learning %K early warning systems %K clinical deterioration %K ambulatory care %K acute care %K remote patient monitoring %K vital signs %K sepsis %K cardiorespiratory instability %K risk prediction %D 2021 %7 4.2.2021 %9 Review %J J Med Internet Res %G English %X Background: Timely identification of patients at a high risk of clinical deterioration is key to prioritizing care, allocating resources effectively, and preventing adverse outcomes. Vital signs–based, aggregate-weighted early warning systems are commonly used to predict the risk of outcomes related to cardiorespiratory instability and sepsis, which are strong predictors of poor outcomes and mortality. 
Machine learning models, which can incorporate trends and capture relationships among parameters that aggregate-weighted models cannot, have recently been showing promising results. Objective: This study aimed to identify, summarize, and evaluate the available research, current state of utility, and challenges with machine learning–based early warning systems using vital signs to predict the risk of physiological deterioration in acutely ill patients, across acute and ambulatory care settings. Methods: PubMed, CINAHL, Cochrane Library, Web of Science, Embase, and Google Scholar were searched for peer-reviewed, original studies with keywords related to “vital signs,” “clinical deterioration,” and “machine learning.” Included studies used patient vital signs along with demographics and described a machine learning model for predicting an outcome in acute and ambulatory care settings. Data were extracted following PRISMA, TRIPOD, and Cochrane Collaboration guidelines. Results: We identified 24 peer-reviewed studies from 417 articles for inclusion; 23 studies were retrospective, while 1 was prospective in nature. Care settings included general wards, intensive care units, emergency departments, step-down units, medical assessment units, postanesthetic wards, and home care. Machine learning models including logistic regression, tree-based methods, kernel-based methods, and neural networks were most commonly used to predict the risk of deterioration. The area under the curve for models ranged from 0.57 to 0.97. Conclusions: In studies that compared performance, reported results suggest that machine learning–based early warning systems can achieve greater accuracy than aggregate-weighted early warning systems but several areas for further research were identified. While these models have the potential to provide clinical decision support, there is a need for standardized outcome measures to allow for rigorous evaluation of performance across models. 
Further research needs to address the interpretability of model outputs by clinicians, clinical efficacy of these systems through prospective study design, and their potential impact in different clinical settings. %M 33538696 %R 10.2196/25187 %U https://www.jmir.org/2021/2/e25187 %U https://doi.org/10.2196/25187 %U http://www.ncbi.nlm.nih.gov/pubmed/33538696 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 2 %P e23436 %T Hidden Variables in Deep Learning Digital Pathology and Their Potential to Cause Batch Effects: Prediction Model Study %A Schmitt,Max %A Maron,Roman Christoph %A Hekler,Achim %A Stenzinger,Albrecht %A Hauschild,Axel %A Weichenthal,Michael %A Tiemann,Markus %A Krahl,Dieter %A Kutzner,Heinz %A Utikal,Jochen Sven %A Haferkamp,Sebastian %A Kather,Jakob Nikolas %A Klauschen,Frederick %A Krieghoff-Henning,Eva %A Fröhling,Stefan %A von Kalle,Christof %A Brinker,Titus Josef %+ Digital Biomarkers for Oncology Group, National Center for Tumor Diseases, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, Heidelberg, 69120, Germany, 49 6221 3219304, titus.brinker@dkfz.de %K artificial intelligence %K machine learning %K deep learning %K neural networks %K convolutional neural networks %K pathology %K clinical pathology %K digital pathology %K pitfalls %K artifacts %D 2021 %7 2.2.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: An increasing number of studies within digital pathology show the potential of artificial intelligence (AI) to diagnose cancer using histological whole slide images, which requires large and diverse data sets. While diversification may result in more generalizable AI-based systems, it can also introduce hidden variables. If neural networks are able to distinguish/learn hidden variables, these variables can introduce batch effects that compromise the accuracy of classification systems. 
Objective: The objective of the study was to analyze the learnability of an exemplary selection of hidden variables (patient age, slide preparation date, slide origin, and scanner type) that are commonly found in whole slide image data sets in digital pathology and could create batch effects. Methods: We trained four separate convolutional neural networks (CNNs) to learn four variables using a data set of digitized whole slide melanoma images from five different institutes. For robustness, each CNN training and evaluation run was repeated multiple times, and a variable was only considered learnable if the lower bound of the 95% confidence interval of its mean balanced accuracy was above 50.0%. Results: A mean balanced accuracy above 50.0% was achieved for all four tasks, even when considering the lower bound of the 95% confidence interval. Performance between tasks showed wide variation, ranging from 56.1% (slide preparation date) to 100% (slide origin). Conclusions: Because all of the analyzed hidden variables are learnable, they have the potential to create batch effects in dermatopathology data sets, which negatively affect AI-based classification systems. Practitioners should be aware of these and similar pitfalls when developing and evaluating such systems and address these and potentially other batch effect variables in their data sets through sufficient data set stratification. 
%M 33528370 %R 10.2196/23436 %U https://www.jmir.org/2021/2/e23436 %U https://doi.org/10.2196/23436 %U http://www.ncbi.nlm.nih.gov/pubmed/33528370 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 4 %N 1 %P e23933 %T Predicted Influences of Artificial Intelligence on Nursing Education: Scoping Review %A Buchanan,Christine %A Howitt,M Lyndsay %A Wilson,Rita %A Booth,Richard G %A Risling,Tracie %A Bamford,Megan %+ Registered Nurses' Association of Ontario, 500-4211 Yonge Street, Toronto, ON, M2P 2A9, Canada, 1 800 268 7199 ext 281, cbuchanan@rnao.ca %K nursing %K artificial intelligence %K education %K review %D 2021 %7 28.1.2021 %9 Review %J JMIR Nursing %G English %X Background: It is predicted that artificial intelligence (AI) will transform nursing across all domains of nursing practice, including administration, clinical care, education, policy, and research. Increasingly, researchers are exploring the potential influences of AI health technologies (AIHTs) on nursing in general and on nursing education more specifically. However, little emphasis has been placed on synthesizing this body of literature. Objective: A scoping review was conducted to summarize the current and predicted influences of AIHTs on nursing education over the next 10 years and beyond. Methods: This scoping review followed a previously published protocol from April 2020. Using an established scoping review methodology, the databases of MEDLINE, Cumulative Index to Nursing and Allied Health Literature, Embase, PsycINFO, Cochrane Database of Systematic Reviews, Cochrane Central, Education Resources Information Centre, Scopus, Web of Science, and Proquest were searched. In addition to the use of these electronic databases, a targeted website search was performed to access relevant grey literature. Abstracts and full-text studies were independently screened by two reviewers using prespecified inclusion and exclusion criteria. 
Included literature focused on nursing education and digital health technologies that incorporate AI. Data were charted using a structured form and narratively summarized into categories. Results: A total of 27 articles were identified (20 expository papers, six studies with quantitative or prototyping methods, and one qualitative study). The population included nurses, nurse educators, and nursing students at the entry-to-practice, undergraduate, graduate, and doctoral levels. A variety of AIHTs were discussed, including virtual avatar apps, smart homes, predictive analytics, virtual or augmented reality, and robots. The two key categories derived from the literature were (1) influences of AI on nursing education in academic institutions and (2) influences of AI on nursing education in clinical practice. Conclusions: Curricular reform is urgently needed within nursing education programs in academic institutions and clinical practice settings to prepare nurses and nursing students to practice safely and efficiently in the age of AI. Additionally, nurse educators need to adopt new and evolving pedagogies that incorporate AI to better support students at all levels of education. Finally, nursing students and practicing nurses must be equipped with the requisite knowledge and skills to effectively assess AIHTs and safely integrate those deemed appropriate to support person-centered compassionate nursing care in practice settings. 
International Registered Report Identifier (IRRID): RR2-10.2196/17490 %M 34345794 %R 10.2196/23933 %U https://nursing.jmir.org/2021/1/e23933/ %U https://doi.org/10.2196/23933 %U http://www.ncbi.nlm.nih.gov/pubmed/34345794 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 1 %P e24973 %T Deep Learning Models for Predicting Severe Progression in COVID-19-Infected Patients: Retrospective Study %A Ho,Thao Thi %A Park,Jongmin %A Kim,Taewoo %A Park,Byunggeon %A Lee,Jaehee %A Kim,Jin Young %A Kim,Ki Beom %A Choi,Sooyoung %A Kim,Young Hwan %A Lim,Jae-Kwang %A Choi,Sanghun %+ School of Mechanical Engineering, Kyungpook National University, 80 Daehak-ro, Buk-gu, Daegu, 41566, Republic of Korea, 82 53 950 5578, s-choi@knu.ac.kr %K COVID-19 %K deep learning %K artificial neural network %K convolutional neural network %K lung CT %D 2021 %7 28.1.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Many COVID-19 patients rapidly progress to respiratory failure with a broad range of severities. Identification of high-risk cases is critical for early intervention. Objective: The aim of this study is to develop deep learning models that can rapidly identify high-risk COVID-19 patients based on computed tomography (CT) images and clinical data. Methods: We analyzed 297 COVID-19 patients from five hospitals in Daegu, South Korea. A mixed artificial convolutional neural network (ACNN) model, combining an artificial neural network for clinical data and a convolutional neural network for 3D CT imaging data, was developed to classify these cases as either high risk of severe progression (ie, event) or low risk (ie, event-free). 
Results: Using the mixed ACNN model, we were able to obtain high classification performance using novel coronavirus pneumonia lesion images (ie, 93.9% accuracy, 80.8% sensitivity, 96.9% specificity, and 0.916 area under the curve [AUC] score) and lung segmentation images (ie, 94.3% accuracy, 74.7% sensitivity, 95.9% specificity, and 0.928 AUC score) for event versus event-free groups. Conclusions: Our study successfully differentiated high-risk cases among COVID-19 patients using imaging and clinical features. The developed model can be used as a predictive tool for interventions in aggressive therapies. %M 33455900 %R 10.2196/24973 %U http://medinform.jmir.org/2021/1/e24973/ %U https://doi.org/10.2196/24973 %U http://www.ncbi.nlm.nih.gov/pubmed/33455900 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 1 %P e24924 %T Machine Learning Prediction of Foodborne Disease Pathogens: Algorithm Development and Validation Study %A Wang,Hanxue %A Cui,Wenjuan %A Guo,Yunchang %A Du,Yi %A Zhou,Yuanchun %+ Computer Network Information Center, Chinese Academy of Sciences, No 4, South Fourth Street, Zhongguancun, Haidian District, Beijing, 100190, China, 86 15810134970, duyi@cnic.cn %K foodborne disease %K pathogens prediction %K machine learning %D 2021 %7 26.1.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Foodborne diseases have a high global incidence; thus, they place a heavy burden on public health and the social economy. Foodborne pathogens, as the main factor of foodborne diseases, play an important role in the treatment and prevention of foodborne diseases; however, foodborne diseases caused by different pathogens lack specificity in their clinical features, and there is a low proportion of actual clinical pathogen detection in real life. 
Objective: We aimed to analyze foodborne disease case data, select appropriate features based on analysis results, and use machine learning methods to classify foodborne disease pathogens to predict foodborne disease pathogens for cases where the pathogen is not known or tested. Methods: We extracted features such as space, time, and exposed food from foodborne disease case data and analyzed the relationships between these features and the foodborne disease pathogens using a variety of machine learning methods to classify foodborne disease pathogens. We compared the results of four models to obtain the pathogen prediction model with the highest accuracy. Results: The gradient boost decision tree model obtained the highest accuracy, with accuracy approaching 69% in identifying 4 pathogens: Salmonella, Norovirus, Escherichia coli, and Vibrio parahaemolyticus. By evaluating the importance of features such as time of illness, geographical longitude and latitude, and diarrhea frequency, we found that these features play important roles in classifying foodborne disease pathogens. Conclusions: Data analysis can reflect the distribution of some features of foodborne diseases and the relationships among the features. The classification of pathogens based on the analysis results and machine learning methods can provide beneficial support for clinical auxiliary diagnosis and treatment of foodborne diseases. 
%M 33496675 %R 10.2196/24924 %U http://medinform.jmir.org/2021/1/e24924/ %U https://doi.org/10.2196/24924 %U http://www.ncbi.nlm.nih.gov/pubmed/33496675 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 1 %P e19739 %T An Application of Machine Learning to Etiological Diagnosis of Secondary Hypertension: Retrospective Study Using Electronic Medical Records %A Diao,Xiaolin %A Huo,Yanni %A Yan,Zhanzheng %A Wang,Haibin %A Yuan,Jing %A Wang,Yuxin %A Cai,Jun %A Zhao,Wei %+ Department of Information Center, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, 167 Beilishi Road, Beijing, 100037, China, 86 1 333 119 2899, zw@fuwai.com %K secondary hypertension %K etiological diagnosis %K machine learning %K prediction model %D 2021 %7 25.1.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Secondary hypertension is a kind of hypertension with a definite etiology and may be cured. Patients with suspected secondary hypertension can benefit from timely detection and treatment and, conversely, will have a higher risk of morbidity and mortality than those with primary hypertension. Objective: The aim of this study was to develop and validate machine learning (ML) prediction models of common etiologies in patients with suspected secondary hypertension. Methods: The analyzed data set was retrospectively extracted from electronic medical records of patients discharged from Fuwai Hospital between January 1, 2016, and June 30, 2019. A total of 7532 unique patients were included and divided into 2 data sets by time: 6302 patients in 2016-2018 as the training data set for model building and 1230 patients in 2019 as the validation data set for further evaluation. 
Extreme Gradient Boosting (XGBoost) was adopted to develop 5 models to predict 4 etiologies of secondary hypertension and occurrence of any of them (named as composite outcome), including renovascular hypertension (RVH), primary aldosteronism (PA), thyroid dysfunction, and aortic stenosis. Both univariate logistic analysis and Gini Impurity were used for feature selection. Grid search and 10-fold cross-validation were used to select the optimal hyperparameters for each model. Results: Validation of the composite outcome prediction model showed good performance with an area under the receiver-operating characteristic curve (AUC) of 0.924 in the validation data set, while the 4 prediction models of RVH, PA, thyroid dysfunction, and aortic stenosis achieved AUC of 0.938, 0.965, 0.959, and 0.946, respectively, in the validation data set. A total of 79 clinical indicators were identified in all and finally used in our prediction models. The result of subgroup analysis on the composite outcome prediction model demonstrated high discrimination with AUCs all higher than 0.890 among all age groups of adults. Conclusions: The ML prediction models in this study showed good performance in detecting 4 etiologies of patients with suspected secondary hypertension; thus, they may potentially facilitate clinical diagnosis decision making of secondary hypertension in an intelligent way. 
%M 33492233 %R 10.2196/19739 %U http://medinform.jmir.org/2021/1/e19739/ %U https://doi.org/10.2196/19739 %U http://www.ncbi.nlm.nih.gov/pubmed/33492233 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 1 %P e20123 %T Risk Stratification for Early Detection of Diabetes and Hypertension in Resource-Limited Settings: Machine Learning Analysis %A Boutilier,Justin J %A Chan,Timothy C Y %A Ranjan,Manish %A Deo,Sarang %+ Department of Industrial and Systems Engineering, University of Wisconsin-Madison, 1513 University Avenue, Madison, WI, 53706, United States, 1 6082630350, jboutilier@wisc.edu %K machine learning %K diabetes %K hypertension %K screening %K global health %D 2021 %7 21.1.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The impending scale up of noncommunicable disease screening programs in low- and middle-income countries coupled with limited health resources require that such programs be as accurate as possible at identifying patients at high risk. Objective: The aim of this study was to develop machine learning–based risk stratification algorithms for diabetes and hypertension that are tailored for the at-risk population served by community-based screening programs in low-resource settings. Methods: We trained and tested our models by using data from 2278 patients collected by community health workers through door-to-door and camp-based screenings in the urban slums of Hyderabad, India between July 14, 2015 and April 21, 2018. We determined the best models for predicting short-term (2-month) risk of diabetes and hypertension (a model for diabetes and a model for hypertension) and compared these models to previously developed risk scores from the United States and the United Kingdom by using prediction accuracy as characterized by the area under the receiver operating characteristic curve (AUC) and the number of false negatives. 
Results: We found that models based on random forest had the highest prediction accuracy for both diseases and were able to outperform the US and UK risk scores in terms of AUC by 35.5% for diabetes (improvement of 0.239 from 0.671 to 0.910) and 13.5% for hypertension (improvement of 0.094 from 0.698 to 0.792). For a fixed screening specificity of 0.9, the random forest model was able to reduce the expected number of false negatives by 620 patients per 1000 screenings for diabetes and 220 patients per 1000 screenings for hypertension. This improvement reduces the cost of incorrect risk stratification by US $1.99 (or 35%) per screening for diabetes and US $1.60 (or 21%) per screening for hypertension. Conclusions: In the next decade, health systems in many countries are planning to spend significant resources on noncommunicable disease screening programs and our study demonstrates that machine learning models can be leveraged by these programs to effectively utilize limited resources by improving risk stratification. %M 33475518 %R 10.2196/20123 %U http://www.jmir.org/2021/1/e20123/ %U https://doi.org/10.2196/20123 %U http://www.ncbi.nlm.nih.gov/pubmed/33475518 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 1 %P e24618 %T Development of Social Support Networks by Patients With Depression Through Online Health Communities: Social Network Analysis %A Lu,Yingjie %A Luo,Shuwen %A Liu,Xuan %+ School of Business, East China University of Science and Technology, Meilong Road 130, Shanghai, 200237, China, 86 2164252489, xuanliu@ecust.edu.cn %K online depression community %K social support network %K exponential random graph model %K informational support %K emotional support %K mental health %K depression %K social network %D 2021 %7 7.1.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: In recent years, people with mental health problems are increasingly using online social networks to receive social support. 
For example, in online depression communities, patients can share their experiences, exchange valuable information, and receive emotional support to help them cope with their disease. Therefore, it is critical to understand how patients with depression develop online social support networks to exchange informational and emotional support. Objective: Our aim in this study was to investigate which user attributes have significant effects on the formation of informational and emotional support networks in online depression communities and to further examine whether there is an association between the two social networks. Methods: We used social network theory and constructed exponential random graph models to help understand the informational and emotional support networks in online depression communities. A total of 74,986 original posts were retrieved from 1077 members in an online depression community in China from April 2003 to September 2017 and the available data were extracted. An informational support network of 1077 participant nodes and 6557 arcs and an emotional support network of 1077 participant nodes and 6430 arcs were constructed to examine the endogenous (purely structural) effects and exogenous (actor-relation) effects on each support network separately, as well as the cross-network effects between the two networks. Results: We found significant effects of two important structural features, reciprocity and transitivity, on the formation of both the informational support network (r=3.6247, P<.001, and r=1.6232, P<.001, respectively) and the emotional support network (r=4.4111, P<.001, and r=0.0177, P<.001, respectively). The results also showed significant effects of some individual factors on the formation of the two networks. No significant effects of homophily were found for gender (r=0.0783, P=.20, and r=0.1122, P=.25, respectively) in the informational or emotional support networks. 
There was no tendency for users who had great influence (r=0.3253, P=.05) or wrote more posts (r=0.3896, P=.07) or newcomers (r=–0.0452, P=.66) to form informational support ties more easily. However, users who spent more time online (r=0.6680, P<.001) or provided more replies to other posts (r=0.5026, P<.001) were more likely to form informational support ties. Users who had a big influence (r=0.8325, P<.001), spent more time online (r=0.5839, P<.001), wrote more posts (r=2.4025, P<.001), or provided more replies to other posts (r=0.2259, P<.001) were more likely to form emotional support ties, and newcomers (r=–0.4224, P<.001) were less likely than old-timers to receive emotional support. In addition, we found that there was a significant entrainment effect (r=0.7834, P<.001) and a nonsignificant exchange effect (r=–0.2757, P=.32) between the two networks. Conclusions: This study makes several important theoretical contributions to the research on online depression communities and has important practical implications for the managers of online depression communities and the users involved in these communities. 
%M 33279878 %R 10.2196/24618 %U http://medinform.jmir.org/2021/1/e24618/ %U https://doi.org/10.2196/24618 %U http://www.ncbi.nlm.nih.gov/pubmed/33279878 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 1 %P e21453 %T Natural Language Processing–Based Virtual Cofacilitator for Online Cancer Support Groups: Protocol for an Algorithm Development and Validation Study %A Leung,Yvonne W %A Wouterloot,Elise %A Adikari,Achini %A Hirst,Graeme %A de Silva,Daswin %A Wong,Jiahui %A Bender,Jacqueline L %A Gancarz,Mathew %A Gratzer,David %A Alahakoon,Damminda %A Esplen,Mary Jane %+ de Souza Institute, University Health Network, 222 St Patrick St Rm 503, Toronto, ON, M5T 1V4, Canada, 1 844 758 6891, yvonne.leung@desouzainstitute.com %K artificial intelligence %K cancer %K online support groups %K emotional distress %K natural language processing %K participant engagement %D 2021 %7 7.1.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: Cancer and its treatment can significantly impact the short- and long-term psychological well-being of patients and families. Emotional distress and depressive symptomatology are often associated with poor treatment adherence, reduced quality of life, and higher mortality. Cancer support groups, especially those led by health care professionals, provide a safe place for participants to discuss fear, normalize stress reactions, share solidarity, and learn about effective strategies to build resilience and enhance coping. However, in-person support groups may not always be accessible to individuals; geographic distance is one of the barriers for access, and compromised physical condition (eg, fatigue, pain) is another. Emerging evidence supports the effectiveness of online support groups in reducing access barriers. Text-based and professional-led online support groups have been offered by Cancer Chat Canada. Participants join the group discussion using text in real time. 
However, therapist leaders report some challenges leading text-based online support groups in the absence of visual cues, particularly in tracking participant distress. With multiple participants typing at the same time, the nuances of the text messages or red flags for distress can sometimes be missed. Recent advances in artificial intelligence such as deep learning–based natural language processing offer potential solutions. This technology can be used to analyze online support group text data to track participants’ expressed emotional distress, including fear, sadness, and hopelessness. Artificial intelligence allows session activities to be monitored in real time and alerts the therapist to participant disengagement. Objective: We aim to develop and evaluate an artificial intelligence–based cofacilitator prototype to track and monitor online support group participants’ distress through real-time analysis of text-based messages posted during synchronous sessions. Methods: An artificial intelligence–based cofacilitator will be developed to identify participants who are at risk for increased emotional distress and track participant engagement and in-session group cohesion levels, providing real-time alerts for the therapist to follow up; generate postsession participant profiles that contain discussion content keywords and emotion profiles for each session; and automatically suggest tailored resources to participants according to their needs. The study is designed to be conducted in 4 phases consisting of (1) development based on a subset of data and an existing natural language processing framework, (2) performance evaluation using human scoring, (3) beta testing, and (4) user experience evaluation. Results: This study received ethics approval in August 2019. Phase 1, development of an artificial intelligence–based cofacilitator, was completed in January 2020. As of December 2020, phase 2 is underway. The study is expected to be completed by September 2021. 
Conclusions: An artificial intelligence–based cofacilitator offers a promising new mode of delivery of person-centered online support groups tailored to individual needs. International Registered Report Identifier (IRRID): DERR1-10.2196/21453 %M 33410754 %R 10.2196/21453 %U https://www.researchprotocols.org/2021/1/e21453 %U https://doi.org/10.2196/21453 %U http://www.ncbi.nlm.nih.gov/pubmed/33410754 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 1 %P e19928 %T Utilization of Self-Diagnosis Health Chatbots in Real-World Settings: Case Study %A Fan,Xiangmin %A Chao,Daren %A Zhang,Zhan %A Wang,Dakuo %A Li,Xiaohua %A Tian,Feng %+ School of Computer Science and Information Systems, Pace University, 1 Pace Plaza, New York, NY, 10078, United States, 1 9147733254, zzhang@pace.edu %K self-diagnosis %K chatbot %K conversational agent %K human–artificial intelligence interaction %K artificial intelligence %K diagnosis %K case study %K eHealth %K real world %K user experience %D 2021 %7 6.1.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI)-driven chatbots are increasingly being used in health care, but most chatbots are designed for a specific population and evaluated in controlled settings. There is little research documenting how health consumers (eg, patients and caregivers) use chatbots for self-diagnosis purposes in real-world scenarios. Objective: The aim of this research was to understand how health chatbots are used in a real-world context, what issues and barriers exist in their usage, and how the user experience of this novel technology can be improved. Methods: We employed a data-driven approach to analyze the system log of a widely deployed self-diagnosis chatbot in China. Our data set consisted of 47,684 consultation sessions initiated by 16,519 users over 6 months. 
The log data included a variety of information, including users’ nonidentifiable demographic information, consultation details, diagnostic reports, and user feedback. We conducted both statistical analysis and content analysis on this heterogeneous data set. Results: The chatbot users spanned all age groups, including middle-aged and older adults. Users consulted the chatbot on a wide range of medical conditions, including those that often entail considerable privacy and social stigma issues. Furthermore, we distilled 2 prominent issues in the use of the chatbot: (1) a considerable number of users dropped out in the middle of their consultation sessions, and (2) some users pretended to have health concerns and used the chatbot for nontherapeutic purposes. Finally, we identified a set of user concerns regarding the use of the chatbot, including insufficient actionable information and perceived inaccurate diagnostic suggestions. Conclusions: Although health chatbots are considered to be convenient tools for enhancing patient-centered care, there are issues and barriers impeding the optimal use of this novel technology. Designers and developers should employ user-centered approaches to address the issues and user concerns to achieve the best uptake and utilization. We conclude the paper by discussing several design implications, including making the chatbots more informative, easy-to-use, and trustworthy, as well as improving the onboarding experience to enhance user engagement. 
%M 33404508 %R 10.2196/19928 %U https://www.jmir.org/2021/1/e19928 %U https://doi.org/10.2196/19928 %U http://www.ncbi.nlm.nih.gov/pubmed/33404508 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 12 %P e21965 %T Automatically Explaining Machine Learning Prediction Results on Asthma Hospital Visits in Patients With Asthma: Secondary Analysis %A Luo,Gang %A Johnson,Michael D %A Nkoy,Flory L %A He,Shan %A Stone,Bryan L %+ Department of Biomedical Informatics and Medical Education, University of Washington, Building C, Box 358047, 850 Republican Street, Seattle, WA, 98195, United States, 1 2062214596, gangluo@cs.wisc.edu %K asthma %K forecasting %K machine learning %K patient care management %D 2020 %7 31.12.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Asthma is a major chronic disease that poses a heavy burden on health care. To facilitate the allocation of care management resources aimed at improving outcomes for high-risk patients with asthma, we recently built a machine learning model to predict asthma hospital visits in the subsequent year in patients with asthma. Our model is more accurate than previous models. However, like most machine learning models, it offers no explanation of its prediction results. This creates a barrier for use in care management, where interpretability is desired. Objective: This study aims to develop a method to automatically explain the prediction results of the model and recommend tailored interventions without lowering the performance measures of the model. Methods: Our data were imbalanced, with only a small portion of data instances linking to future asthma hospital visits. To handle imbalanced data, we extended our previous method of automatically offering rule-formed explanations for the prediction results of any machine learning model on tabular data without lowering the model’s performance measures. 
In a secondary analysis of the 334,564 data instances from Intermountain Healthcare between 2005 and 2018 used to form our model, we employed the extended method to automatically explain the prediction results of our model and recommend tailored interventions. The patient cohort consisted of all patients with asthma who received care at Intermountain Healthcare between 2005 and 2018, and resided in Utah or Idaho as recorded at the visit. Results: Our method explained the prediction results for 89.7% (391/436) of the patients with asthma who, per our model’s correct prediction, were likely to incur asthma hospital visits in the subsequent year. Conclusions: This study is the first to demonstrate the feasibility of automatically offering rule-formed explanations for the prediction results of any machine learning model on imbalanced tabular data without lowering the performance measures of the model. After further improvement, our asthma outcome prediction model coupled with the automatic explanation function could be used by clinicians to guide the allocation of limited asthma care management resources and the identification of appropriate interventions. 
%M 33382379 %R 10.2196/21965 %U http://medinform.jmir.org/2020/12/e21965/ %U https://doi.org/10.2196/21965 %U http://www.ncbi.nlm.nih.gov/pubmed/33382379 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 12 %P e22422 %T Deep Neural Network for Reducing the Screening Workload in Systematic Reviews for Clinical Guidelines: Algorithm Validation Study %A Yamada,Tomohide %A Yoneoka,Daisuke %A Hiraike,Yuta %A Hino,Kimihiro %A Toyoshiba,Hiroyoshi %A Shishido,Akira %A Noma,Hisashi %A Shojima,Nobuhiro %A Yamauchi,Toshimasa %+ University Institute for Population Health, King’s College London, Addison House, Guys Campus, London, SE1 1UL, United Kingdom, 44 (0)20 7848 6625, bqx07367@yahoo.co.jp %K machine learning %K evidence-based medicine %K systematic review %K meta-analysis %K clinical guideline %K deep learning %K neural network %D 2020 %7 30.12.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Performing systematic reviews is a time-consuming and resource-intensive process. Objective: We investigated whether a machine learning system could perform systematic reviews more efficiently. Methods: All systematic reviews and meta-analyses of interventional randomized controlled trials cited in recent clinical guidelines from the American Diabetes Association, American College of Cardiology, American Heart Association (2 guidelines), and American Stroke Association were assessed. After reproducing the primary screening data set according to the published search strategy of each, we extracted correct articles (those actually reviewed) and incorrect articles (those not reviewed) from the data set. These 2 sets of articles were used to train a neural network–based artificial intelligence engine (Concept Encoder, Fronteo Inc). The primary endpoint was work saved over sampling at 95% recall (WSS@95%). Results: Among 145 candidate reviews of randomized controlled trials, 8 reviews fulfilled the inclusion criteria. 
For these 8 reviews, the machine learning system significantly reduced the literature screening workload by at least 6-fold versus that of manual screening based on WSS@95%. When machine learning was initiated using 2 correct articles that were randomly selected by a researcher, a 10-fold reduction in workload was achieved versus that of manual screening based on the WSS@95% value, with high sensitivity for eligible studies. The area under the receiver operating characteristic curve increased dramatically every time the algorithm learned a correct article. Conclusions: Concept Encoder achieved a 10-fold reduction of the screening workload for systematic review after learning from 2 randomly selected studies on the target topic. However, few meta-analyses of randomized controlled trials were included. Concept Encoder could facilitate the acquisition of evidence for clinical guidelines. %M 33262102 %R 10.2196/22422 %U https://www.jmir.org/2020/12/e22422 %U https://doi.org/10.2196/22422 %U http://www.ncbi.nlm.nih.gov/pubmed/33262102 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 12 %P e25442 %T An Artificial Intelligence Model to Predict the Mortality of COVID-19 Patients at Hospital Admission Time Using Routine Blood Samples: Development and Validation of an Ensemble Model %A Ko,Hoon %A Chung,Heewon %A Kang,Wu Seong %A Park,Chul %A Kim,Do Wan %A Kim,Seong Eun %A Chung,Chi Ryang %A Ko,Ryoung Eun %A Lee,Hooseok %A Seo,Jae Ho %A Choi,Tae-Young %A Jaimes,Rafael %A Kim,Kyung Won %A Lee,Jinseok %+ Biomedical Engineering, Wonkwang University, Iksan Daero, Iksan, 54538, Republic of Korea, 82 1638506970, gonasago@gmail.com %K COVID-19 %K artificial intelligence %K blood samples %K mortality prediction %D 2020 %7 23.12.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: COVID-19, which is accompanied by acute respiratory distress, multiple organ failure, and death, has spread worldwide much faster than previously thought. 
However, at present, it has limited treatments. Objective: To overcome this issue, we developed an artificial intelligence (AI) model of COVID-19, named EDRnet (ensemble learning model based on deep neural network and random forest models), to predict in-hospital mortality using a routine blood sample at the time of hospital admission. Methods: We selected 28 blood biomarkers and used the age and gender information of patients as model inputs. To improve the mortality prediction, we adopted an ensemble approach combining deep neural network and random forest models. We trained our model with a database of blood samples from 361 COVID-19 patients in Wuhan, China, and applied it to 106 COVID-19 patients in three Korean medical institutions. Results: In the testing data sets, EDRnet provided high sensitivity (100%), specificity (91%), and accuracy (92%). To extend the number of patient data points, we developed a web application (BeatCOVID19) where anyone can access the model to predict mortality and can register his or her own blood laboratory results. Conclusions: Our new AI model, EDRnet, accurately predicts the mortality rate for COVID-19. It is publicly available and aims to help health care providers fight COVID-19 and improve patients’ outcomes. 
%M 33301414 %R 10.2196/25442 %U http://www.jmir.org/2020/12/e25442/ %U https://doi.org/10.2196/25442 %U http://www.ncbi.nlm.nih.gov/pubmed/33301414 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 12 %P e23082 %T Model-Based Reasoning of Clinical Diagnosis in Integrative Medicine: Real-World Methodological Study of Electronic Medical Records and Natural Language Processing Methods %A Geng,Wenye %A Qin,Xuanfeng %A Yang,Tao %A Cong,Zhilei %A Wang,Zhuo %A Kong,Qing %A Tang,Zihui %A Jiang,Lin %+ Department of Integrative Medicine, Fudan University Huashan Hospital, No 12 Urumuqi Mid Road, Shanghai, China, 86 021 5288 8236, dr_zhtang@yeah.net %K model-based reasoning %K integrative medicine %K electronic medical records %K natural language processing %D 2020 %7 21.12.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Integrative medicine is a form of medicine that combines practices and treatments from alternative medicine with conventional medicine. The diagnosis in integrative medicine involves the clinical diagnosis based on modern medicine and syndrome pattern diagnosis. Electronic medical records (EMRs) are the systematized collection of patients’ health information stored in a digital format that can be shared across different health care settings. Although syndrome and sign information or relative information can be extracted from the EMR and content texts can be mapped to computability vectors using natural language processing techniques, application of artificial intelligence techniques to support physicians in medical practices remains a major challenge. Objective: The purpose of this study was to investigate model-based reasoning (MBR) algorithms for the clinical diagnosis in integrative medicine based on EMRs and natural language processing. We also estimated the associations among the factors of sample size, number of syndrome pattern type, and diagnosis in modern medicine using the MBR algorithms. 
Methods: A total of 14,075 medical records of clinical cases were extracted from the EMRs as the development data set, and an external test data set consisting of 1000 medical records of clinical cases was extracted from independent EMRs. MBR methods based on word embedding, machine learning, and deep learning algorithms were developed for the automatic diagnosis of syndrome pattern in integrative medicine. MBR algorithms combining rule-based reasoning (RBR) were also developed. Standard evaluation metrics consisting of accuracy, precision, recall, and F1 score were used for the performance estimation of the methods. The association analyses were conducted on the sample size, number of syndrome pattern types, and diagnosis of lung diseases with the best algorithms. Results: The Word2Vec convolutional neural network (CNN) MBR algorithms showed high performance (accuracy of 0.9586 in the test data set) in the syndrome pattern diagnosis of lung diseases. The Word2Vec CNN MBR combined with RBR also showed high performance (accuracy of 0.9229 in the test data set). The diagnosis of lung diseases could enhance the performance of the Word2Vec CNN MBR algorithms. Each group's sample size and syndrome pattern type affected the performance of these algorithms. Conclusions: The MBR methods based on Word2Vec and CNN showed high performance in the syndrome pattern diagnosis of lung diseases in integrative medicine. The parameters of each group's sample size, syndrome pattern type, and diagnosis of lung diseases were associated with the performance of the methods. 
Trial Registration: ClinicalTrials.gov NCT03274908; https://clinicaltrials.gov/ct2/show/NCT03274908 %M 33346740 %R 10.2196/23082 %U http://medinform.jmir.org/2020/12/e23082/ %U https://doi.org/10.2196/23082 %U http://www.ncbi.nlm.nih.gov/pubmed/33346740 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 12 %P e19127 %T Technical Aspects of Developing Chatbots for Medical Applications: Scoping Review %A Safi,Zeineb %A Abd-Alrazaq,Alaa %A Khalifa,Mohamed %A Househ,Mowafa %+ Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, P.O. Box 34110, Doha Al Luqta St, Ar-Rayyan, Doha, Qatar, 974 55708549, mhouseh@hbku.edu.qa %K chatbots %K conversational agents %K medical applications %K scoping review %K technical aspects %D 2020 %7 18.12.2020 %9 Review %J J Med Internet Res %G English %X Background: Chatbots are applications that can conduct natural language conversations with users. In the medical field, chatbots have been developed and used to serve different purposes. They provide patients with timely information that can be critical in some scenarios, such as access to mental health resources. Since the development of the first chatbot, ELIZA, in the late 1960s, much effort has followed to produce chatbots for various health purposes developed in different ways. Objective: This study aimed to explore the technical aspects and development methodologies associated with chatbots used in the medical field to explain the best methods of development and support chatbot development researchers on their future work. Methods: We searched for relevant articles in 8 literature databases (IEEE, ACM, Springer, ScienceDirect, Embase, MEDLINE, PsycINFO, and Google Scholar). We also performed forward and backward reference checking of the selected articles. Study selection was performed by one reviewer, and 50% of the selected studies were randomly checked by a second reviewer. 
A narrative approach was used for result synthesis. Chatbots were classified based on the different technical aspects of their development. The main chatbot components were identified in addition to the different techniques for implementing each module. Results: The original search returned 2481 publications, of which we identified 45 studies that matched our inclusion and exclusion criteria. The most common language of communication between users and chatbots was English (n=23). We identified 4 main modules: text understanding module, dialog management module, database layer, and text generation module. The most common technique for developing both text understanding and dialog management was pattern matching (n=18 and n=25, respectively). The most common text generation technique was fixed output (n=36). Very few studies relied on generating original output. Most studies kept a medical knowledge base to be used by the chatbot for different purposes throughout the conversations. A few studies kept conversation scripts and collected user data and previous conversations. Conclusions: Many chatbots have been developed for medical use, and at an increasing rate. There is a recent, apparent shift toward adopting machine learning–based approaches for developing chatbot systems. Further research can be conducted to link clinical outcomes to different chatbot development techniques and technical characteristics. 
%M 33337337 %R 10.2196/19127 %U http://www.jmir.org/2020/12/e19127/ %U https://doi.org/10.2196/19127 %U http://www.ncbi.nlm.nih.gov/pubmed/33337337 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 12 %P e22649 %T Detecting Miscoded Diabetes Diagnosis Codes in Electronic Health Records for Quality Improvement: Temporal Deep Learning Approach %A Rashidian,Sina %A Abell-Hart,Kayley %A Hajagos,Janos %A Moffitt,Richard %A Lingam,Veena %A Garcia,Victor %A Tsai,Chao-Wei %A Wang,Fusheng %A Dong,Xinyu %A Sun,Siao %A Deng,Jianyuan %A Gupta,Rajarsi %A Miller,Joshua %A Saltz,Joel %A Saltz,Mary %+ Department of Computer Science, Stony Brook University, 2212 Computer Science, Stony Brook, NY, 11794, United States, 1 631 632 8470, srashidian@cs.stonybrook.edu %K electronic health records %K diabetes %K deep learning %D 2020 %7 17.12.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Diabetes affects more than 30 million patients across the United States. With such a large disease burden, even a small error in classification can be significant. Currently billing codes, assigned at the time of a medical encounter, are the “gold standard” reflecting the actual diseases present in an individual, and thus in aggregate reflect disease prevalence in the population. These codes are generated by highly trained coders and by health care providers but are not always accurate. Objective: This work provides a scalable deep learning methodology to more accurately classify individuals with diabetes across multiple health care systems. Methods: We leveraged a long short-term memory-dense neural network (LSTM-DNN) model to identify patients with or without diabetes using data from 5 acute care facilities with 187,187 patients and 275,407 encounters, incorporating data elements including laboratory test results, diagnostic/procedure codes, medications, demographic data, and admission information. 
Furthermore, a blinded physician panel reviewed discordant cases, providing an estimate of the total impact on the population. Results: When predicting the documented diagnosis of diabetes, our model achieved an 84% F1 score, 96% area under the receiver operating characteristic curve, and 91% average precision on a heterogeneous data set from 5 distinct health facilities. However, in 81% of cases where the model disagreed with the documented phenotype, a blinded physician panel agreed with the model. Taken together, this suggests that 4.3% of our studied population has either a missing or an improper diabetes diagnosis. Conclusions: This study demonstrates that deep learning methods can improve clinical phenotyping even when patient data are noisy, sparse, and heterogeneous. %M 33331828 %R 10.2196/22649 %U http://medinform.jmir.org/2020/12/e22649/ %U https://doi.org/10.2196/22649 %U http://www.ncbi.nlm.nih.gov/pubmed/33331828 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 3 %N 1 %P e23939 %T Predicted Influences of Artificial Intelligence on the Domains of Nursing: Scoping Review %A Buchanan,Christine %A Howitt,M Lyndsay %A Wilson,Rita %A Booth,Richard G %A Risling,Tracie %A Bamford,Megan %+ Registered Nurses' Association of Ontario, 500-4211 Yonge Street, Toronto, ON, M2P 2A9, Canada, 1 800 268 7199 ext 281, cbuchanan@rnao.ca %K nursing %K artificial intelligence %K machine learning %K robotics %K patient-centered care %K review %D 2020 %7 17.12.2020 %9 Review %J JMIR Nursing %G English %X Background: Artificial intelligence (AI) is set to transform the health system, yet little research to date has explored its influence on nurses—the largest group of health professionals. Furthermore, there has been little discussion on how AI will influence the experience of person-centered compassionate care for patients, families, and caregivers. 
Objective: This review aims to summarize the extant literature on the emerging trends in health technologies powered by AI and their implications on the following domains of nursing: administration, clinical practice, policy, and research. This review summarizes the findings from 3 research questions, examining how these emerging trends might influence the roles and functions of nurses and compassionate nursing care over the next 10 years and beyond. Methods: Using an established scoping review methodology, MEDLINE, CINAHL, EMBASE, PsycINFO, Cochrane Database of Systematic Reviews, Cochrane Central, Education Resources Information Center, Scopus, Web of Science, and ProQuest databases were searched. In addition to the electronic database searches, a targeted website search was performed to access relevant gray literature. Abstracts and full-text studies were independently screened by 2 reviewers using prespecified inclusion and exclusion criteria. Included articles focused on nursing and digital health technologies that incorporate AI. Data were charted using structured forms and narratively summarized. Results: A total of 131 articles were retrieved from the scoping review for the 3 research questions that were the focus of this manuscript (118 from database sources and 13 from targeted websites). Emerging AI technologies discussed in the review included predictive analytics, smart homes, virtual health care assistants, and robots. The results indicated that AI has already begun to influence nursing roles, workflows, and the nurse-patient relationship. In general, robots are not viewed as replacements for nurses. There is a consensus that health technologies powered by AI may have the potential to enhance nursing practice. Consequently, nurses must proactively define how person-centered compassionate care will be preserved in the age of AI. 
Conclusions: Nurses have a shared responsibility to influence decisions related to the integration of AI into the health system and to ensure that this change is introduced in a way that is ethical and aligns with core nursing values such as compassionate care. Furthermore, nurses must advocate for patient and nursing involvement in all aspects of the design, implementation, and evaluation of these technologies. International Registered Report Identifier (IRRID): RR2-10.2196/17490 %M 34406963 %R 10.2196/23939 %U https://nursing.jmir.org/2020/1/e23939/ %U https://doi.org/10.2196/23939 %U http://www.ncbi.nlm.nih.gov/pubmed/34406963 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 12 %P e24478 %T Computing SARS-CoV-2 Infection Risk From Symptoms, Imaging, and Test Data: Diagnostic Model Development %A D'Ambrosia,Christopher %A Christensen,Henrik %A Aronoff-Spencer,Eliah %+ Division of Infectious Diseases and Global Public Health, School of Medicine, University of California San Diego, 9500 Gilman Drive 0711, San Diego, CA, 92101, United States, 1 6462348153, earonoffspencer@health.ucsd.edu %K health %K informatics %K computation %K COVID-19 %K infection %K risk %K symptom %K imaging %K diagnostic %K probability %K machine learning %K Bayesian %K model %D 2020 %7 16.12.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Assigning meaningful probabilities of SARS-CoV-2 infection risk presents a diagnostic challenge across the continuum of care. Objective: The aim of this study was to develop and clinically validate an adaptable, personalized diagnostic model to assist clinicians in ruling in and ruling out COVID-19 in potential patients. We compared the diagnostic performance of probabilistic, graphical, and machine learning models against a previously published benchmark model. Methods: We integrated patient symptoms and test data using machine learning and Bayesian inference to quantify individual patient risk of SARS-CoV-2 infection. 
We trained models with 100,000 simulated patient profiles based on 13 symptoms and estimated local prevalence, imaging, and molecular diagnostic performance from published reports. We tested these models with consecutive patients who presented with a COVID-19–compatible illness at the University of California San Diego Medical Center over the course of 14 days starting in March 2020. Results: We included 55 consecutive patients with fever (n=43, 78%) or cough (n=42, 77%) presenting for ambulatory (n=11, 20%) or hospital care (n=44, 80%). In total, 51% (n=28) were female and 49% (n=27) were aged <60 years. Common comorbidities included diabetes (n=12, 22%), hypertension (n=15, 27%), cancer (n=9, 16%), and cardiovascular disease (n=7, 13%). Of these, 69% (n=38) were confirmed via reverse transcription-polymerase chain reaction (RT-PCR) to be positive for SARS-CoV-2 infection, and 20% (n=11) had repeated negative nucleic acid testing and an alternate diagnosis. Bayesian inference network, distance metric learning, and ensemble models discriminated between patients with SARS-CoV-2 infection and alternate diagnoses with sensitivities of 81.6%-84.2%, specificities of 58.8%-70.6%, and accuracies of 61.4%-71.8%. After integrating imaging and laboratory test statistics with the predictions of the Bayesian inference network, changes in diagnostic uncertainty at each step in the simulated clinical evaluation process were highly sensitive to location, symptom, and diagnostic test choices. Conclusions: Decision support models that incorporate symptoms and available test results can help providers diagnose SARS-CoV-2 infection in real-world settings. 
%M 33301417 %R 10.2196/24478 %U http://www.jmir.org/2020/12/e24478/ %U https://doi.org/10.2196/24478 %U http://www.ncbi.nlm.nih.gov/pubmed/33301417 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 12 %P e18418 %T Limitations of Deep Learning Attention Mechanisms in Clinical Research: Empirical Case Study Based on the Korean Diabetic Disease Setting %A Kim,Junetae %A Lee,Sangwon %A Hwang,Eugene %A Ryu,Kwang Sun %A Jeong,Hanseok %A Lee,Jae Wook %A Hwangbo,Yul %A Choi,Kui Son %A Cha,Hyo Soung %+ Cancer Data Center, National Cancer Control Institute, National Cancer Center, 809 Madu 1(il)-dong, Ilsandong-gu, Goyang-si, Gyeonggi-do, 10408, Republic of Korea, 82 31 920 1892, kkido@ncc.re.kr %K attention %K deep learning %K explainable artificial intelligence %K uncertainty awareness %K Bayesian deep learning %K artificial intelligence %K health data %D 2020 %7 16.12.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Despite excellent prediction performance, noninterpretability has undermined the value of applying deep-learning algorithms in clinical practice. To overcome this limitation, attention mechanism has been introduced to clinical research as an explanatory modeling method. However, potential limitations of using this attractive method have not been clarified to clinical researchers. Furthermore, there has been a lack of introductory information explaining attention mechanisms to clinical researchers. Objective: The aim of this study was to introduce the basic concepts and design approaches of attention mechanisms. In addition, we aimed to empirically assess the potential limitations of current attention mechanisms in terms of prediction and interpretability performance. Methods: First, the basic concepts and several key considerations regarding attention mechanisms were identified. 
Second, four approaches to attention mechanisms were suggested according to a two-dimensional framework based on the degrees of freedom and uncertainty awareness. Third, the prediction performance, probability reliability, concentration of variable importance, consistency of attention results, and generalizability of attention results to conventional statistics were assessed in the diabetic classification modeling setting. Fourth, the potential limitations of attention mechanisms were considered. Results: Prediction performance was very high for all models. Probability reliability was high in models with uncertainty awareness. Variable importance was concentrated in several variables when uncertainty awareness was not considered. The consistency of attention results was high when uncertainty awareness was considered. The generalizability of attention results to conventional statistics was poor regardless of the modeling approach. Conclusions: The attention mechanism is an attractive technique with potential to be very promising in the future. However, it may not yet be desirable to rely on this method to assess variable importance in clinical settings. Therefore, along with theoretical studies enhancing attention mechanisms, more empirical studies investigating potential limitations should be encouraged. 
%M 33325832 %R 10.2196/18418 %U http://www.jmir.org/2020/12/e18418/ %U https://doi.org/10.2196/18418 %U http://www.ncbi.nlm.nih.gov/pubmed/33325832 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 12 %P e20756 %T Artificial Intelligence in the Fight Against COVID-19: Scoping Review %A Abd-Alrazaq,Alaa %A Alajlani,Mohannad %A Alhuwail,Dari %A Schneider,Jens %A Al-Kuwari,Saif %A Shah,Zubair %A Hamdi,Mounir %A Househ,Mowafa %+ Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, PO Box 5825, Doha Al Luqta St, Ar-Rayyan, Doha, , Qatar, 974 55708549, mhouseh@hbku.edu.qa %K artificial intelligence %K machine learning %K deep learning %K natural language processing %K coronavirus %K COVID-19 %K 2019-nCoV %K SARS-CoV-2 %D 2020 %7 15.12.2020 %9 Review %J J Med Internet Res %G English %X Background: In December 2019, COVID-19 broke out in Wuhan, China, leading to national and international disruptions in health care, business, education, transportation, and nearly every aspect of our daily lives. Artificial intelligence (AI) has been leveraged amid the COVID-19 pandemic; however, little is known about its use for supporting public health efforts. Objective: This scoping review aims to explore how AI technology is being used during the COVID-19 pandemic, as reported in the literature. Thus, it is the first review that describes and summarizes features of the identified AI techniques and data sets used for their development and validation. Methods: A scoping review was conducted following the guidelines of PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews). We searched the most commonly used electronic databases (eg, MEDLINE, EMBASE, and PsycInfo) between April 10 and 12, 2020. These terms were selected based on the target intervention (ie, AI) and the target disease (ie, COVID-19). 
Two reviewers independently conducted study selection and data extraction. A narrative approach was used to synthesize the extracted data. Results: We considered 82 studies out of the 435 retrieved studies. The most common use of AI was diagnosing COVID-19 cases based on various indicators. AI was also employed in drug and vaccine discovery or repurposing and for assessing their safety. Further, the included studies used AI for forecasting the epidemic development of COVID-19 and predicting its potential hosts and reservoirs. Researchers used AI for patient outcome–related tasks such as assessing the severity of COVID-19, predicting mortality risk and its associated factors, and predicting the length of hospital stay. AI was used for infodemiology to raise awareness about water, sanitation, and hygiene. The most prominent AI technique used was the convolutional neural network, followed by the support vector machine. Conclusions: The included studies showed that AI has the potential to fight against COVID-19. However, many of the proposed methods are not yet clinically accepted. Thus, the most rewarding research will be on methods promising value beyond COVID-19. More efforts are needed for developing standardized reporting protocols or guidelines for studies on AI. 
%M 33284779 %R 10.2196/20756 %U http://www.jmir.org/2020/12/e20756/ %U https://doi.org/10.2196/20756 %U http://www.ncbi.nlm.nih.gov/pubmed/33284779 %0 Journal Article %@ 2291-9279 %I JMIR Publications %V 8 %N 4 %P e24049 %T The Impact of Artificial Intelligence on the Chess World %A Duca Iliescu,Delia Monica %+ Transilvania University of Brasov, Bdul Eroilor 29, Brasov, Romania, 40 268413000, delia.duca@unitbv.ro %K artificial intelligence %K games %K chess %K AlphaZero %K MuZero %K cheat detection %K coronavirus %D 2020 %7 10.12.2020 %9 Viewpoint %J JMIR Serious Games %G English %X This paper focuses on key areas in which artificial intelligence has affected the chess world, including cheat detection methods, which are especially necessary recently, as there has been an unexpected rise in the popularity of online chess. Many major chess events that were to take place in 2020 have been canceled, but the global popularity of chess has in fact grown in recent months due to easier conversion of the game from offline to online formats compared with other games. Still, though a game of chess can be easily played online, there are some concerns about the increased chances of cheating. Artificial intelligence can address these concerns. 
%M 33300493 %R 10.2196/24049 %U http://games.jmir.org/2020/4/e24049/ %U https://doi.org/10.2196/24049 %U http://www.ncbi.nlm.nih.gov/pubmed/33300493 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 12 %P e18097 %T Evaluation of Four Artificial Intelligence–Assisted Self-Diagnosis Apps on Three Diagnoses: Two-Year Follow-Up Study %A Ćirković,Aleksandar %+ Schulgasse 21, Weiden, 92637, Germany, 49 1788603753, aleksandar.cirkovic@mailbox.org %K artificial intelligence %K machine learning %K mobile apps %K medical diagnosis %K mHealth %D 2020 %7 4.12.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Consumer-oriented mobile self-diagnosis apps have been developed using undisclosed algorithms, presumably based on machine learning and other artificial intelligence (AI) technologies. The US Food and Drug Administration now discerns apps with learning AI algorithms from those with stable ones and treats the former as medical devices. To the author’s knowledge, no self-diagnosis app testing has been performed in the field of ophthalmology so far. Objective: The objective of this study was to test apps that were previously mentioned in the scientific literature on a set of diagnoses in a deliberate time interval, comparing the results and looking for differences that hint at “nonlocked” learning algorithms. Methods: Four apps from the literature were chosen (Ada, Babylon, Buoy, and Your.MD). A set of three ophthalmology diagnoses (glaucoma, retinal tear, dry eye syndrome) representing three levels of urgency was used to simultaneously test the apps’ diagnostic efficiency and treatment recommendations in this specialty. Two years was the chosen time interval between the tests (2018 and 2020). Scores were awarded by one evaluating physician using a defined scheme. Results: Two apps (Ada and Your.MD) received significantly higher scores than the other two. 
All apps either worsened in their results between 2018 and 2020 or remained unchanged at a low level. The variation in the results over time indicates “nonlocked” learning algorithms using AI technologies. None of the apps provided correct diagnoses and treatment recommendations for all three diagnoses in 2020. Two apps (Babylon and Your.MD) asked significantly fewer questions than the other two (P<.001). Conclusions: “Nonlocked” algorithms are used by self-diagnosis apps. The diagnostic efficiency of the tested apps seems to worsen over time, with some apps being more capable than others. Systematic studies on a wider scale are necessary for health care providers and patients to correctly assess the safety and efficacy of such apps and for correct classification by health care regulating authorities. %M 33275113 %R 10.2196/18097 %U https://www.jmir.org/2020/12/e18097 %U https://doi.org/10.2196/18097 %U http://www.ncbi.nlm.nih.gov/pubmed/33275113 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 9 %N 12 %P e22996 %T An Artificial Intelligence–Based, Personalized Smartphone App to Improve Childhood Immunization Coverage and Timelines Among Children in Pakistan: Protocol for a Randomized Controlled Trial %A Kazi,Abdul Momin %A Qazi,Saad Ahmed %A Khawaja,Sadori %A Ahsan,Nazia %A Ahmed,Rao Moueed %A Sameen,Fareeha %A Khan Mughal,Muhammad Ayub %A Saqib,Muhammad %A Ali,Sikander %A Kaleemuddin,Hussain %A Rauf,Yasir %A Raza,Mehreen %A Jamal,Saima %A Abbasi,Munir %A Stergioulas,Lampros K %+ Department of Pediatrics and Child Health, Aga Khan University, Stadium Road, PO Box 3500, Karachi, 74800, Pakistan, 92 2134864232, momin.kazi@aku.edu %K artificial intelligence %K AI %K routine childhood immunization %K EPI %K LMICs %K mHealth %K Pakistan %K personalized messages %K routine immunization %K smartphone apps %K vaccine-preventable illnesses %D 2020 %7 4.12.2020 %9 Protocol %J JMIR Res Protoc %G English %X Background: The immunization uptake rates in Pakistan are much 
lower than desired. Major reasons include lack of awareness, parental forgetfulness regarding schedules, and misinformation regarding vaccines. In light of the COVID-19 pandemic and distancing measures, routine childhood immunization (RCI) coverage has been adversely affected, as caregivers avoid tertiary care hospitals or primary health centers. Innovative and cost-effective measures must be taken to understand and deal with the issue of low immunization rates. However, only a few smartphone-based interventions have been carried out in low- and middle-income countries (LMICs) to improve RCI. Objective: The primary objectives of this study are to evaluate whether a personalized mobile app can improve children’s on-time visits at 10 and 14 weeks of age for RCI as compared with standard care and to determine whether an artificial intelligence model can be incorporated into the app. Secondary objectives are to determine the perceptions and attitudes of caregivers regarding childhood vaccinations and to understand the factors that might influence the effect of a mobile phone–based app on vaccination improvement. Methods: A mixed methods randomized controlled trial was designed with intervention and control arms. The study will be conducted at the Aga Khan University Hospital vaccination center. Caregivers of newborns or infants visiting the center for their children’s 6-week vaccination will be recruited. The intervention arm will have access to a smartphone app with text, voice, video, and pictorial messages regarding RCI. This app will be developed based on the findings of the pretrial qualitative component of the study, in addition to no-show study findings, which will explore caregivers’ perceptions about RCI and a mobile phone–based app in improving RCI coverage. Results: Pretrial qualitative in-depth interviews were conducted in February 2020. Enrollment of study participants for the randomized controlled trial is in process. 
Study exit interviews will be conducted at the 14-week immunization visits, provided the caregivers visit the immunization facility at that time, or over the phone when the children are 18 weeks of age. Conclusions: This study will generate useful insights into the feasibility, acceptability, and usability of an Android-based smartphone app for improving RCI in Pakistan and in LMICs. Trial Registration: ClinicalTrials.gov NCT04449107; https://clinicaltrials.gov/ct2/show/NCT04449107 International Registered Report Identifier (IRRID): DERR1-10.2196/22996 %M 33274726 %R 10.2196/22996 %U https://www.researchprotocols.org/2020/12/e22996 %U https://doi.org/10.2196/22996 %U http://www.ncbi.nlm.nih.gov/pubmed/33274726 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 12 %P e24048 %T Development and External Validation of a Machine Learning Tool to Rule Out COVID-19 Among Adults in the Emergency Department Using Routine Blood Tests: A Large, Multicenter, Real-World Study %A Plante,Timothy B %A Blau,Aaron M %A Berg,Adrian N %A Weinberg,Aaron S %A Jun,Ik C %A Tapson,Victor F %A Kanigan,Tanya S %A Adib,Artur B %+ Larner College of Medicine at the University of Vermont, 360 S Park Drive, Suite 206B, Colchester, VT, 05446, United States, 1 802 656 3688, timothy.plante@uvm.edu %K COVID-19 %K SARS-CoV-2 %K machine learning %K artificial intelligence %K electronic medical records %K laboratory results %K development %K validation %K testing %K model %K emergency department %D 2020 %7 2.12.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Conventional diagnosis of COVID-19 with reverse transcription polymerase chain reaction (RT-PCR) testing (hereafter, PCR) is associated with prolonged time to diagnosis and significant costs to run the test. The SARS-CoV-2 virus might lead to characteristic patterns in the results of widely available, routine blood tests that could be identified with machine learning methodologies. 
Machine learning modalities integrating findings from these common laboratory test results might accelerate ruling out COVID-19 in emergency department patients. Objective: We sought to develop (ie, train and internally validate with cross-validation techniques) and externally validate a machine learning model to rule out COVID-19 using only routine blood tests among adults in emergency departments. Methods: Using clinical data from emergency departments (EDs) from 66 US hospitals before the pandemic (before the end of December 2019) or during the pandemic (March-July 2020), we included patients aged ≥20 years in the study time frame. We excluded those with missing laboratory results. Model training used 2183 PCR-confirmed cases from 43 hospitals during the pandemic; negative controls were 10,000 prepandemic patients from the same hospitals. External validation used 23 hospitals with 1020 PCR-confirmed cases and 171,734 prepandemic negative controls. The main outcome was COVID-19 status predicted using same-day routine laboratory results. Model performance was assessed with area under the receiver operating characteristic (AUROC) curve as well as sensitivity, specificity, and negative predictive value (NPV). Results: Of 192,779 patients included in the training, external validation, and sensitivity data sets (median age decile 50 [IQR 30-60] years, 40.5% male [78,249/192,779]), AUROC for training and external validation was 0.91 (95% CI 0.90-0.92). Using a risk score cutoff of 1.0 (out of 100) in the external validation data set, the model achieved sensitivity of 95.9% and specificity of 41.7%; with a cutoff of 2.0, sensitivity was 92.6% and specificity was 59.9%. At the cutoff of 2.0, the NPVs at a prevalence of 1%, 10%, and 20% were 99.9%, 98.6%, and 97%, respectively. 
Conclusions: A machine learning model developed with multicenter clinical data integrating commonly collected ED laboratory data demonstrated high rule-out accuracy for COVID-19 status, and might inform selective use of PCR-based testing. %M 33226957 %R 10.2196/24048 %U https://www.jmir.org/2020/12/e24048 %U https://doi.org/10.2196/24048 %U http://www.ncbi.nlm.nih.gov/pubmed/33226957 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 11 %P e23930 %T Machine Learning Electronic Health Record Identification of Patients with Rheumatoid Arthritis: Algorithm Pipeline Development and Validation Study %A Maarseveen,Tjardo D %A Meinderink,Timo %A Reinders,Marcel J T %A Knitza,Johannes %A Huizinga,Tom W J %A Kleyer,Arnd %A Simon,David %A van den Akker,Erik B %A Knevel,Rachel %+ Department of Rheumatology, Leiden University Medical Center, C1-R k. 41, Albinusdreef 2, Leiden, 2333 ZA, Netherlands, 31 611307780, R.Knevel@lumc.nl %K Supervised machine learning %K Electronic Health Records %K Natural Language Processing %K Support Vector Machine %K Gradient Boosting %K Rheumatoid Arthritis %D 2020 %7 30.11.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Financial codes are often used to extract diagnoses from electronic health records. This approach is prone to false positives. Alternatively, queries are constructed, but these are highly center and language specific. A tantalizing alternative is the automatic identification of patients by employing machine learning on format-free text entries. Objective: The aim of this study was to develop an easily implementable workflow that builds a machine learning algorithm capable of accurately identifying patients with rheumatoid arthritis from format-free text fields in electronic health records. Methods: Two electronic health record data sets were employed: Leiden (n=3000) and Erlangen (n=4771). 
Using a portion of the Leiden data (n=2000), we compared 6 different machine learning methods and a naïve word-matching algorithm using 10-fold cross-validation. Performances were compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision recall curve (AUPRC), and F1 score was used as the primary criterion for selecting the best method to build a classifying algorithm. We selected the optimal threshold of positive predictive value for case identification based on the output of the best method in the training data. This validation workflow was subsequently applied to a portion of the Erlangen data (n=4293). For testing, the best performing methods were applied to remaining data (Leiden n=1000; Erlangen n=478) for an unbiased evaluation. Results: For the Leiden data set, the word-matching algorithm demonstrated mixed performance (AUROC 0.90; AUPRC 0.33; F1 score 0.55), and 4 methods significantly outperformed word-matching, with support vector machines performing best (AUROC 0.98; AUPRC 0.88; F1 score 0.83). Applying this support vector machine classifier to the test data resulted in a similarly high performance (F1 score 0.81; positive predictive value [PPV] 0.94), and with this method, we could identify 2873 patients with rheumatoid arthritis in less than 7 seconds out of the complete collection of 23,300 patients in the Leiden electronic health record system. For the Erlangen data set, gradient boosting performed best (AUROC 0.94; AUPRC 0.85; F1 score 0.82) in the training set, and applied to the test data, resulted once again in good results (F1 score 0.67; PPV 0.97). Conclusions: We demonstrate that machine learning methods can extract the records of patients with rheumatoid arthritis from electronic health record data with high precision, allowing research on very large populations for limited costs. Our approach is language and center independent and could be applied to any type of diagnosis. 
We have developed our pipeline into a universally applicable and easy-to-implement workflow to equip centers with their own high-performing algorithm. This allows the creation of observational studies of unprecedented size covering different countries for low cost from already available data in electronic health record systems. %M 33252349 %R 10.2196/23930 %U http://medinform.jmir.org/2020/11/e23930/ %U https://doi.org/10.2196/23930 %U http://www.ncbi.nlm.nih.gov/pubmed/33252349 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 11 %P e20549 %T Use Characteristics and Triage Acuity of a Digital Symptom Checker in a Large Integrated Health System: Population-Based Descriptive Study %A Morse,Keith E %A Ostberg,Nicolai P %A Jones,Veena G %A Chan,Albert S %+ Department of Pediatrics, Stanford University School of Medicine, 750 Welch Road, Suite 315, Palo Alto, CA, 94304, United States, 1 650 723 5711, kmorse@stanfordchildrens.org %K symptom checker %K chatbot %K computer-assisted diagnosis %K diagnostic self-evaluation %K artificial intelligence %K self-care %K COVID-19 %D 2020 %7 30.11.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Pressure on the US health care system has been increasing due to a combination of aging populations, rising health care expenditures, and most recently, the COVID-19 pandemic. Responses to this pressure are hindered in part by reliance on a limited supply of highly trained health care professionals, creating a need for scalable technological solutions. Digital symptom checkers are artificial intelligence–supported software tools that use a conversational “chatbot” format to support rapid diagnosis and consistent triage. The COVID-19 pandemic has brought new attention to these tools due to the need to avoid face-to-face contact and preserve urgent care capacity. 
However, evidence-based deployment of these chatbots requires an understanding of user demographics and associated triage recommendations generated by a large general population. Objective: In this study, we evaluate the user demographics and levels of triage acuity provided by a symptom checker chatbot deployed in partnership with a large integrated health system in the United States. Methods: This population-based descriptive study included all web-based symptom assessments completed on the website and patient portal of the Sutter Health system (24 hospitals in Northern California) from April 24, 2019, to February 1, 2020. User demographics were compared to relevant US Census population data. Results: A total of 26,646 symptom assessments were completed during the study period. Most assessments (17,816/26,646, 66.9%) were completed by female users. The mean user age was 34.3 years (SD 14.4 years), compared to a median age of 37.3 years of the general population. The most common initial symptom was abdominal pain (2060/26,646, 7.7%). A substantial number of assessments (12,357/26,646, 46.4%) were completed outside of typical physician office hours. Most users were advised to seek medical care on the same day (7299/26,646, 27.4%) or within 2-3 days (6301/26,646, 23.6%). Over a quarter of the assessments indicated a high degree of urgency (7723/26,646, 29.0%). Conclusions: Users of the symptom checker chatbot were broadly representative of our patient population, although they skewed toward younger and female users. The triage recommendations were comparable to those of nurse-staffed telephone triage lines. Although the emergence of COVID-19 has increased the interest in remote medical assessment tools, it is important to take an evidence-based approach to their deployment. 
%M 33170799 %R 10.2196/20549 %U https://www.jmir.org/2020/11/e20549 %U https://doi.org/10.2196/20549 %U http://www.ncbi.nlm.nih.gov/pubmed/33170799 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 11 %P e19416 %T A Human-Algorithm Integration System for Hip Fracture Detection on Plain Radiography: System Development and Validation Study %A Cheng,Chi-Tung %A Chen,Chih-Chi %A Cheng,Fu-Jen %A Chen,Huan-Wu %A Su,Yi-Siang %A Yeh,Chun-Nan %A Chung,I-Fang %A Liao,Chien-Hung %+ Department of Trauma and Emergency Surgery, Linkou Chang Gung Memorial Hospital, Chang Gung University, Trauma Center, 5 Fuxin Street, Kweishan District, Taoyuan, 33328, Taiwan, 886 975365628, surgymet@gmail.com %K hip fracture %K neural network %K computer %K artificial intelligence %K algorithms %K human augmentation %K deep learning %K diagnosis %D 2020 %7 27.11.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Hip fracture is the most common type of fracture in elderly individuals. Numerous deep learning (DL) algorithms for plain pelvic radiographs (PXRs) have been applied to improve the accuracy of hip fracture diagnosis. However, their efficacy is still undetermined. Objective: The objective of this study is to develop and validate a human-algorithm integration (HAI) system to improve the accuracy of hip fracture diagnosis in a real clinical environment. Methods: The HAI system with hip fracture detection ability was developed using a deep learning algorithm trained on trauma registry data and 3605 PXRs from August 2008 to December 2016. To compare their diagnostic performance before and after HAI system assistance using an independent testing dataset, 34 physicians were recruited. We analyzed the physicians’ accuracy, sensitivity, specificity, and agreement with the algorithm; we also performed subgroup analyses according to physician specialty and experience. 
Furthermore, we applied the HAI system in the emergency departments of different hospitals to validate its value in the real world. Results: With the support of the algorithm, which achieved 91% accuracy, the diagnostic performance of physicians was significantly improved in the independent testing dataset, as was revealed by the sensitivity (physician alone, median 95%; HAI, median 99%; P<.001), specificity (physician alone, median 90%; HAI, median 95%; P<.001), accuracy (physician alone, median 90%; HAI, median 96%; P<.001), and human-algorithm agreement [physician alone κ, median 0.69 (IQR 0.63-0.74); HAI κ, median 0.80 (IQR 0.76-0.82); P<.001]. With the help of the HAI system, the primary physicians showed significant improvement in their diagnostic performance to levels comparable to those of consulting physicians, and both the experienced and less-experienced physicians benefited from the HAI system. After the HAI system had been applied in 3 departments for 5 months, 587 images were examined. The sensitivity, specificity, and accuracy of the HAI system for detecting hip fractures were 97%, 95.7%, and 96.08%, respectively. Conclusions: HAI currently impacts health care, and integrating this technology into emergency departments is feasible. The developed HAI system can enhance physicians’ hip fracture diagnostic performance. %M 33245279 %R 10.2196/19416 %U http://medinform.jmir.org/2020/11/e19416/ %U https://doi.org/10.2196/19416 %U http://www.ncbi.nlm.nih.gov/pubmed/33245279 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 11 %P e23472 %T Deep Learning–Based Detection of Early Renal Function Impairment Using Retinal Fundus Images: Model Development and Validation %A Kang,Eugene Yu-Chuan %A Hsieh,Yi-Ting %A Li,Chien-Hung %A Huang,Yi-Jin %A Kuo,Chang-Fu %A Kang,Je-Ho %A Chen,Kuan-Jen %A Lai,Chi-Chun %A Wu,Wei-Chi %A Hwang,Yih-Shiou %+ Department of Ophthalmology, Chang Gung Memorial Hospital, Linkou Medical Center, No. 
5, Fu-Hsin Rd., Taoyuan, 333, Taiwan, 886 3 3281200 ext 8666, yihshiou.hwang@gmail.com %K deep learning %K renal function %K retinal fundus image %K diabetes %K renal %K kidney %K retinal %K eye %K imaging %K impairment %K detection %K development %K validation %K model %D 2020 %7 26.11.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Retinal imaging has been applied for detecting eye diseases and cardiovascular risks using deep learning–based methods. Furthermore, retinal microvascular and structural changes were found in renal function impairments. However, a deep learning–based method using retinal images for detecting early renal function impairment has not yet been well studied. Objective: This study aimed to develop and evaluate a deep learning model for detecting early renal function impairment using retinal fundus images. Methods: This retrospective study enrolled patients who underwent renal function tests with color fundus images captured at any time between January 1, 2001, and August 31, 2019. A deep learning model was constructed to detect impaired renal function from the images. Early renal function impairment was defined as estimated glomerular filtration rate <90 mL/min/1.73 m2. Model performance was evaluated with respect to the receiver operating characteristic curve and area under the curve (AUC). Results: In total, 25,706 retinal fundus images were obtained from 6212 patients for the study period. The images were divided at an 8:1:1 ratio. The training, validation, and testing data sets respectively contained 20,787, 2189, and 2730 images from 4970, 621, and 621 patients. There were 10,686 and 15,020 images determined to indicate normal and impaired renal function, respectively. The AUC of the model was 0.81 in the overall population. In subgroups stratified by serum hemoglobin A1c (HbA1c) level, the AUCs were 0.81, 0.84, 0.85, and 0.87 for the HbA1c levels of ≤6.5%, >6.5%, >7.5%, and >10%, respectively. 
Conclusions: The deep learning model in this study enables the detection of early renal function impairment using retinal fundus images. The model was more accurate for patients with elevated serum HbA1c levels. %M 33139242 %R 10.2196/23472 %U http://medinform.jmir.org/2020/11/e23472/ %U https://doi.org/10.2196/23472 %U http://www.ncbi.nlm.nih.gov/pubmed/33139242 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 11 %P e18563 %T Automated Diagnosis of Various Gastrointestinal Lesions Using a Deep Learning–Based Classification and Retrieval Framework With a Large Endoscopic Database: Model Development and Validation %A Owais,Muhammad %A Arsalan,Muhammad %A Mahmood,Tahir %A Kang,Jin Kyu %A Park,Kang Ryoung %+ Division of Electronics and Electrical Engineering, Dongguk University, 30 Pildong-ro 1-gil, Jung-gu, Seoul, 04620, Republic of Korea, 82 10 3111 7022, parkgr@dgu.edu %K artificial intelligence %K endoscopic video retrieval %K content-based medical image retrieval %K polyp detection %K deep learning %K computer-aided diagnosis %D 2020 %7 26.11.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: The early diagnosis of various gastrointestinal diseases can lead to effective treatment and reduce the risk of many life-threatening conditions. Unfortunately, various small gastrointestinal lesions are undetectable during early-stage examination by medical experts. In previous studies, various deep learning–based computer-aided diagnosis tools have been used to make a significant contribution to the effective diagnosis and treatment of gastrointestinal diseases. However, most of these methods were designed to detect a limited number of gastrointestinal diseases, such as polyps, tumors, or cancers, in a specific part of the human gastrointestinal tract. Objective: This study aimed to develop a comprehensive computer-aided diagnosis tool to assist medical experts in diagnosing various types of gastrointestinal diseases. 
Methods: Our proposed framework comprises a deep learning–based classification network followed by a retrieval method. In the first step, the classification network predicts the disease type for the current medical condition. Then, the retrieval part of the framework shows the relevant cases (endoscopic images) from the previous database. These past cases help the medical expert validate the current computer prediction subjectively, which ultimately results in better diagnosis and treatment. Results: All the experiments were performed using 2 endoscopic data sets with a total of 52,471 frames and 37 different classes. The optimal performances obtained by our proposed method in accuracy, F1 score, mean average precision, and mean average recall were 96.19%, 96.99%, 98.18%, and 95.86%, respectively. The overall performance of our proposed diagnostic framework substantially outperformed state-of-the-art methods. Conclusions: This study provides a comprehensive computer-aided diagnosis framework for identifying various types of gastrointestinal diseases. The results show the superiority of our proposed method over various other recent methods and illustrate its potential for clinical diagnosis and treatment. Our proposed network can be applicable to other classification domains in medical imaging, such as computed tomography scans, magnetic resonance imaging, and ultrasound sequences. %M 33242010 %R 10.2196/18563 %U http://www.jmir.org/2020/11/e18563/ %U https://doi.org/10.2196/18563 %U http://www.ncbi.nlm.nih.gov/pubmed/33242010 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 11 %P e20031 %T Web- and Artificial Intelligence–Based Image Recognition For Sperm Motility Analysis: Verification Study %A Tsai,Vincent FS %A Zhuang,Bin %A Pong,Yuan-Hung %A Hsieh,Ju-Ton %A Chang,Hong-Chiang %+ Department of Urology, National Taiwan University Hospital, 7, Zhung-Shan S. 
Road, Taipei, 100, Taiwan, 886 223123456 ext 62135, bird8873@gmail.com %K Male infertility %K semen analysis %K home sperm test %K smartphone %K artificial intelligence %K cloud computing %K telemedicine %D 2020 %7 19.11.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Human sperm quality fluctuates over time. Therefore, it is crucial for couples preparing for natural pregnancy to monitor sperm motility. Objective: This study verified the performance of an artificial intelligence–based image recognition and cloud computing sperm motility testing system (Bemaner, Createcare) composed of microscope and microfluidic modules and designed to adapt to different types of smartphones. Methods: Sperm videos were captured and uploaded to the cloud with an app. Analysis of sperm motility was performed by an artificial intelligence–based image recognition algorithm then results were displayed. According to the number of motile sperm in the vision field, 47 (deidentified) videos of sperm were scored using 6 grades (0-5) by a male-fertility expert with 10 years of experience. Pearson product-moment correlation was calculated between the grades and the results (concentration of total sperm, concentration of motile sperm, and motility percentage) computed by the system. Results: Good correlation was demonstrated between the grades and results computed by the system for concentration of total sperm (r=0.65, P<.001), concentration of motile sperm (r=0.84, P<.001), and motility percentage (r=0.90, P<.001). Conclusions: This smartphone-based sperm motility test (Bemaner) accurately measures motility-related parameters and could potentially be applied toward the following fields: male infertility detection, sperm quality test during preparation for pregnancy, and infertility treatment monitoring. With frequent at-home testing, more data can be collected to help make clinical decisions and to conduct epidemiological research. 
%M 33211025 %R 10.2196/20031 %U http://medinform.jmir.org/2020/11/e20031/ %U https://doi.org/10.2196/20031 %U http://www.ncbi.nlm.nih.gov/pubmed/33211025 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 11 %P e24163 %T Development of an Artificial Intelligence–Based Automated Recommendation System for Clinical Laboratory Tests: Retrospective Analysis of the National Health Insurance Database %A Islam,Md Mohaimenul %A Yang,Hsuan-Chia %A Poly,Tahmina Nasrin %A Li,Yu-Chuan Jack %+ Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, 250 Wu-Hsing St., Taipei 110, Taipei, Taiwan, 886 2 27361661 ext 7600, jaak88@gmail.com %K artificial intelligence %K deep learning %K clinical decision-support system %K laboratory test %K patient safety %D 2020 %7 18.11.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Laboratory tests are considered an essential part of patient safety as patients’ screening, diagnosis, and follow-up are solely based on laboratory tests. Diagnosis of patients could be wrong, missed, or delayed if laboratory tests are performed erroneously. However, recognizing the value of correct laboratory test ordering remains underestimated by policymakers and clinicians. Nowadays, artificial intelligence methods such as machine learning and deep learning (DL) have been extensively used as powerful tools for pattern recognition in large data sets. Therefore, developing an automated laboratory test recommendation tool using available data from electronic health records (EHRs) could support current clinical practice. Objective: The objective of this study was to develop an artificial intelligence–based automated model that can provide laboratory tests recommendation based on simple variables available in EHRs. Methods: A retrospective analysis of the National Health Insurance database between January 1, 2013, and December 31, 2013, was performed. 
We reviewed the record of all patients who visited the cardiology department at least once and were prescribed laboratory tests. The data set was split into training and testing sets (80:20) to develop the DL model. In the internal validation, 25% of data were randomly selected from the training set to evaluate the performance of this model. Results: We used the area under the receiver operating characteristic curve, precision, recall, and hamming loss as comparative measures. A total of 129,938 prescriptions were used in our model. The DL-based automated recommendation system for laboratory tests achieved a significantly higher area under the receiver operating characteristic curve (AUROCmacro and AUROCmicro of 0.76 and 0.87, respectively). Using a low cutoff, the model identified appropriate laboratory tests with 99% sensitivity. Conclusions: The developed artificial intelligence model based on DL exhibited good discriminative capability for predicting laboratory tests using routinely collected EHR data. Utilization of DL approaches can facilitate optimal laboratory test selection for patients, which may in turn improve patient safety. However, future study is recommended to assess the cost-effectiveness for implementing this model in real-world clinical settings. 
%M 33206057 %R 10.2196/24163 %U https://medinform.jmir.org/2020/11/e24163 %U https://doi.org/10.2196/24163 %U http://www.ncbi.nlm.nih.gov/pubmed/33206057 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 11 %P e23315 %T Economic Value of Data and Analytics for Health Care Providers: Hermeneutic Systematic Literature Review %A von Wedel,Philip %A Hagist,Christian %+ Chair of Economic and Social Policy, WHU - Otto Beisheim School of Management, Burgplatz 2, Vallendar, 56179, Germany, 49 02616509 ext 255, philip.wedel@whu.edu %K digital health %K health information technology %K healthcare provider economics %K electronic health records %K data analytics %K artificial intelligence %D 2020 %7 18.11.2020 %9 Review %J J Med Internet Res %G English %X Background: The benefits of data and analytics for health care systems and single providers is an increasingly investigated field in digital health literature. Electronic health records (EHR), for example, can improve quality of care. Emerging analytics tools based on artificial intelligence show the potential to assist physicians in day-to-day workflows. Yet, single health care providers also need information regarding the economic impact when deciding on potential adoption of these tools. Objective: This paper examines the question of whether data and analytics provide economic advantages or disadvantages for health care providers. The goal is to provide a comprehensive overview including a variety of technologies beyond computer-based patient records. Ultimately, findings are also intended to determine whether economic barriers for adoption by providers could exist. Methods: A systematic literature search of the PubMed and Google Scholar online databases was conducted, following the hermeneutic methodology that encourages iterative search and interpretation cycles. 
After applying inclusion and exclusion criteria to 165 initially identified studies, 50 were included for qualitative synthesis and topic-based clustering. Results: The review identified 5 major technology categories, namely EHRs (n=30), computerized clinical decision support (n=8), advanced analytics (n=5), business analytics (n=5), and telemedicine (n=2). Overall, 62% (31/50) of the reviewed studies indicated a positive economic impact for providers either via direct cost or revenue effects or via indirect efficiency or productivity improvements. When differentiating between categories, however, an ambiguous picture emerged for EHR, whereas analytics technologies like computerized clinical decision support and advanced analytics predominantly showed economic benefits. Conclusions: The research question of whether data and analytics create economic benefits for health care providers cannot be answered uniformly. The results indicate ambiguous effects for EHRs, here representing data, and mainly positive effects for the significantly less studied analytics field. The mixed results regarding EHRs can create an economic barrier for adoption by providers. This barrier can translate into a bottleneck to positive economic effects of analytics technologies relying on EHR data. Ultimately, more research on economic effects of technologies other than EHRs is needed to generate a more reliable evidence base. 
%M 33206056 %R 10.2196/23315 %U http://www.jmir.org/2020/11/e23315/ %U https://doi.org/10.2196/23315 %U http://www.ncbi.nlm.nih.gov/pubmed/33206056 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 11 %P e19805 %T Deep Learning Methodology for Differentiating Glioma Recurrence From Radiation Necrosis Using Multimodal Magnetic Resonance Imaging: Algorithm Development and Validation %A Gao,Yang %A Xiao,Xiong %A Han,Bangcheng %A Li,Guilin %A Ning,Xiaolin %A Wang,Defeng %A Cai,Weidong %A Kikinis,Ron %A Berkovsky,Shlomo %A Di Ieva,Antonio %A Zhang,Liwei %A Ji,Nan %A Liu,Sidong %+ Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, 75 Talavera Road, Macquarie Park, Sydney, 2113, Australia, 61 29852729, dr.sidong.liu@gmail.com %K recurrent tumor %K radiation necrosis %K progression %K pseudoprogression %K multimodal MRI %K deep learning %D 2020 %7 17.11.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: The radiological differential diagnosis between tumor recurrence and radiation-induced necrosis (ie, pseudoprogression) is of paramount importance in the management of glioma patients. Objective: This research aims to develop a deep learning methodology for automated differentiation of tumor recurrence from radiation necrosis based on routine magnetic resonance imaging (MRI) scans. Methods: In this retrospective study, 146 patients who underwent radiation therapy after glioma resection and presented with suspected recurrent lesions at the follow-up MRI examination were selected for analysis. Routine MRI scans were acquired from each patient, including T1, T2, and gadolinium-contrast-enhanced T1 sequences. Of those cases, 96 (65.8%) were confirmed as glioma recurrence on postsurgical pathological examination, while 50 (34.2%) were diagnosed as necrosis. 
A light-weighted deep neural network (DNN) (ie, efficient radionecrosis neural network [ERN-Net]) was proposed to learn radiological features of gliomas and necrosis from MRI scans. Sensitivity, specificity, accuracy, and area under the curve (AUC) were used to evaluate performance of the model in both image-wise and subject-wise classifications. Preoperative diagnostic performance of the model was also compared to that of the state-of-the-art DNN models and five experienced neurosurgeons. Results: DNN models based on multimodal MRI outperformed single-modal models. ERN-Net achieved the highest AUC in both image-wise (0.915) and subject-wise (0.958) classification tasks. The evaluated DNN models achieved an average sensitivity of 0.947 (SD 0.033), specificity of 0.817 (SD 0.075), and accuracy of 0.903 (SD 0.026), which were significantly better than the tested neurosurgeons (P=.02 in sensitivity and P<.001 in specificity and accuracy). Conclusions: Deep learning offers a useful computational tool for the differential diagnosis between recurrent gliomas and necrosis. The proposed ERN-Net model, a simple and effective DNN model, achieved excellent performance on routine MRI scans and showed a high clinical applicability. 
%M 33200991 %R 10.2196/19805 %U http://medinform.jmir.org/2020/11/e19805/ %U https://doi.org/10.2196/19805 %U http://www.ncbi.nlm.nih.gov/pubmed/33200991 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 11 %P e15185 %T Physicians’ Perceptions of the Use of a Chatbot for Information Seeking: Qualitative Study %A Koman,Jason %A Fauvelle,Khristina %A Schuck,Stéphane %A Texier,Nathalie %A Mebarki,Adel %+ Sanofi Aventis, 82, avenue Raspail, Gentilly Cedex, 94255, France, 33 772219558, khristina.fauvelle@sanofi.com %K health %K digital health %K innovation %K conversational agent %K decision support system %K qualitative research %K chatbot %K bot %K medical drugs %K prescription %K risk minimization measures %D 2020 %7 10.11.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Seeking medical information can be an issue for physicians. In the specific context of medical practice, chatbots are hypothesized to present additional value for providing information quickly, particularly as far as drug risk minimization measures are concerned. Objective: This qualitative study aimed to elicit physicians’ perceptions of a pilot version of a chatbot used in the context of drug information and risk minimization measures. Methods: General practitioners and specialists were recruited across France to participate in individual semistructured interviews. Interviews were recorded, transcribed, and analyzed using a horizontal thematic analysis approach. Results: Eight general practitioners and 2 specialists participated. The tone and ergonomics of the pilot version were appreciated by physicians. However, all participants emphasized the importance of getting exhaustive, trustworthy answers when interacting with a chatbot. Conclusions: The chatbot was perceived as a useful and innovative tool that could easily be integrated into routine medical practice and could help health professionals when seeking information on drug and risk minimization measures. 
%M 33170134 %R 10.2196/15185 %U http://www.jmir.org/2020/11/e15185/ %U https://doi.org/10.2196/15185 %U http://www.ncbi.nlm.nih.gov/pubmed/33170134 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 9 %N 11 %P e21659 %T Artificial Intelligence–Powered Smartphone App to Facilitate Medication Adherence: Protocol for a Human Factors Design Study %A Roosan,Don %A Chok,Jay %A Karim,Mazharul %A Law,Anandi V %A Baskys,Andrius %A Hwang,Angela %A Roosan,Moom R %+ Department of Pharmacy Practice and Administration, College of Pharmacy, Western University of Health Sciences, 309 E 2nd St, Pomona, CA, 91766, United States, 1 9094698778, droosan@westernu.edu %K artificial intelligence %K smartphone app %K patient cognition %K complex medication information %K medication adherence %K machine learning %K mobile phone %D 2020 %7 9.11.2020 %9 Protocol %J JMIR Res Protoc %G English %X Background: Medication Guides consisting of crucial interactions and side effects are extensive and complex. Due to the exhaustive information, patients do not retain the necessary medication information, which can result in hospitalizations and medication nonadherence. A gap exists in understanding patients’ cognition of managing complex medication information. However, advancements in technology and artificial intelligence (AI) allow us to understand patient cognitive processes to design an app to better provide important medication information to patients. Objective: Our objective is to improve the design of an innovative AI- and human factor–based interface that supports patients’ medication information comprehension that could potentially improve medication adherence. Methods: This study has three aims. 
Aim 1 has three phases: (1) an observational study to understand patient perception of fear and biases regarding medication information, (2) an eye-tracking study to understand the attention locus for medication information, and (3) a psychological refractory period (PRP) paradigm study to understand functionalities. Observational data will be collected, such as audio and video recordings, gaze mapping, and time from PRP. A total of 50 patients, aged 18-65 years, who started at least one new medication, for which we developed visualization information, and who have a cognitive status of 34 during cognitive screening using the TICS-M test and health literacy level will be included in this aim of the study. In Aim 2, we will iteratively design and evaluate an AI-powered medication information visualization interface as a smartphone app with the knowledge gained from each component of Aim 1. The interface will be assessed through two usability surveys. A total of 300 patients, aged 18-65 years, with diabetes, cardiovascular diseases, or mental health disorders, will be recruited for the surveys. Data from the surveys will be analyzed through exploratory factor analysis. In Aim 3, in order to test the prototype, there will be a two-arm study design. This aim will include 900 patients, aged 18-65 years, with internet access, without any cognitive impairment, and with at least two medications. Patients will be sequentially randomized. Three surveys will be used to assess the primary outcome of medication information comprehension and the secondary outcome of medication adherence at 12 weeks. Results: Preliminary data collection will be conducted in 2021, and results are expected to be published in 2022. Conclusions: This study will lead the future of AI-based, innovative, digital interface design and aid in improving medication comprehension, which may improve medication adherence. 
The results from this study will also open up future research opportunities in understanding how patients manage complex medication information and will inform the format and design for innovative, AI-powered digital interfaces for Medication Guides. International Registered Report Identifier (IRRID): PRR1-10.2196/21659 %M 33164898 %R 10.2196/21659 %U http://www.researchprotocols.org/2020/11/e21659/ %U https://doi.org/10.2196/21659 %U http://www.ncbi.nlm.nih.gov/pubmed/33164898 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 11 %P e21252 %T Patient Triage by Topic Modeling of Referral Letters: Feasibility Study %A Spasic,Irena %A Button,Kate %+ School of Computer Science & Informatics, Cardiff University, 5 The Parade, Cardiff, CF24 3AA, United Kingdom, 44 02920870320, spasici@cardiff.ac.uk %K natural language processing %K machine learning %K data science %K medical informatics %K computer-assisted decision making %D 2020 %7 6.11.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Musculoskeletal conditions are managed within primary care, but patients can be referred to secondary care if a specialist opinion is required. The ever-increasing demand for health care resources emphasizes the need to streamline care pathways with the ultimate aim of ensuring that patients receive timely and optimal care. Information contained in referral letters underpins the referral decision-making process but is yet to be explored systematically for the purposes of treatment prioritization for musculoskeletal conditions. Objective: This study aims to explore the feasibility of using natural language processing and machine learning to automate the triage of patients with musculoskeletal conditions by analyzing information from referral letters. Specifically, we aim to determine whether referral letters can be automatically assorted into latent topics that are clinically relevant, that is, considered relevant when prescribing treatments. 
Here, clinical relevance is assessed by posing 2 research questions. Can latent topics be used to automatically predict treatment? Can clinicians interpret latent topics as cohorts of patients who share common characteristics or experiences such as medical history, demographics, and possible treatments? Methods: We used latent Dirichlet allocation to model each referral letter as a finite mixture over an underlying set of topics and model each topic as an infinite mixture over an underlying set of topic probabilities. The topic model was evaluated in the context of automating patient triage. Given a set of treatment outcomes, a binary classifier was trained for each outcome using previously extracted topics as the input features of the machine learning algorithm. In addition, a qualitative evaluation was performed to assess the human interpretability of topics. Results: The prediction accuracy of binary classifiers outperformed the stratified random classifier by a large margin, indicating that topic modeling could be used to predict the treatment, thus effectively supporting patient triage. The qualitative evaluation confirmed the high clinical interpretability of the topic model. Conclusions: The results established the feasibility of using natural language processing and machine learning to automate triage of patients with knee or hip pain by analyzing information from their referral letters. %M 33155985 %R 10.2196/21252 %U https://medinform.jmir.org/2020/11/e21252 %U https://doi.org/10.2196/21252 %U http://www.ncbi.nlm.nih.gov/pubmed/33155985 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 11 %P e20251 %T Engaging Unmotivated Smokers to Move Toward Quitting: Design of Motivational Interviewing–Based Chatbot Through Iterative Interactions %A Almusharraf,Fahad %A Rose,Jonathan %A Selby,Peter %+ The Edward S. Rogers Sr. 
Department of Electrical & Computer Engineering, Faculty of Applied Science & Engineering, University of Toronto, 10 King's College Road, Toronto, ON, M5S 3G4, Canada, 1 4169786992, jonathan.rose@ece.utoronto.ca %K smoking cessation %K motivational interviewing %K chatbot %K natural language processing %D 2020 %7 3.11.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: At any given time, most smokers in a population are ambivalent with no motivation to quit. Motivational interviewing (MI) is an evidence-based technique that aims to elicit change in ambivalent smokers. MI practitioners are scarce and expensive, and smokers are difficult to reach. Smokers are potentially reachable through the web, and if an automated chatbot could emulate an MI conversation, it could form the basis of a low-cost and scalable intervention motivating smokers to quit. Objective: The primary goal of this study is to design, train, and test an automated MI-based chatbot capable of eliciting reflection in a conversation with cigarette smokers. This study describes the process of collecting training data to improve the chatbot’s ability to generate MI-oriented responses, particularly reflections and summary statements. The secondary goal of this study is to observe the effects on participants through voluntary feedback given after completing a conversation with the chatbot. Methods: An interdisciplinary collaboration between an MI expert and experts in computer engineering and natural language processing (NLP) co-designed the conversation and algorithms underlying the chatbot. A sample of 121 adult cigarette smokers in 11 successive groups were recruited from a web-based platform for a single-arm prospective iterative design study. The chatbot was designed to stimulate reflections on the pros and cons of smoking using MI’s running head start technique. 
Participants were also asked to confirm the chatbot’s classification of their free-form responses to measure the classification accuracy of the underlying NLP models. Each group provided responses that were used to train the chatbot for the next group. Results: A total of 6568 responses from 121 participants in 11 successive groups over 14 weeks were received. From these responses, we were able to isolate 21 unique reasons for and against smoking and the relative frequency of each. The gradual collection of responses as inputs and smoking reasons as labels over the 11 iterations improved the F1 score of the classification within the chatbot from 0.63 in the first group to 0.82 in the final group. The mean time spent by each participant interacting with the chatbot was 21.3 (SD 14.0) min (minimum 6.4 and maximum 89.2). We also found that 34.7% (42/121) of participants enjoyed the interaction with the chatbot, and 8.3% (10/121) of participants noted explicit smoking cessation benefits from the conversation in voluntary feedback that did not solicit this explicitly. Conclusions: Recruiting ambivalent smokers through the web is a viable method to train a chatbot to increase accuracy in reflection and summary statements, the building blocks of MI. A new set of 21 smoking reasons (both for and against) has been identified. Initial feedback from smokers on the experience shows promise toward using it in an intervention. 
%M 33141095 %R 10.2196/20251 %U https://www.jmir.org/2020/11/e20251 %U https://doi.org/10.2196/20251 %U http://www.ncbi.nlm.nih.gov/pubmed/33141095 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 11 %P e19548 %T Classification of Depression Through Resting-State Electroencephalogram as a Novel Practice in Psychiatry: Review %A Čukić,Milena %A López,Victoria %A Pavón,Juan %+ HealthInc 3EGA, Amsterdam Health and Technology Institute, Koningin Wilhelminaplein 644, Amsterdam, 1062 KS, Netherlands, 31 615178926, micu@3ega.nl %K computational psychiatry %K physiological complexity %K machine learning %K theory-driven approach %K resting-state EEG %K personalized medicine %K computational neuroscience %K unwarranted optimism %D 2020 %7 3.11.2020 %9 Review %J J Med Internet Res %G English %X Background: Machine learning applications in health care have increased considerably in the recent past, and this review focuses on an important application in psychiatry related to the detection of depression. Since the advent of computational psychiatry, research based on functional magnetic resonance imaging has yielded remarkable results, but these tools tend to be too expensive for everyday clinical use. Objective: This review focuses on an affordable data-driven approach based on electroencephalographic recordings. Web-based applications via public or private cloud-based platforms would be a logical next step. We aim to compare several different approaches to the detection of depression from electroencephalographic recordings using various features and machine learning models. Methods: To detect depression, we reviewed published detection studies based on resting-state electroencephalogram with final machine learning, and to predict therapy outcomes, we reviewed a set of interventional studies using some form of stimulation in their methodology. Results: We reviewed 14 detection studies and 12 interventional studies published between 2008 and 2019. 
As direct comparison was not possible due to the large diversity of theoretical approaches and methods used, we compared them based on the steps in analysis and accuracies yielded. In addition, we compared possible drawbacks in terms of sample size, feature extraction, feature selection, classification, internal and external validation, and possible unwarranted optimism and reproducibility. In addition, we suggested desirable practices to avoid misinterpretation of results and optimism. Conclusions: This review shows the need for larger data sets and more systematic procedures to improve the use of the solution for clinical diagnostics. Therefore, regulation of the pipeline and standard requirements for methodology used should become mandatory to increase the reliability and accuracy of the complete methodology for it to be translated to modern psychiatry. %M 33141088 %R 10.2196/19548 %U https://www.jmir.org/2020/11/e19548 %U https://doi.org/10.2196/19548 %U http://www.ncbi.nlm.nih.gov/pubmed/33141088 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 10 %P e18273 %T Exploring Eating Disorder Topics on Twitter: Machine Learning Approach %A Zhou,Sicheng %A Zhao,Yunpeng %A Bian,Jiang %A Haynos,Ann F %A Zhang,Rui %+ Institute for Health Informatics, University of Minnesota, 8-100 Phillips-Wangensteen Building, 516 Delaware Street SE, Minneapolis, MN, 55455, United States, 1 612 626 4209, zhan1386@umn.edu %K eating disorders %K topic modeling %K text classification %K social media %K public health %D 2020 %7 30.10.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Eating disorders (EDs) are a group of mental illnesses that have an adverse effect on both mental and physical health. As social media platforms (eg, Twitter) have become an important data source for public health research, some studies have qualitatively explored the ways in which EDs are discussed on these platforms. 
Initial results suggest that such research offers a promising method for further understanding this group of diseases. Nevertheless, an efficient computational method is needed to further identify and analyze tweets relevant to EDs on a larger scale. Objective: This study aims to develop and validate a machine learning–based classifier to identify tweets related to EDs and to explore factors (ie, topics) related to EDs using a topic modeling method. Methods: We collected potential ED-relevant tweets using keywords from previous studies and annotated these tweets into different groups (ie, ED relevant vs irrelevant and then promotional information vs laypeople discussion). Several supervised machine learning methods, such as convolutional neural network (CNN), long short-term memory (LSTM), support vector machine, and naïve Bayes, were developed and evaluated using annotated data. We used the classifier with the best performance to identify ED-relevant tweets and applied a topic modeling method—Correlation Explanation (CorEx)—to analyze the content of the identified tweets. To validate these machine learning results, we also collected a cohort of ED-relevant tweets on the basis of manually curated rules. Results: A total of 123,977 tweets were collected during the set period. We randomly annotated 2219 tweets for developing the machine learning classifiers. We developed a CNN-LSTM classifier to identify ED-relevant tweets published by laypeople in 2 steps: first relevant versus irrelevant (F1 score=0.89) and then promotional versus published by laypeople (F1 score=0.90). A total of 40,790 ED-relevant tweets were identified using the CNN-LSTM classifier. We also identified another set of tweets (ie, 17,632 ED-relevant and 83,557 ED-irrelevant tweets) posted by laypeople using manually specified rules. Using CorEx on all ED-relevant tweets, the topic model identified 162 topics. 
Overall, the coherence rate for topic modeling was 77.07% (1264/1640), indicating a high quality of the produced topics. The topics were further reviewed and analyzed by a domain expert. Conclusions: A developed CNN-LSTM classifier could improve the efficiency of identifying ED-relevant tweets compared with the traditional manual-based method. The CorEx topic model was applied on the tweets identified by the machine learning–based classifier and the traditional manual approach separately. Highly overlapping topics were observed between the 2 cohorts of tweets. The produced topics were further reviewed by a domain expert. Some of the topics identified by the potential ED tweets may provide new avenues for understanding this serious set of disorders. %M 33124997 %R 10.2196/18273 %U http://medinform.jmir.org/2020/10/e18273/ %U https://doi.org/10.2196/18273 %U http://www.ncbi.nlm.nih.gov/pubmed/33124997 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 10 %P e21222 %T Predictive Models for Neonatal Follow-Up Serum Bilirubin: Model Development and Validation %A Chou,Joseph H %+ Massachusetts General Hospital, 55 Fruit Street, Founders 526E, Boston, MA, 02114-2696, United States, 1 617 724 9040, jchou2@mgh.harvard.edu %K infant, newborn %K neonatology %K jaundice, neonatal %K hyperbilirubinemia, neonatal %K machine learning %K supervised machine learning %K data science %K medical informatics %K decision support techniques %K models, statistical %K predictive models %D 2020 %7 29.10.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Hyperbilirubinemia affects many newborn infants and, if not treated appropriately, can lead to irreversible brain injury. Objective: This study aims to develop predictive models of follow-up total serum bilirubin measurement and to compare their accuracy with that of clinician predictions. Methods: Subjects were patients born between June 2015 and June 2019 at 4 hospitals in Massachusetts. 
The prediction target was a follow-up total serum bilirubin measurement obtained <72 hours after a previous measurement. Birth before versus after February 2019 was used to generate a training set (27,428 target measurements) and a held-out test set (3320 measurements), respectively. Multiple supervised learning models were trained. To further assess model performance, predictions on the held-out test set were also compared with corresponding predictions from clinicians. Results: The best predictive accuracy on the held-out test set was obtained with the multilayer perceptron (ie, neural network, mean absolute error [MAE] 1.05 mg/dL) and Xgboost (MAE 1.04 mg/dL) models. A limited number of predictors were sufficient for constructing models with the best performance and avoiding overfitting: current bilirubin measurement, last rate of rise, proportion of time under phototherapy, time to next measurement, gestational age at birth, current age, and fractional weight change from birth. Clinicians made a total of 210 prospective predictions. The neural network model accuracy on this subset of predictions had an MAE of 1.06 mg/dL compared with clinician predictions with an MAE of 1.38 mg/dL (P<.0001). In babies born at 35 weeks of gestation or later, this approach was also applied to predict the binary outcome of subsequently exceeding consensus guidelines for phototherapy initiation and achieved an area under the receiver operator characteristic curve of 0.94 (95% CI 0.91 to 0.97). Conclusions: This study developed predictive models for neonatal follow-up total serum bilirubin measurements that outperform clinicians. This may be the first report of models that predict specific bilirubin values, are not limited to near-term patients without risk factors, and take into account the effect of phototherapy. 
%M 33118947 %R 10.2196/21222 %U http://medinform.jmir.org/2020/10/e21222/ %U https://doi.org/10.2196/21222 %U http://www.ncbi.nlm.nih.gov/pubmed/33118947 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 10 %P e21801 %T Clinical Characteristics and Prognostic Factors for Intensive Care Unit Admission of Patients With COVID-19: Retrospective Study Using Machine Learning and Natural Language Processing %A Izquierdo,Jose Luis %A Ancochea,Julio %A , %A Soriano,Joan B %+ Hospital Universitario de La Princesa, Diego de León 62, Madrid, 28005, Spain, 34 618867769, jbsoriano2@gmail.com %K artificial intelligence %K big data %K COVID-19 %K electronic health records %K tachypnea %K SARS-CoV-2 %K predictive model %D 2020 %7 28.10.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Many factors involved in the onset and clinical course of the ongoing COVID-19 pandemic are still unknown. Although big data analytics and artificial intelligence are widely used in the realms of health and medicine, researchers are only beginning to use these tools to explore the clinical characteristics and predictive factors of patients with COVID-19. Objective: Our primary objectives are to describe the clinical characteristics and determine the factors that predict intensive care unit (ICU) admission of patients with COVID-19. Determining these factors using a well-defined population can increase our understanding of the real-world epidemiology of the disease. Methods: We used a combination of classic epidemiological methods, natural language processing (NLP), and machine learning (for predictive modeling) to analyze the electronic health records (EHRs) of patients with COVID-19. We explored the unstructured free text in the EHRs within the Servicio de Salud de Castilla-La Mancha (SESCAM) Health Care Network (Castilla-La Mancha, Spain) from the entire population with available EHRs (1,364,924 patients) from January 1 to March 29, 2020. 
We extracted related clinical information regarding diagnosis, progression, and outcome for all COVID-19 cases. Results: A total of 10,504 patients with a clinical or polymerase chain reaction–confirmed diagnosis of COVID-19 were identified; 5519 (52.5%) were male, with a mean age of 58.2 years (SD 19.7). Upon admission, the most common symptoms were cough, fever, and dyspnea; however, all three symptoms occurred in fewer than half of the cases. Overall, 6.1% (83/1353) of hospitalized patients required ICU admission. Using a machine-learning, data-driven algorithm, we identified that a combination of age, fever, and tachypnea was the most parsimonious predictor of ICU admission; patients younger than 56 years, without tachypnea, and with temperature <39 °C (or >39 °C without respiratory crackles) were not admitted to the ICU. In contrast, patients with COVID-19 aged 40 to 79 years were likely to be admitted to the ICU if they had tachypnea and delayed their visit to the emergency department after being seen in primary care. Conclusions: Our results show that a combination of easily obtainable clinical variables (age, fever, and tachypnea with or without respiratory crackles) predicts whether patients with COVID-19 will require ICU admission. 
%M 33090964 %R 10.2196/21801 %U http://www.jmir.org/2020/10/e21801/ %U https://doi.org/10.2196/21801 %U http://www.ncbi.nlm.nih.gov/pubmed/33090964 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 10 %P e20891 %T Federated Learning on Clinical Benchmark Data: Performance Assessment %A Lee,Geun Hyeong %A Shin,Soo-Yong %+ Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology, Sungkyunkwan University, 115 Irwon-ro, Gangnam-gu, Seoul, 06355, Republic of Korea, 82 2 3410 1449, sy.shin@skku.edu %K federated learning %K medical data %K privacy protection %K machine learning %K deep learning %D 2020 %7 26.10.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Federated learning (FL) is a newly proposed machine-learning method that uses a decentralized dataset. Since data transfer is not necessary for the learning process in FL, there is a significant advantage in protecting personal privacy. Therefore, many studies are being actively conducted in the applications of FL for diverse areas. Objective: The aim of this study was to evaluate the reliability and performance of FL using three benchmark datasets, including a clinical benchmark dataset. Methods: To evaluate FL in a realistic setting, we implemented FL using a client-server architecture with Python. The implemented client-server version of the FL software was deployed to Amazon Web Services. Modified National Institute of Standards and Technology (MNIST), Medical Information Mart for Intensive Care-III (MIMIC-III), and electrocardiogram (ECG) datasets were used to evaluate the performance of FL. To test FL in a realistic setting, the MNIST dataset was split into 10 different clients, with one digit for each client. In addition, we conducted four different experiments according to basic, imbalanced, skewed, and a combination of imbalanced and skewed data distributions. 
We also compared the performance of FL to that of the state-of-the-art method with respect to in-hospital mortality using the MIMIC-III dataset. Likewise, we conducted experiments comparing basic and imbalanced data distributions using MIMIC-III and ECG data. Results: FL on the basic MNIST dataset with 10 clients achieved an area under the receiver operating characteristic curve (AUROC) of 0.997 and an F1-score of 0.946. The experiment with the imbalanced MNIST dataset achieved an AUROC of 0.995 and an F1-score of 0.921. The experiment with the skewed MNIST dataset achieved an AUROC of 0.992 and an F1-score of 0.905. Finally, the combined imbalanced and skewed experiment achieved an AUROC of 0.990 and an F1-score of 0.891. The basic FL on in-hospital mortality using MIMIC-III data achieved an AUROC of 0.850 and an F1-score of 0.944, while the experiment with the imbalanced MIMIC-III dataset achieved an AUROC of 0.850 and an F1-score of 0.943. For ECG classification, the basic FL achieved an AUROC of 0.938 and an F1-score of 0.807, and the imbalanced ECG dataset achieved an AUROC of 0.943 and an F1-score of 0.807. Conclusions: FL demonstrated comparative performance on different benchmark datasets. In addition, FL demonstrated reliable performance in cases where the distribution was imbalanced, skewed, and extreme, reflecting the real-life scenario in which data distributions from various hospitals are different. FL can achieve high performance while maintaining privacy protection because there is no requirement to centralize the data. 
%M 33104011 %R 10.2196/20891 %U http://www.jmir.org/2020/10/e20891/ %U https://doi.org/10.2196/20891 %U http://www.ncbi.nlm.nih.gov/pubmed/33104011 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 10 %P e20346 %T The Effectiveness of Artificial Intelligence Conversational Agents in Health Care: Systematic Review %A Milne-Ives,Madison %A de Cock,Caroline %A Lim,Ernest %A Shehadeh,Melissa Harper %A de Pennington,Nick %A Mole,Guy %A Normando,Eduardo %A Meinert,Edward %+ Centre for Health Technology, University of Plymouth, 8 Kirkby Place, Room 2, Plymouth, PL4 6DT, United Kingdom, 44 7824446808, edward.meinert@plymouth.ac.uk %K artificial intelligence %K avatar %K chatbot %K conversational agent %K digital health %K intelligent assistant %K speech recognition software %K virtual assistant %K virtual coach %K virtual health care %K virtual nursing %K voice recognition software %D 2020 %7 22.10.2020 %9 Review %J J Med Internet Res %G English %X Background: The high demand for health care services and the growing capability of artificial intelligence have led to the development of conversational agents designed to support a variety of health-related activities, including behavior change, treatment support, health monitoring, training, triage, and screening support. Automation of these tasks could free clinicians to focus on more complex work and increase the accessibility to health care services for the public. An overarching assessment of the acceptability, usability, and effectiveness of these agents in health care is needed to collate the evidence so that future development can target areas for improvement and potential for sustainable adoption. Objective: This systematic review aims to assess the effectiveness and usability of conversational agents in health care and identify the elements that users like and dislike to inform future research and development of these agents. 
Methods: PubMed, Medline (Ovid), EMBASE (Excerpta Medica dataBASE), CINAHL (Cumulative Index to Nursing and Allied Health Literature), Web of Science, and the Association for Computing Machinery Digital Library were systematically searched for articles published since 2008 that evaluated unconstrained natural language processing conversational agents used in health care. EndNote (version X9, Clarivate Analytics) reference management software was used for initial screening, and full-text screening was conducted by 1 reviewer. Data were extracted, and the risk of bias was assessed by one reviewer and validated by another. Results: A total of 31 studies were selected and included a variety of conversational agents, including 14 chatbots (2 of which were voice chatbots), 6 embodied conversational agents (3 of which were interactive voice response calls, virtual patients, and speech recognition screening systems), 1 contextual question-answering agent, and 1 voice recognition triage system. Overall, the evidence reported was mostly positive or mixed. Usability and satisfaction performed well (27/30 and 26/31), and positive or mixed effectiveness was found in three-quarters of the studies (23/30). However, there were several limitations of the agents highlighted in specific qualitative feedback. Conclusions: The studies generally reported positive or mixed evidence for the effectiveness, usability, and satisfactoriness of the conversational agents investigated, but qualitative user perceptions were more mixed. The quality of many of the studies was limited, and improved study design and reporting are necessary to more accurately evaluate the usefulness of the agents in health care and identify key areas for improvement. Further research should also analyze the cost-effectiveness, privacy, and security of the agents. 
International Registered Report Identifier (IRRID): RR2-10.2196/16934 %M 33090118 %R 10.2196/20346 %U http://www.jmir.org/2020/10/e20346/ %U https://doi.org/10.2196/20346 %U http://www.ncbi.nlm.nih.gov/pubmed/33090118 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 10 %P e22550 %T Deep Learning With Electronic Health Records for Short-Term Fracture Risk Identification: Crystal Bone Algorithm Development and Validation %A Almog,Yasmeen Adar %A Rai,Angshu %A Zhang,Patrick %A Moulaison,Amanda %A Powell,Ross %A Mishra,Anirban %A Weinberg,Kerry %A Hamilton,Celeste %A Oates,Mary %A McCloskey,Eugene %A Cummings,Steven R %+ Digital Health & Innovation, Amgen Inc, 1 Amgen Center Drive, MS 38-3B, Thousand Oaks, CA, 91320, United States, 1 4243463036, yalmog@amgen.com %K fracture %K bone %K osteoporosis %K low bone mass %K prediction %K natural language processing %K NLP %K machine learning %K deep learning %K artificial intelligence %K AI %K electronic health record %K EHR %D 2020 %7 16.10.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Fractures as a result of osteoporosis and low bone mass are common and give rise to significant clinical, personal, and economic burden. Even after a fracture occurs, high fracture risk remains widely underdiagnosed and undertreated. Common fracture risk assessment tools utilize a subset of clinical risk factors for prediction, and often require manual data entry. Furthermore, these tools predict risk over the long term and do not explicitly provide short-term risk estimates necessary to identify patients likely to experience a fracture in the next 1-2 years. Objective: The goal of this study was to develop and evaluate an algorithm for the identification of patients at risk of fracture in a subsequent 1- to 2-year period. 
In order to address the aforementioned limitations of current prediction tools, this approach focused on a short-term timeframe, automated data entry, and the use of longitudinal data to inform the predictions. Methods: Using retrospective electronic health record data from over 1,000,000 patients, we developed Crystal Bone, an algorithm that applies machine learning techniques from natural language processing to the temporal nature of patient histories to generate short-term fracture risk predictions. Similar to how language models predict the next word in a given sentence or the topic of a document, Crystal Bone predicts whether a patient’s future trajectory might contain a fracture event, or whether the signature of the patient’s journey is similar to that of a typical future fracture patient. A holdout set with 192,590 patients was used to validate accuracy. Experimental baseline models and human-level performance were used for comparison. Results: The model accurately predicted 1- to 2-year fracture risk for patients aged over 50 years (area under the receiver operating characteristics curve [AUROC] 0.81). These algorithms outperformed the experimental baselines (AUROC 0.67) and showed meaningful improvements when compared to retrospective approximation of human-level performance by correctly identifying 9649 of 13,765 (70%) at-risk patients who did not receive any preventative bone-health-related medical interventions from their physicians. Conclusions: These findings indicate that it is possible to use a patient’s unique medical history as it changes over time to predict the risk of short-term fracture. Validating and applying such a tool within the health care system could enable automated and widespread prediction of this risk and may help with identification of patients at very high risk of fracture. 
%M 32956069 %R 10.2196/22550 %U http://www.jmir.org/2020/10/e22550/ %U https://doi.org/10.2196/22550 %U http://www.ncbi.nlm.nih.gov/pubmed/32956069 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 10 %P e19878 %T Application of an Artificial Intelligence Trilogy to Accelerate Processing of Suspected Patients With SARS-CoV-2 at a Smart Quarantine Station: Observational Study %A Liu,Ping-Yen %A Tsai,Yi-Shan %A Chen,Po-Lin %A Tsai,Huey-Pin %A Hsu,Ling-Wei %A Wang,Chi-Shiang %A Lee,Nan-Yao %A Huang,Mu-Shiang %A Wu,Yun-Chiao %A Ko,Wen-Chien %A Yang,Yi-Ching %A Chiang,Jung-Hsien %A Shen,Meng-Ru %+ Department of Obstetrics and Gynecology, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, 138 Sheng-Li Rd, Tainan, 70401, Taiwan, 886 6 2353535 ext 5505, mrshen@mail.ncku.edu.tw %K SARS-CoV-2 %K COVID-19 %K artificial intelligence %K smart device assisted decision making %K quarantine station %D 2020 %7 14.10.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: As the COVID-19 epidemic increases in severity, the burden of quarantine stations outside emergency departments (EDs) at hospitals is increasing daily. To address the high screening workload at quarantine stations, all staff members with medical licenses are required to work shifts in these stations. Therefore, it is necessary to simplify the workflow and decision-making process for physicians and surgeons from all subspecialties. Objective: The aim of this paper is to demonstrate how the National Cheng Kung University Hospital artificial intelligence (AI) trilogy of diversion to a smart quarantine station, AI-assisted image interpretation, and a built-in clinical decision-making algorithm improves medical care and reduces quarantine processing times. Methods: This observational study on the emerging COVID-19 pandemic included 643 patients. 
An “AI trilogy” of diversion to a smart quarantine station, AI-assisted image interpretation, and a built-in clinical decision-making algorithm on a tablet computer was applied to shorten the quarantine survey process and reduce processing time during the COVID-19 pandemic. Results: The use of the AI trilogy facilitated the processing of suspected cases of COVID-19 with or without symptoms; also, travel, occupation, contact, and clustering histories were obtained with the tablet computer device. A separate AI-mode function that could quickly recognize pulmonary infiltrates on chest x-rays was merged into the smart clinical assisting system (SCAS), and this model was subsequently trained with COVID-19 pneumonia cases from the GitHub open source data set. The detection rates for posteroanterior and anteroposterior chest x-rays were 55/59 (93%) and 5/11 (45%), respectively. The SCAS algorithm was continuously adjusted based on updates to the Taiwan Centers for Disease Control public safety guidelines for faster clinical decision making. Our ex vivo study demonstrated the efficiency of disinfecting the tablet computer surface by wiping it twice with 75% alcohol sanitizer. To further analyze the impact of the AI application in the quarantine station, we subdivided the station group into groups with or without AI. Compared with the conventional ED (n=281), the survey time at the quarantine station (n=1520) was significantly shortened; the median survey time at the ED was 153 minutes (95% CI 108.5-205.0), vs 35 minutes at the quarantine station (95% CI 24-56; P<.001). Furthermore, the use of the AI application in the quarantine station reduced the survey time in the quarantine station; the median survey time without AI was 101 minutes (95% CI 40-153), vs 34 minutes (95% CI 24-53) with AI in the quarantine station (P<.001). 
Conclusions: The AI trilogy improved our medical care workflow by shortening the quarantine survey process and reducing the processing time, which is especially important during an emerging infectious disease epidemic. %M 33001832 %R 10.2196/19878 %U http://www.jmir.org/2020/10/e19878/ %U https://doi.org/10.2196/19878 %U http://www.ncbi.nlm.nih.gov/pubmed/33001832 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 10 %P e18287 %T Construction of a Digestive System Tumor Knowledge Graph Based on Chinese Electronic Medical Records: Development and Usability Study %A Xiu,Xiaolei %A Qian,Qing %A Wu,Sizhu %+ Institute of Medical Information/Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, No 3 Yabao Road, Chaoyang District, Beijing, 100020, China, 86 18510495073, wu.sizhu@imicams.ac.cn %K Chinese electronic medical records %K knowledge graph %K digestive system tumor %K graph evaluation %D 2020 %7 7.10.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: With the increasing incidences and mortality of digestive system tumor diseases in China, ways to use clinical experience data in Chinese electronic medical records (CEMRs) to determine potentially effective relationships between diagnosis and treatment have become a priority. As an important part of artificial intelligence, a knowledge graph is a powerful tool for information processing and knowledge organization that provides an ideal means to solve this problem. Objective: This study aimed to construct a semantic-driven digestive system tumor knowledge graph (DSTKG) to represent the knowledge in CEMRs with fine granularity and semantics. Methods: This paper focuses on the knowledge graph schema and semantic relationships that were the main challenges for constructing a Chinese tumor knowledge graph. The DSTKG was developed through a multistep procedure. As an initial step, a complete DSTKG construction framework based on CEMRs was proposed. 
Then, this research built a knowledge graph schema containing 7 classes and 16 kinds of semantic relationships and accomplished the DSTKG by knowledge extraction, named entity linking, and drawing the knowledge graph. Finally, the quality of the DSTKG was evaluated from 3 aspects: data layer, schema layer, and application layer. Results: Experts agreed that the DSTKG was good overall (mean score 4.20). Especially for the aspects of “rationality of schema structure,” “scalability,” and “readability of results,” the DSTKG performed well, with scores of 4.72, 4.67, and 4.69, respectively, which were much higher than the average. However, the small amount of data in the DSTKG negatively affected its “practicability” score. Compared with other Chinese tumor knowledge graphs, the DSTKG can represent more granular entities, properties, and semantic relationships. In addition, the DSTKG was flexible, allowing personalized customization to meet the designer's focus on specific interests in the digestive system tumor. Conclusions: We constructed a granular semantic DSTKG. It could provide guidance for the construction of a tumor knowledge graph and provide a preliminary step for the intelligent application of knowledge graphs based on CEMRs. Additional data sources and stronger research on assertion classification are needed to gain insight into the DSTKG’s potential. 
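The schema described above organizes CEMR knowledge as classes connected by semantic relationships, which is conventionally stored as subject-predicate-object triples. The sketch below shows that representation with a simple lookup; the class and relation names are illustrative inventions, not the DSTKG's actual schema.

```python
# Toy subject-predicate-object triple store in the spirit of a tumor
# knowledge graph. All entity and relation names here are hypothetical.

triples = [
    ("GastricTumor", "is_a", "DigestiveSystemTumor"),
    ("GastricTumor", "has_symptom", "AbdominalPain"),
    ("GastricTumor", "treated_by", "Chemotherapy"),
]

def objects(graph, subject, predicate):
    """All objects linked to `subject` by `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]

print(objects(triples, "GastricTumor", "has_symptom"))  # → ['AbdominalPain']
```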
%M 33026359 %R 10.2196/18287 %U http://medinform.jmir.org/2020/10/e18287/ %U https://doi.org/10.2196/18287 %U http://www.ncbi.nlm.nih.gov/pubmed/33026359 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 9 %P e22845 %T Artificial Intelligence Chatbot Behavior Change Model for Designing Artificial Intelligence Chatbots to Promote Physical Activity and a Healthy Diet: Viewpoint %A Zhang,Jingwen %A Oh,Yoo Jung %A Lange,Patrick %A Yu,Zhou %A Fukuoka,Yoshimi %+ Department of Communication, University of California, Davis, One Shields Avenue, Davis, CA, 95616, United States, 1 530 754 1472, jwzzhang@ucdavis.edu %K chatbot %K conversational agent %K artificial intelligence %K physical activity %K diet %K intervention %K behavior change %K natural language processing %K communication %D 2020 %7 30.9.2020 %9 Viewpoint %J J Med Internet Res %G English %X Background: Chatbots empowered by artificial intelligence (AI) can increasingly engage in natural conversations and build relationships with users. Applying AI chatbots to lifestyle modification programs is one of the promising areas to develop cost-effective and feasible behavior interventions to promote physical activity and a healthy diet. Objective: The purposes of this perspective paper are to present a brief literature review of chatbot use in promoting physical activity and a healthy diet, describe the AI chatbot behavior change model our research team developed based on extensive interdisciplinary research, and discuss ethical principles and considerations. Methods: We conducted a preliminary search of studies reporting chatbots for improving physical activity and/or diet in four databases in July 2020. We summarized the characteristics of the chatbot studies and reviewed recent developments in human-AI communication research and innovations in natural language processing. 
Based on the identified gaps and opportunities, as well as our own clinical and research experience and findings, we propose an AI chatbot behavior change model. Results: Our review found a lack of theoretical guidance and practical recommendations on designing AI chatbots for lifestyle modification programs. The proposed AI chatbot behavior change model consists of the following four components to provide such guidance: (1) designing chatbot characteristics and understanding user background; (2) building relational capacity; (3) building persuasive conversational capacity; and (4) evaluating mechanisms and outcomes. The rationale and evidence supporting the design and evaluation choices for this model are presented in this paper. Conclusions: As AI chatbots become increasingly integrated into various digital communications, our proposed theoretical framework is the first step to conceptualize the scope of utilization in health behavior change domains and to synthesize all possible dimensions of chatbot features to inform intervention design and evaluation. There is a need for more interdisciplinary work to continue developing AI techniques to improve a chatbot’s relational and persuasive capacities to change physical activity and diet behaviors with strong ethical principles. 

%M 32996892 %R 10.2196/22845 %U https://www.jmir.org/2020/9/e22845 %U https://doi.org/10.2196/22845 %U http://www.ncbi.nlm.nih.gov/pubmed/32996892 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 9 %P e21849 %T Development of a Social Network for People Without a Diagnosis (RarePairs): Evaluation Study %A Kühnle,Lara %A Mücke,Urs %A Lechner,Werner M %A Klawonn,Frank %A Grigull,Lorenz %+ Hannover Medical School, Carl-Neuberg-Straße 1, Hannover, 30625, Germany, 49 511532 ext 3220, muecke.urs@mh-hannover.de %K rare disease %K diagnostic support tool %K prototype %K social network %K machine learning %K artificial intelligence %D 2020 %7 29.9.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Diagnostic delay in rare disease (RD) is common, occasionally lasting up to more than 20 years. In attempting to reduce it, diagnostic support tools have been studied extensively. However, social platforms have not yet been used for systematic diagnostic support. This paper illustrates the development and prototypic application of a social network using scientifically developed questions to match individuals without a diagnosis. Objective: The study aimed to outline, create, and evaluate a prototype tool (a social network platform named RarePairs), helping patients with undiagnosed RDs to find individuals with similar symptoms. The prototype includes a matching algorithm, bringing together individuals with similar disease burden in the lead-up to diagnosis. Methods: We divided our project into 4 phases. In phase 1, we used known data and findings in the literature to understand and specify the context of use. In phase 2, we specified the user requirements. In phase 3, we designed a prototype based on the results of phases 1 and 2, as well as incorporating a state-of-the-art questionnaire with 53 items for recognizing an RD. 
Lastly, we evaluated this prototype with a data set of 973 questionnaires from individuals suffering from different RDs using 24 distance calculating methods. Results: Based on a step-by-step construction process, the digital patient platform prototype, RarePairs, was developed. In order to match individuals with similar experiences, it uses answer patterns generated by a specifically designed questionnaire (Q53). A total of 973 questionnaires answered by patients with RDs were used to construct and test an artificial intelligence (AI) algorithm like the k-nearest neighbor search. With this, we found matches for every single one of the 973 records. The cross-validation of those matches showed that the algorithm outperforms random matching significantly. Statistically, for every data set the algorithm found at least one other record (match) with the same diagnosis. Conclusions: Diagnostic delay is torturous for patients without a diagnosis. Shortening the delay is important for both doctors and patients. Diagnostic support using AI can be promoted differently. The prototype of the social media platform RarePairs might be a low-threshold patient platform, and proved suitable to match and connect different individuals with comparable symptoms. This exchange promoted through RarePairs might be used to speed up the diagnostic process. Further studies include its evaluation in a prospective setting and implementation of RarePairs as a mobile phone app. 
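The matching step described above pairs patients whose questionnaire answer patterns are most similar, in the spirit of a k-nearest neighbor search. A minimal sketch, assuming answers can be compared item by item (the toy vectors and Hamming distance below stand in for the paper's 53-item Q53 patterns and its 24 distance methods):

```python
# Hypothetical nearest-neighbor matching on questionnaire answer patterns;
# not the RarePairs implementation. Short toy vectors replace Q53 answers.

def hamming(a, b):
    """Number of questionnaire items answered differently."""
    return sum(x != y for x, y in zip(a, b))

def best_matches(query, records, k=2):
    """Return the k records whose answer patterns are closest to the query."""
    ranked = sorted(records, key=lambda r: hamming(query, r["answers"]))
    return ranked[:k]

records = [
    {"id": "p1", "answers": [1, 0, 1, 1, 0]},
    {"id": "p2", "answers": [0, 0, 1, 0, 0]},
    {"id": "p3", "answers": [1, 0, 1, 0, 0]},
]
query = [1, 0, 1, 1, 1]
print([r["id"] for r in best_matches(query, records)])  # → ['p1', 'p3']
```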
%M 32990634 %R 10.2196/21849 %U http://www.jmir.org/2020/9/e21849/ %U https://doi.org/10.2196/21849 %U http://www.ncbi.nlm.nih.gov/pubmed/32990634 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 9 %P e20645 %T Marrying Medical Domain Knowledge With Deep Learning on Electronic Health Records: A Deep Visual Analytics Approach %A Li,Rui %A Yin,Changchang %A Yang,Samuel %A Qian,Buyue %A Zhang,Ping %+ The Ohio State University, Lincoln Tower 310A, 1800 Cannon Drive, Columbus, OH, 43210, United States, 1 614 293 9286, zhang.10631@osu.edu %K electronic health records %K interpretable deep learning %K knowledge graph %K visual analytics %D 2020 %7 28.9.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Deep learning models have attracted significant interest from health care researchers during the last few decades. There have been many studies that apply deep learning to medical applications and achieve promising results. However, there are three limitations to the existing models: (1) most clinicians are unable to interpret the results from the existing models, (2) existing models cannot incorporate complicated medical domain knowledge (eg, a disease causes another disease), and (3) most existing models lack visual exploration and interaction. Both the electronic health record (EHR) data set and the deep model results are complex and abstract, which impedes clinicians from exploring and communicating with the model directly. Objective: The objective of this study is to develop an interpretable and accurate risk prediction model as well as an interactive clinical prediction system to support EHR data exploration, knowledge graph demonstration, and model interpretation. Methods: A domain-knowledge–guided recurrent neural network (DG-RNN) model is proposed to predict clinical risks. The model takes medical event sequences as input and incorporates medical domain knowledge by attending to a subgraph of the whole medical knowledge graph. 
A global pooling operation and a fully connected layer are used to output the clinical outcomes. The middle results and the parameters of the fully connected layer are helpful in identifying which medical events cause clinical risks. DG-Viz is also designed to support EHR data exploration, knowledge graph demonstration, and model interpretation. Results: We conducted both risk prediction experiments and a case study on a real-world data set. A total of 554 patients with heart failure and 1662 control patients without heart failure were selected from the data set. The experimental results show that the proposed DG-RNN outperforms the state-of-the-art approaches by approximately 1.5%. The case study demonstrates how our medical physician collaborator can effectively explore the data and interpret the prediction results using DG-Viz. Conclusions: In this study, we present DG-Viz, an interactive clinical prediction system, which brings together the power of deep learning (ie, a DG-RNN–based model) and visual analytics to predict clinical risks and visually interpret the EHR prediction results. Experimental results and a case study on heart failure risk prediction tasks demonstrate the effectiveness and usefulness of the DG-Viz system. This study will pave the way for interactive, interpretable, and accurate clinical risk predictions. 
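The output stage described above, a global pooling operation over per-event hidden states followed by a fully connected layer, can be sketched in miniature. This is an illustration of the generic mechanism with hypothetical numbers, not the DG-RNN's actual weights or dimensions.

```python
import math

# Global max pooling over per-event hidden vectors, then a single-logit
# fully connected layer with a sigmoid, yielding a risk probability.
# All values below are made up for illustration.

def global_max_pool(hidden_states):
    """Element-wise max over the sequence of per-event hidden vectors."""
    return [max(h[i] for h in hidden_states) for i in range(len(hidden_states[0]))]

def fully_connected(x, weights, bias):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

hidden = [[0.2, -0.1, 0.5], [0.7, 0.0, 0.1]]  # two medical events, dim 3
pooled = global_max_pool(hidden)              # [0.7, 0.0, 0.5]
print(fully_connected(pooled, [1.0, -2.0, 0.5], bias=-0.5))  # ≈ 0.61
```

Because the output layer is linear in the pooled features, inspecting its weights (here, the large positive weight on the first feature) is what lets one read off which events drive the predicted risk.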
%M 32985996 %R 10.2196/20645 %U http://www.jmir.org/2020/9/e20645/ %U https://doi.org/10.2196/20645 %U http://www.ncbi.nlm.nih.gov/pubmed/32985996 %0 Journal Article %@ 2371-4379 %I JMIR Publications %V 5 %N 3 %P e18660 %T The Diabits App for Smartphone-Assisted Predictive Monitoring of Glycemia in Patients With Diabetes: Retrospective Observational Study %A Kriventsov,Stan %A Lindsey,Alexander %A Hayeri,Amir %+ Bio Conscious Technologies Inc, 555 W Hastings St, Suite #1200, Vancouver, BC, V6B 4N6, Canada, 1 604 729 4747, stan@bioconscious.tech %K blood glucose predictions %K type 1 diabetes %K artificial intelligence %K machine learning %K digital health %K mobile phone %D 2020 %7 22.9.2020 %9 Original Paper %J JMIR Diabetes %G English %X Background: Diabetes mellitus, which causes dysregulation of blood glucose in humans, is a major public health challenge. Patients with diabetes must monitor their glycemic levels to keep them in a healthy range. This task is made easier by using continuous glucose monitoring (CGM) devices and relaying their output to smartphone apps, thus providing users with real-time information on their glycemic fluctuations and possibly predicting future trends. Objective: This study aims to discuss various challenges of predictive monitoring of glycemia and examines the accuracy and blood glucose control effects of Diabits, a smartphone app that helps patients with diabetes monitor and manage their blood glucose levels in real time. Methods: Using data from CGM devices and user input, Diabits applies machine learning techniques to create personalized patient models and predict blood glucose fluctuations up to 60 min in advance. These predictions give patients an opportunity to take pre-emptive action to maintain their blood glucose values within the reference range. 
In this retrospective observational cohort study, the predictive accuracy of Diabits and the correlation between daily use of the app and blood glucose control metrics were examined based on real app users’ data. Moreover, the accuracy of predictions on the 2018 Ohio T1DM (type 1 diabetes mellitus) data set was calculated and compared against other published results. Results: On the basis of more than 6.8 million data points, 30-min Diabits predictions evaluated using Parkes Error Grid were found to be 86.89% (5,963,930/6,864,130) clinically accurate (zone A) and 99.56% (6,833,625/6,864,130) clinically acceptable (zones A and B), whereas 60-min predictions were 70.56% (4,843,605/6,864,130) clinically accurate and 97.49% (6,692,165/6,864,130) clinically acceptable. By analyzing daily use statistics and CGM data for the 280 most long-standing users of Diabits, it was established that under free-living conditions, many common blood glucose control metrics improved with increased frequency of app use. For instance, the average blood glucose for the days these users did not interact with the app was 154.0 (SD 47.2) mg/dL, with 67.52% of the time spent in the healthy 70 to 180 mg/dL range. For days with 10 or more Diabits sessions, the average blood glucose decreased to 141.6 (SD 42.0) mg/dL (P<.001), whereas the time in euglycemic range increased to 74.28% (P<.001). On the Ohio T1DM data set of 6 patients with type 1 diabetes, 30-min predictions of the base Diabits model had an average root mean square error of 18.68 (SD 2.19) mg/dL, which is an improvement over the published state-of-the-art results for this data set. Conclusions: Diabits accurately predicts future glycemic fluctuations, potentially making it easier for patients with diabetes to maintain their blood glucose in the reference range. Furthermore, an improvement in glucose control was observed on days with more frequent Diabits use. 
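Two of the evaluation quantities reported above, time in the 70 to 180 mg/dL euglycemic range and root mean square prediction error, are straightforward to compute; the sketch below uses invented readings, not Diabits data.

```python
import math

# Toy illustration of two glucose-control metrics; values are made up.

def time_in_range(readings, lo=70, hi=180):
    """Fraction of CGM readings inside [lo, hi] mg/dL."""
    return sum(lo <= r <= hi for r in readings) / len(readings)

def rmse(predicted, actual):
    """Root mean square prediction error in mg/dL."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

readings = [65, 110, 150, 190, 140]
print(time_in_range(readings))        # → 0.6
print(rmse([120, 160], [110, 150]))   # → 10.0
```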
%M 32960180 %R 10.2196/18660 %U http://diabetes.jmir.org/2020/3/e18660/ %U https://doi.org/10.2196/18660 %U http://www.ncbi.nlm.nih.gov/pubmed/32960180 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 9 %P e19897 %T A Personalized Voice-Based Diet Assistant for Caregivers of Alzheimer Disease and Related Dementias: System Development and Validation %A Li,Juan %A Maharjan,Bikesh %A Xie,Bo %A Tao,Cui %+ University of Texas Health Science Center at Houston, 7000 Fannin Street Suite 600, Houston, TX, 77030, United States, 1 7135003981, cui.tao@uth.tmc.edu %K Alzheimer disease %K dementia %K diet %K knowledge %K ontology %K voice assistant %D 2020 %7 21.9.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: The world’s aging population is increasing, with an expected increase in the prevalence of Alzheimer disease and related dementias (ADRD). Proper nutrition and good eating behavior show promise for preventing and slowing the progression of ADRD and consequently improving patients with ADRD’s health status and quality of life. Most ADRD care is provided by informal caregivers, so assisting caregivers to manage patients with ADRD’s diet is important. Objective: This study aims to design, develop, and test an artificial intelligence–powered voice assistant to help informal caregivers manage the daily diet of patients with ADRD and learn food and nutrition-related knowledge. Methods: The voice assistant is being implemented in several steps: construction of a comprehensive knowledge base with ontologies that define ADRD diet care and user profiles, and is extended with external knowledge graphs; management of conversation between users and the voice assistant; personalized ADRD diet services provided through a semantics-based knowledge graph search and reasoning engine; and system evaluation in use cases with additional qualitative evaluations. Results: A prototype voice assistant was evaluated in the lab using various use cases. 
Preliminary qualitative test results demonstrate reasonable rates of dialogue success and recommendation correctness. Conclusions: The voice assistant provides a natural, interactive interface for users, and it does not require the user to have a technical background, which may facilitate senior caregivers’ use in their daily care tasks. This study suggests the feasibility of using the intelligent voice assistant to help caregivers manage patients with ADRD’s diet. %M 32955452 %R 10.2196/19897 %U http://www.jmir.org/2020/9/e19897/ %U https://doi.org/10.2196/19897 %U http://www.ncbi.nlm.nih.gov/pubmed/32955452 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 9 %P e21983 %T Artificial Intelligence for the Prediction of Helicobacter Pylori Infection in Endoscopic Images: Systematic Review and Meta-Analysis Of Diagnostic Test Accuracy %A Bang,Chang Seok %A Lee,Jae Jun %A Baik,Gwang Ho %+ Department of Internal Medicine, Hallym University College of Medicine, Sakju-ro 77, Chuncheon, , Republic of Korea, 82 33 240 5000, csbang@hallym.ac.kr %K artificial intelligence %K convolutional neural network %K deep learning %K machine learning %K endoscopy %K Helicobacter pylori %D 2020 %7 16.9.2020 %9 Review %J J Med Internet Res %G English %X Background: Helicobacter pylori plays a central role in the development of gastric cancer, and prediction of H pylori infection by visual inspection of the gastric mucosa is an important function of endoscopy. However, there are currently no established methods of optical diagnosis of H pylori infection using endoscopic images. Definitive diagnosis requires endoscopic biopsy. Artificial intelligence (AI) has been increasingly adopted in clinical practice, especially for image recognition and classification. Objective: This study aimed to evaluate the diagnostic test accuracy of AI for the prediction of H pylori infection using endoscopic images. Methods: Two independent evaluators searched core databases. 
The inclusion criteria were studies with endoscopic images of H pylori infection and with application of AI for the prediction of H pylori infection presenting diagnostic performance. Systematic review and diagnostic test accuracy meta-analysis were performed. Results: Ultimately, 8 studies were identified. Pooled sensitivity, specificity, diagnostic odds ratio, and area under the curve of AI for the prediction of H pylori infection were 0.87 (95% CI 0.72-0.94), 0.86 (95% CI 0.77-0.92), 40 (95% CI 15-112), and 0.92 (95% CI 0.90-0.94), respectively, in the 1719 patients (385 patients with H pylori infection vs 1334 controls). Meta-regression identified methodological quality and the number of patients included in each study as sources of heterogeneity. There was no evidence of publication bias. The accuracy of the AI algorithm reached 82% for discrimination between noninfected images and posteradication images. Conclusions: An AI algorithm is a reliable tool for endoscopic diagnosis of H pylori infection. The lack of external validation and the fact that all included studies were conducted in Asia are limitations that should be overcome. 
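The pooled figures above are mutually consistent: the diagnostic odds ratio (DOR) can be derived from sensitivity and specificity as the ratio of the positive to the negative likelihood ratio, and plugging in the pooled 0.87 and 0.86 reproduces a DOR of about 41, matching the reported 40 (95% CI 15-112).

```python
# Consistency check: DOR from pooled sensitivity and specificity.
# DOR = LR+ / LR- = (sens / (1 - spec)) / ((1 - sens) / spec)

def diagnostic_odds_ratio(sens, spec):
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return lr_pos / lr_neg

print(round(diagnostic_odds_ratio(0.87, 0.86), 1))  # → 41.1
```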
Trial Registration: PROSPERO CRD42020175957; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=175957 %M 32936088 %R 10.2196/21983 %U http://www.jmir.org/2020/9/e21983/ %U https://doi.org/10.2196/21983 %U http://www.ncbi.nlm.nih.gov/pubmed/32936088 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 9 %P e18689 %T An Intelligent Mobile-Enabled System for Diagnosing Parkinson Disease: Development and Validation of a Speech Impairment Detection System %A Zhang,Liang %A Qu,Yue %A Jin,Bo %A Jing,Lu %A Gao,Zhan %A Liang,Zhanhua %+ Department of Neurology, The First Affiliated Hospital of Dalian Medical University, No.222 Zhongshan Road, Dalian, 116011, China, 86 18098876262, jinglu131129@126.com %K Parkinson disease %K speech disorder %K remote diagnosis %K artificial intelligence %K mobile phone app %K mobile health %D 2020 %7 16.9.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Parkinson disease (PD) is one of the most common neurological diseases. At present, because the exact cause is still unclear, accurate diagnosis and progression monitoring remain challenging. In recent years, exploring the relationship between PD and speech impairment has attracted widespread attention in the academic world. Most of the studies successfully validated the effectiveness of some vocal features. Moreover, the noninvasive nature of speech signal–based testing has pioneered a new way for telediagnosis and telemonitoring. In particular, there is an increasing demand for artificial intelligence–powered tools in the digital health era. Objective: This study aimed to build a real-time speech signal analysis tool for PD diagnosis and severity assessment. Further, the underlying system should be flexible enough to integrate any machine learning or deep learning algorithm. 
Methods: At its core, the system we built consists of two parts: (1) speech signal processing: both traditional and novel speech signal processing technologies have been employed for feature engineering, which can automatically extract a few linear and nonlinear dysphonia features, and (2) application of machine learning algorithms: some classical regression and classification algorithms from the machine learning field have been tested; we then chose the most efficient algorithms and relevant features. Results: Experimental results showed that our system had an outstanding ability to both diagnose and assess severity of PD. By using both linear and nonlinear dysphonia features, the accuracy reached 88.74% and recall reached 97.03% in the diagnosis task. Meanwhile, mean absolute error was 3.7699 in the assessment task. The system has already been deployed within a mobile app called No Pa. Conclusions: This study performed diagnosis and severity assessment of PD from the perspective of speech disorder detection. The efficiency and effectiveness of the algorithms indirectly validated the practicality of the system. In particular, the system reflects the necessity of a publicly accessible PD diagnosis and assessment system that can perform telediagnosis and telemonitoring of PD. This system can also optimize doctors’ decision-making processes regarding treatments. 
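One classical linear dysphonia feature of the kind such feature-engineering pipelines extract is local jitter, the cycle-to-cycle variability of the vocal-fold period. The sketch below computes it on toy pitch-period values; this is a generic definition, not the paper's specific feature set.

```python
# Local jitter: mean absolute difference of consecutive glottal periods,
# normalized by the mean period. Toy period values (seconds), not study data.

def local_jitter(periods):
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

periods = [0.0100, 0.0102, 0.0099, 0.0101]
print(round(local_jitter(periods), 4))  # → 0.0232
```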
%M 32936086 %R 10.2196/18689 %U http://medinform.jmir.org/2020/9/e18689/ %U https://doi.org/10.2196/18689 %U http://www.ncbi.nlm.nih.gov/pubmed/32936086 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 9 %P e20641 %T Automatic Grading of Stroke Symptoms for Rapid Assessment Using Optimized Machine Learning and 4-Limb Kinematics: Clinical Validation Study %A Park,Eunjeong %A Lee,Kijeong %A Han,Taehwa %A Nam,Hyo Suk %+ Department of Neurology, Yonsei University College of Medicine, 50 Yonsei-ro, Seodaemoon-gu, Seoul, 03722, Republic of Korea, 82 222280245, hsnam@yuhs.ac %K machine learning %K artificial intelligence %K sensors %K kinematics %K stroke %K telemedicine %D 2020 %7 16.9.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Subtle abnormal motor signs are indications of serious neurological diseases. Although neurological deficits require fast initiation of treatment in a restricted time, it is difficult for nonspecialists to detect and objectively assess the symptoms. In the clinical environment, diagnoses and decisions are based on clinical grading methods, including the National Institutes of Health Stroke Scale (NIHSS) score or the Medical Research Council (MRC) score, which have been used to measure motor weakness. Objective grading in various environments is necessitated for consistent agreement among patients, caregivers, paramedics, and medical staff to facilitate rapid diagnoses and dispatches to appropriate medical centers. Objective: In this study, we aimed to develop an autonomous grading system for stroke patients. We investigated the feasibility of our new system to assess motor weakness and grade NIHSS and MRC scores of 4 limbs, similar to the clinical examinations performed by medical staff. Methods: We implemented an automatic grading system composed of a measuring unit with wearable sensors and a grading unit with optimized machine learning. 
Inertial sensors were attached to measure subtle weaknesses caused by paralysis of upper and lower limbs. We collected 60 instances of data with kinematic features of motor disorders from neurological examination and demographic information of stroke patients with NIHSS 0 or 1 and MRC 7, 8, or 9 grades in a stroke unit. Training data with 240 instances were generated using a synthetic minority oversampling technique to complement the imbalanced number of data between classes and low number of training data. We trained 2 representative machine learning algorithms, an ensemble and a support vector machine (SVM), to implement auto-NIHSS and auto-MRC grading. The optimized algorithms performed a 5-fold cross-validation and were searched by Bayes optimization in 30 trials. The trained model was tested with the 60 original hold-out instances for performance evaluation in accuracy, sensitivity, specificity, and area under the receiver operating characteristics curve (AUC). Results: The proposed system can grade NIHSS scores with an accuracy of 83.3% and an AUC of 0.912 using an optimized ensemble algorithm, and it can grade with an accuracy of 80.0% and an AUC of 0.860 using an optimized SVM algorithm. The auto-MRC grading achieved an accuracy of 76.7% and a mean AUC of 0.870 in SVM classification and an accuracy of 78.3% and a mean AUC of 0.877 in ensemble classification. Conclusions: The automatic grading system quantifies proximal weakness in real time and assesses symptoms through automatic grading. The pilot outcomes demonstrated the feasibility of remote monitoring of motor weakness caused by stroke. The system can facilitate consistent grading with instant assessment and expedite dispatches to appropriate hospitals and treatment initiation by sharing auto-MRC and auto-NIHSS scores between prehospital and hospital responses as an objective observation. 
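The synthetic minority oversampling step described above generates new minority-class training instances by interpolating between a minority sample and one of its nearest minority-class neighbors. A minimal SMOTE-style sketch, not the authors' pipeline, on toy 2-dimensional samples:

```python
import random

# Minimal SMOTE-style oversampling sketch (illustration only): each
# synthetic point lies on the segment between a minority sample and one
# of its k nearest minority-class neighbors.

def smote(minority, n_synthetic, k=2, seed=0):
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base sample, excluding itself
        neighbors = sorted((s for s in minority if s is not base),
                           key=lambda s: sq_dist(base, s))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

minority = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [5.0, 5.0]]
new = smote(minority, n_synthetic=4)
print(len(new))  # → 4
```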
%M 32936079 %R 10.2196/20641 %U http://www.jmir.org/2020/9/e20641/ %U https://doi.org/10.2196/20641 %U http://www.ncbi.nlm.nih.gov/pubmed/32936079 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 9 %P e19133 %T Social Reminiscence in Older Adults’ Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning %A Ferrario,Andrea %A Demiray,Burcu %A Yordanova,Kristina %A Luo,Minxia %A Martin,Mike %+ Department of Management, Technology, and Economics, ETH Zurich, Weinbergstrasse 56/58, Zurich, 8092, Switzerland, 41 44 632 86 24, aferrario@ethz.ch %K aging %K dementia %K reminiscence %K real-life conversations %K electronically activated recorder (EAR) %K natural language processing %K machine learning %K imbalanced learning %D 2020 %7 15.9.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Reminiscence is the act of thinking or talking about personal experiences that occurred in the past. It is a central task of old age that is essential for healthy aging, and it serves multiple functions, such as decision-making and introspection, transmitting life lessons, and bonding with others. The study of social reminiscence behavior in everyday life can be used to generate data and detect reminiscence from general conversations. Objective: The aims of this original paper are to (1) preprocess coded transcripts of conversations in German of older adults with natural language processing (NLP), and (2) implement and evaluate learning strategies using different NLP features and machine learning algorithms to detect reminiscence in a corpus of transcripts. 
Methods: The methods in this study comprise (1) collecting and coding of transcripts of older adults’ conversations in German, (2) preprocessing transcripts to generate NLP features (bag-of-words models, part-of-speech tags, pretrained German word embeddings), and (3) training machine learning models to detect reminiscence using random forests, support vector machines, and adaptive and extreme gradient boosting algorithms. The data set comprises 2214 transcripts, including 109 transcripts with reminiscence. Due to class imbalance in the data, we introduced three learning strategies: (1) class-weighted learning, (2) a meta-classifier consisting of a voting ensemble, and (3) data augmentation with the Synthetic Minority Oversampling Technique (SMOTE) algorithm. For each learning strategy, we performed cross-validation on a random sample of the training data set of transcripts. We computed the area under the curve (AUC), the average precision (AP), precision, recall, as well as F1 score and specificity measures on the test data, for all combinations of NLP features, algorithms, and learning strategies. Results: Class-weighted support vector machines on bag-of-words features outperformed all other classifiers (AUC=0.91, AP=0.56, precision=0.5, recall=0.45, F1=0.48, specificity=0.98), followed by support vector machines on SMOTE-augmented data and word embeddings features (AUC=0.89, AP=0.54, precision=0.35, recall=0.59, F1=0.44, specificity=0.94). For the meta-classifier strategy, adaptive and extreme gradient boosting algorithms trained on word embeddings and bag-of-words outperformed all other classifiers and NLP features; however, the performance of the meta-classifier learning strategy was lower compared to other strategies, with highly imbalanced precision-recall trade-offs. 
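The best-performing strategy reported above, a class-weighted SVM over bag-of-words features, can be sketched in a few lines of scikit-learn. The sentences and labels below are toy stand-ins for the coded German transcripts, purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy stand-ins for conversation transcripts (1 = reminiscence, 0 = not)
texts = [
    "back when I was young we danced every weekend",
    "could you pass the salt please",
    "I remember the summer we spent at the lake",
    "the bus to town leaves at nine",
    "in those days my mother baked bread on Sundays",
    "what time is the appointment tomorrow",
]
labels = [1, 0, 1, 0, 1, 0]

# class_weight='balanced' reweights errors inversely to class frequency,
# the "class-weighted learning" strategy the abstract describes
clf = make_pipeline(CountVectorizer(), SVC(class_weight="balanced"))
clf.fit(texts, labels)
pred = clf.predict(["I remember when we danced at the lake"])
```

With the real 2214-transcript corpus, the same pipeline would be evaluated with cross-validation and the AUC/AP/precision/recall measures listed in the abstract.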
Conclusions: This study provides evidence of the applicability of NLP and machine learning pipelines for the automated detection of reminiscence in older adults’ everyday conversations in German. The methods and findings of this study could be relevant for designing unobtrusive computer systems for the real-time detection of social reminiscence in the everyday life of older adults and classifying their functions. With further improvements, these systems could be deployed in health interventions aimed at improving older adults’ well-being by promoting self-reflection and suggesting coping strategies to be used in the case of dysfunctional reminiscence cases, which can undermine physical and mental health. %M 32866108 %R 10.2196/19133 %U http://www.jmir.org/2020/9/e19133/ %U https://doi.org/10.2196/19133 %U http://www.ncbi.nlm.nih.gov/pubmed/32866108 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 9 %P e21573 %T An Innovative Artificial Intelligence–Based App for the Diagnosis of Gestational Diabetes Mellitus (GDM-AI): Development Study %A Shen,Jiayi %A Chen,Jiebin %A Zheng,Zequan %A Zheng,Jiabin %A Liu,Zherui %A Song,Jian %A Wong,Sum Yi %A Wang,Xiaoling %A Huang,Mengqi %A Fang,Po-Han %A Jiang,Bangsheng %A Tsang,Winghei %A He,Zonglin %A Liu,Taoran %A Akinwunmi,Babatunde %A Wang,Chi Chiu %A Zhang,Casper J P %A Huang,Jian %A Ming,Wai-Kit %+ Department of Public Health and Preventive Medicine, School of Medicine, Jinan University, Guangzhou, China, 86 14715485116, wkming@connect.hku.hk %K AI %K application %K disease diagnosis %K maternal health care %K artificial intelligence %K app %K women %K rural %K innovation %K diabetes %K gestational diabetes %K diagnosis %D 2020 %7 15.9.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Gestational diabetes mellitus (GDM) can cause adverse consequences to both mothers and their newborns. 
However, pregnant women living in low- and middle-income areas or countries often fail to receive early clinical interventions at local medical facilities due to the restricted availability of GDM diagnosis. The outstanding performance of artificial intelligence (AI) in disease diagnosis in previous studies demonstrates its promising applications in GDM diagnosis. Objective: This study aims to investigate the implementation of a well-performing AI algorithm for GDM diagnosis in a setting that requires less medical equipment and fewer staff, and to establish an app based on this algorithm. This study also explores possible progress if our app is widely used. Methods: An AI model comprising 9 algorithms was trained on data from 12,304 consenting pregnant outpatients who received a test for GDM in the obstetrics and gynecology department of the First Affiliated Hospital of Jinan University, a local hospital in South China, between November 2010 and October 2017. GDM was diagnosed according to American Diabetes Association (ADA) 2011 diagnostic criteria. Age and fasting blood glucose were chosen as critical parameters. For validation, we performed k-fold cross-validation (k=5) on the internal dataset and used an external validation dataset that included 1655 cases from the Prince of Wales Hospital, the affiliated teaching hospital of the Chinese University of Hong Kong, a non-local hospital. Accuracy, sensitivity, and other criteria were calculated for each algorithm. Results: The areas under the receiver operating characteristic curve (AUROC) on the external validation dataset for support vector machine (SVM), random forest, AdaBoost, k-nearest neighbors (kNN), naive Bayes (NB), decision tree, logistic regression (LR), eXtreme gradient boosting (XGBoost), and gradient boosting decision tree (GBDT) were 0.780, 0.657, 0.736, 0.669, 0.774, 0.614, 0.769, 0.742, and 0.757, respectively. SVM also retained high performance on other criteria. 
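The AUROC values reported above are threshold-free summaries of ranking quality. A minimal scikit-learn sketch on synthetic data shows how such a value is computed for an SVM using only age and fasting blood glucose; the feature distributions and decision rule here are hypothetical, not the study's cohort or model.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical features: [age in years, fasting blood glucose in mmol/L]
age = rng.normal(30, 5, 300)
glucose = rng.normal(4.8, 0.6, 300)
X = np.column_stack([age, glucose])
# Hypothetical label: 1 = GDM, driven mostly by glucose plus noise
y = (glucose + rng.normal(0, 0.3, 300) > 5.1).astype(int)

# Scale features so the RBF kernel is not dominated by the wider-ranged age
model = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))
model.fit(X[:200], y[:200])
scores = model.predict_proba(X[200:])[:, 1]
auroc = roc_auc_score(y[200:], scores)
```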
The specificity for SVM retained 100% in the external validation set with an accuracy of 88.7%. Conclusions: Our prospective and multicenter study is the first clinical study that supports the GDM diagnosis for pregnant women in resource-limited areas, using only fasting blood glucose value, patients’ age, and a smartphone connected to the internet. Our study proved that SVM can achieve accurate diagnosis with less operation cost and higher efficacy. Our study (referred to as GDM-AI study, ie, the study of AI-based diagnosis of GDM) also shows our app has a promising future in improving the quality of maternal health for pregnant women, precision medicine, and long-distance medical care. We recommend future work should expand the dataset scope and replicate the process to validate the performance of the AI algorithms. %M 32930674 %R 10.2196/21573 %U https://www.jmir.org/2020/9/e21573 %U https://doi.org/10.2196/21573 %U http://www.ncbi.nlm.nih.gov/pubmed/32930674 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 9 %P e20701 %T Artificial Intelligence-Based Conversational Agents for Chronic Conditions: Systematic Literature Review %A Schachner,Theresa %A Keller,Roman %A v Wangenheim,Florian %+ Department of Management, Technology, and Economics, ETH Zurich, WEV G 228, Weinbergstr 56/58, Zurich , Switzerland, 41 446325209, tschachner@ethz.ch %K artificial intelligence %K conversational agents %K chatbots %K healthcare %K chronic diseases %K systematic literature review %D 2020 %7 14.9.2020 %9 Review %J J Med Internet Res %G English %X Background: A rising number of conversational agents or chatbots are equipped with artificial intelligence (AI) architecture. They are increasingly prevalent in health care applications such as those providing education and support to patients with chronic diseases, one of the leading causes of death in the 21st century. AI-based chatbots enable more effective and frequent interactions with such patients. 
Objective: The goal of this systematic literature review is to describe the characteristics, health care conditions, and AI architectures of AI-based conversational agents designed specifically for chronic diseases. Methods: We conducted a systematic literature review using PubMed MEDLINE, EMBASE, PsycInfo, CINAHL, ACM Digital Library, ScienceDirect, and Web of Science. We applied a predefined search strategy using the terms “conversational agent,” “healthcare,” “artificial intelligence,” and their synonyms. We updated the search results using Google alerts, and screened reference lists for other relevant articles. We included primary research studies that involved the prevention, treatment, or rehabilitation of chronic diseases, involved a conversational agent, and included any kind of AI architecture. Two independent reviewers conducted screening and data extraction, and Cohen kappa was used to measure interrater agreement. A narrative approach was applied for data synthesis. Results: The literature search found 2052 articles, of which 10 papers met the inclusion criteria. The small number of identified studies, together with the prevalence of quasi-experimental studies (n=7) and the prevailing prototype nature of the chatbots (n=7), reveals the immaturity of the field. The reported chatbots addressed a broad variety of chronic diseases (n=6), showcasing a tendency to develop specialized conversational agents for individual chronic conditions. However, comparisons of these chatbots within and between chronic diseases are lacking. In addition, the reported evaluation measures were not standardized, and the addressed health goals varied widely. Together, these study characteristics complicate comparability and leave room for future research. 
While natural language processing represented the most used AI technique (n=7) and the majority of conversational agents allowed for multimodal interaction (n=6), the identified studies demonstrated broad heterogeneity, little depth in the reporting of AI techniques and systems, and inconsistent use of taxonomy for the underlying AI software, further limiting the comparability and generalizability of study results. Conclusions: The literature on AI-based conversational agents for chronic conditions is scarce and mostly consists of quasi-experimental studies with chatbots in the prototype stage that use natural language processing and allow for multimodal user interaction. Future research could profit from evidence-based evaluation of AI-based conversational agents and comparison thereof within and between different chronic health conditions. Besides increased comparability, the quality of chatbots developed for specific chronic conditions and their subsequent impact on the target patients could be enhanced by more structured development and standardized evaluation processes. 
%M 32924957 %R 10.2196/20701 %U http://www.jmir.org/2020/9/e20701/ %U https://doi.org/10.2196/20701 %U http://www.ncbi.nlm.nih.gov/pubmed/32924957 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 9 %P e18091 %T Artificial Intelligence and Its Effect on Dermatologists’ Accuracy in Dermoscopic Melanoma Image Classification: Web-Based Survey Study %A Maron,Roman C %A Utikal,Jochen S %A Hekler,Achim %A Hauschild,Axel %A Sattler,Elke %A Sondermann,Wiebke %A Haferkamp,Sebastian %A Schilling,Bastian %A Heppt,Markus V %A Jansen,Philipp %A Reinholz,Markus %A Franklin,Cindy %A Schmitt,Laurenz %A Hartmann,Daniela %A Krieghoff-Henning,Eva %A Schmitt,Max %A Weichenthal,Michael %A von Kalle,Christof %A Fröhling,Stefan %A Brinker,Titus J %+ Digital Biomarkers for Oncology Group (DBO), National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg, 69120, Germany, 49 62213219304, titus.brinker@nct-heidelberg.de %K artificial intelligence %K machine learning %K deep learning %K neural network %K dermatology %K diagnosis %K nevi %K melanoma %K skin neoplasm %D 2020 %7 11.9.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Early detection of melanoma can be lifesaving but this remains a challenge. Recent diagnostic studies have revealed the superiority of artificial intelligence (AI) in classifying dermoscopic images of melanoma and nevi, concluding that these algorithms should assist a dermatologist’s diagnoses. Objective: The aim of this study was to investigate whether AI support improves the accuracy and overall diagnostic performance of dermatologists in the dichotomous image–based discrimination between melanoma and nevus. 
Methods: Twelve board-certified dermatologists were presented disjoint sets of 100 unique dermoscopic images of melanomas and nevi (total of 1200 unique images), and they had to classify the images based on personal experience alone (part I) and with the support of a trained convolutional neural network (CNN, part II). Additionally, dermatologists were asked to rate their confidence in their final decision for each image. Results: While the mean specificity of the dermatologists based on personal experience alone remained almost unchanged (70.6% vs 72.4%; P=.54) with AI support, the mean sensitivity and mean accuracy increased significantly (59.4% vs 74.6%; P=.003 and 65.0% vs 73.6%; P=.002, respectively) with AI support. Out of the 10% (10/94; 95% CI 8.4%-11.8%) of cases where dermatologists were correct and AI was incorrect, dermatologists on average changed to the incorrect answer for 39% (4/10; 95% CI 23.2%-55.6%) of cases. When dermatologists were incorrect and AI was correct (25/94, 27%; 95% CI 24.0%-30.1%), dermatologists changed their answers to the correct answer for 46% (11/25; 95% CI 33.1%-58.4%) of cases. Additionally, the dermatologists’ average confidence in their decisions increased when the CNN confirmed their decision and decreased when the CNN disagreed, even when the dermatologists were correct. Reported values are based on the mean of all participants. Whenever absolute values are shown, the denominator and numerator are approximations as every dermatologist ended up rating a varying number of images due to a quality control step. Conclusions: The findings of our study show that AI support can improve the overall accuracy of the dermatologists in the dichotomous image–based discrimination between melanoma and nevus. 
This supports the argument for AI-based tools to aid clinicians in skin lesion classification and provides a rationale for studies of such classifiers in real-life settings, wherein clinicians can integrate additional information such as patient age and medical history into their decisions. %M 32915161 %R 10.2196/18091 %U https://www.jmir.org/2020/9/e18091 %U https://doi.org/10.2196/18091 %U http://www.ncbi.nlm.nih.gov/pubmed/32915161 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 3 %N 2 %P e19554 %T Artificial Intelligence–Powered Digital Health Platform and Wearable Devices Improve Outcomes for Older Adults in Assisted Living Communities: Pilot Intervention Study %A Wilmink,Gerald %A Dupey,Katherine %A Alkire,Schon %A Grote,Jeffrey %A Zobel,Gregory %A Fillit,Howard M %A Movva,Satish %+ CarePredict, 324 South University Drive, Plantation, FL, 33324, United States, 1 6153644985, jerry.wilmink@gmail.com %K health technology %K artificial intelligence %K AI %K preventive %K senior technology %K assisted living %K long-term services %K long-term care providers %D 2020 %7 10.9.2020 %9 Original Paper %J JMIR Aging %G English %X Background: Wearables and artificial intelligence (AI)–powered digital health platforms that utilize machine learning algorithms can autonomously measure a senior’s change in activity and behavior and may be useful tools for proactive interventions that target modifiable risk factors. Objective: The goal of this study was to analyze how a wearable device and AI-powered digital health platform could provide improved health outcomes for older adults in assisted living communities. Methods: Data from 490 residents from six assisted living communities were analyzed retrospectively over 24 months. The intervention group (+CP) consisted of 3 communities that utilized CarePredict (n=256), and the control group (–CP) consisted of 3 communities (n=234) that did not utilize CarePredict. 
The following outcomes were measured and compared to baseline: hospitalization rate, fall rate, length of stay (LOS), and staff response time. Results: The residents of the +CP and –CP communities exhibit no statistical difference in age (P=.64), sex (P=.63), and staff service hours per resident (P=.94). The data show that the +CP communities exhibited a 39% lower hospitalization rate (P=.02), a 69% lower fall rate (P=.01), and a 67% greater length of stay (P=.03) than the –CP communities. The staff alert acknowledgment and reach resident times also improved in the +CP communities by 37% (P=.02) and 40% (P=.02), respectively. Conclusions: The AI-powered digital health platform provides the community staff with actionable information regarding each resident’s activities and behavior, which can be used to identify older adults that are at an increased risk for a health decline. Staff can use this data to intervene much earlier, protecting seniors from conditions that left untreated could result in hospitalization. In summary, the use of wearables and AI-powered digital health platform can contribute to improved health outcomes for seniors in assisted living communities. The accuracy of the system will be further validated in a larger trial. 
%M 32723711 %R 10.2196/19554 %U http://aging.jmir.org/2020/2/e19554/ %U https://doi.org/10.2196/19554 %U http://www.ncbi.nlm.nih.gov/pubmed/32723711 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 8 %N 9 %P e18142 %T Neural Network–Based Algorithm for Adjusting Activity Targets to Sustain Exercise Engagement Among People Using Activity Trackers: Retrospective Observation and Algorithm Development Study %A Mohammadi,Ramin %A Atif,Mursal %A Centi,Amanda Jayne %A Agboola,Stephen %A Jethwani,Kamal %A Kvedar,Joseph %A Kamarthi,Sagar %+ Northeastern University, 360 Huntington Ave, Boston, MA, 02115, United States, 1 6173733070, sagar@coe.neu.edu %K activity tracker %K exercise engagement %K dynamic activity target %K neural network %K activity target prediction %K machine learning %D 2020 %7 8.9.2020 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: It is well established that lack of physical activity is detrimental to the overall health of an individual. Modern-day activity trackers enable individuals to monitor their daily activities to meet and maintain targets. This is expected to promote activity encouraging behavior, but the benefits of activity trackers attenuate over time due to waning adherence. One of the key approaches to improving adherence to goals is to motivate individuals to improve on their historic performance metrics. Objective: The aim of this work was to build a machine learning model to predict an achievable weekly activity target by considering (1) patterns in the user’s activity tracker data in the previous week and (2) behavior and environment characteristics. By setting realistic goals, ones that are neither too easy nor too difficult to achieve, activity tracker users can be encouraged to continue to meet these goals, and at the same time, to find utility in their activity tracker. Methods: We built a neural network model that prescribes a weekly activity target for an individual that can be realistically achieved. 
The inputs to the model were user-specific personal, social, and environmental factors, daily step count from the previous 7 days, and an entropy measure that characterized the pattern of daily step count. Data for training and evaluating the machine learning model were collected over a duration of 9 weeks. Results: Of 30 individuals who were enrolled, data from 20 participants were used. The model predicted target daily count with a mean absolute error of 1545 (95% CI 1383-1706) steps for an 8-week period. Conclusions: Artificial intelligence applied to physical activity data combined with behavioral data can be used to set personalized goals in accordance with the individual’s level of activity and thereby improve adherence to a fitness tracker; this could be used to increase engagement with activity trackers. A follow-up prospective study is ongoing to determine the performance of the engagement algorithm. %M 32897235 %R 10.2196/18142 %U https://mhealth.jmir.org/2020/9/e18142 %U https://doi.org/10.2196/18142 %U http://www.ncbi.nlm.nih.gov/pubmed/32897235 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 9 %P e18930 %T Human- Versus Machine Learning–Based Triage Using Digitalized Patient Histories in Primary Care: Comparative Study %A Entezarjou,Artin %A Bonamy,Anna-Karin Edstedt %A Benjaminsson,Simon %A Herman,Pawel %A Midlöv,Patrik %+ Center for Primary Health Care Research, Department of Clinical Sciences in Malmö/Family Medicine, Lund University, Box 50332, Malmö, 202 13, Sweden, 46 40391400, artin.entezarjou@med.lu.se %K machine learning %K artificial intelligence %K decision support %K primary care %K triage %D 2020 %7 3.9.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Smartphones have made it possible for patients to digitally report symptoms before physical primary care visits. Using machine learning (ML), these data offer an opportunity to support decisions about the appropriate level of care (triage). 
Objective: The purpose of this study was to explore the interrater reliability between human physicians and an automated ML-based triage method. Methods: After testing several models, a naïve Bayes triage model was created using data from digital medical histories, capable of classifying digital medical history reports as either in need of urgent physical examination or not in need of urgent physical examination. The model was tested on 300 digital medical history reports and classification was compared with the majority vote of an expert panel of 5 primary care physicians (PCPs). Reliability between raters was measured using both Cohen κ (adjusted for chance agreement) and percentage agreement (not adjusted for chance agreement). Results: Interrater reliability as measured by Cohen κ was 0.17 when comparing the majority vote of the reference group with the model. Agreement was 74% (138/186) for cases judged not in need of urgent physical examination and 42% (38/90) for cases judged to be in need of urgent physical examination. No specific features linked to the model’s triage decision could be identified. Between physicians within the panel, Cohen κ was 0.2. Intrarater reliability when 1 physician retriaged 50 reports resulted in Cohen κ of 0.55. Conclusions: Low interrater and intrarater agreement in triage decisions among PCPs limits the possibility to use human decisions as a reference for ML to automate triage in primary care. 
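Cohen κ corrects raw percentage agreement for the agreement expected by chance, which is why the abstract reports both measures. A small illustration with hypothetical triage labels shows how far apart the two can be when one class dominates:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical triage labels: 1 = needs urgent physical examination
physician = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
model     = [0, 0, 0, 0, 0, 1, 1, 0, 0, 1]

raw_agreement = sum(p == m for p, m in zip(physician, model)) / len(physician)
kappa = cohen_kappa_score(physician, model)
```

Here the raters agree on 6 of 10 cases (60%), yet κ is only about 0.05, because most of that agreement is expected by chance when both raters label most cases as non-urgent.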
%M 32880578 %R 10.2196/18930 %U https://medinform.jmir.org/2020/9/e18930 %U https://doi.org/10.2196/18930 %U http://www.ncbi.nlm.nih.gov/pubmed/32880578 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 7 %N 9 %P e19348 %T Utilizing Machine Learning on Internet Search Activity to Support the Diagnostic Process and Relapse Detection in Young Individuals With Early Psychosis: Feasibility Study %A Birnbaum,Michael Leo %A Kulkarni,Prathamesh "Param" %A Van Meter,Anna %A Chen,Victor %A Rizvi,Asra F %A Arenare,Elizabeth %A De Choudhury,Munmun %A Kane,John M %+ The Zucker Hillside Hospital, Northwell Health, 75-59 263rd Street, Glen Oaks, NY, 11004, United States, 1 718 470 8305, Mbirnbaum@northwell.edu %K schizophrenia spectrum disorders %K internet search activity %K Google %K diagnostic prediction %K relapse prediction %K machine learning %K digital data %K digital phenotyping %K digital biomarkers %D 2020 %7 1.9.2020 %9 Original Paper %J JMIR Ment Health %G English %X Background: Psychiatry is nearly entirely reliant on patient self-reporting, and there are few objective and reliable tests or sources of collateral information available to help diagnostic and assessment procedures. Technology offers opportunities to collect objective digital data to complement patient experience and facilitate more informed treatment decisions. Objective: We aimed to develop computational algorithms based on internet search activity designed to support diagnostic procedures and relapse identification in individuals with schizophrenia spectrum disorders. Methods: We extracted 32,733 time-stamped search queries across 42 participants with schizophrenia spectrum disorders and 74 healthy volunteers between the ages of 15 and 35 (mean 24.4 years, 44.0% male), and built machine-learning diagnostic and relapse classifiers utilizing the timing, frequency, and content of online search activity. 
Results: Classifiers predicted a diagnosis of schizophrenia spectrum disorders with an area under the curve value of 0.74 and predicted a psychotic relapse in individuals with schizophrenia spectrum disorders with an area under the curve of 0.71. Compared with healthy participants, those with schizophrenia spectrum disorders made fewer searches and their searches consisted of fewer words. Prior to a relapse hospitalization, participants with schizophrenia spectrum disorders were more likely to use words related to hearing, perception, and anger, and were less likely to use words related to health. Conclusions: Online search activity holds promise for gathering objective and easily accessed indicators of psychiatric symptoms. Utilizing search activity as collateral behavioral health information would represent a major advancement in efforts to capitalize on objective digital data to improve mental health monitoring. %M 32870161 %R 10.2196/19348 %U https://mental.jmir.org/2020/9/e19348 %U https://doi.org/10.2196/19348 %U http://www.ncbi.nlm.nih.gov/pubmed/32870161 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 8 %P e21056 %T Impact of a Commercial Artificial Intelligence–Driven Patient Self-Assessment Solution on Waiting Times at General Internal Medicine Outpatient Departments: Retrospective Study %A Harada,Yukinori %A Shimizu,Taro %+ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Kitakobayashi 880, Mibu, 321-0293, Japan, 81 282861111, shimizutaro7@gmail.com %K artificial intelligence %K automated medical history taking system %K eHealth %K interrupted time-series analysis %K waiting time %D 2020 %7 31.8.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Patient waiting time at outpatient departments is directly related to patient satisfaction and quality of care, particularly in patients visiting the general internal medicine outpatient departments for the first time. 
Moreover, reducing wait time from arrival in the clinic to the initiation of an examination is key to reducing patients’ anxiety. The use of automated medical history–taking systems in general internal medicine outpatient departments is a promising strategy to reduce waiting times. Recently, Ubie Inc in Japan developed AI Monshin, an artificial intelligence–based, automated medical history–taking system for general internal medicine outpatient departments. Objective: We hypothesized that replacing the use of handwritten self-administered questionnaires with the use of AI Monshin would reduce waiting times in general internal medicine outpatient departments. Therefore, we conducted this study to examine whether the use of AI Monshin reduced patient waiting times. Methods: We retrospectively analyzed the waiting times of patients visiting the general internal medicine outpatient department at a Japanese community hospital without an appointment from April 2017 to April 2020. AI Monshin was implemented in April 2019. We compared the median waiting time before and after implementation by conducting an interrupted time-series analysis of the median waiting time per month. We also conducted supplementary analyses to explain the main results. Results: We analyzed 21,615 visits. The median waiting time after AI Monshin implementation (74.4 minutes, IQR 57.1) was not significantly different from that before AI Monshin implementation (74.3 minutes, IQR 63.7) (P=.12). In the interrupted time-series analysis, the underlying linear time trend (–0.4 minutes per month; P=.06; 95% CI –0.9 to 0.02), level change (40.6 minutes; P=.09; 95% CI –5.8 to 87.0), and slope change (–1.1 minutes per month; P=.16; 95% CI –2.7 to 0.4) were not statistically significant. 
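An interrupted time-series analysis of the kind described above typically regresses the monthly median on an underlying linear trend, a level-change indicator at implementation, and a slope-change term. A minimal segmented-regression sketch with NumPy least squares follows; the monthly series is synthetic and the numbers are illustrative, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
months = np.arange(36)                            # 36 monthly median waiting times
intervention = (months >= 24).astype(float)       # system live from month 24
since = np.where(months >= 24, months - 24, 0.0)  # months since implementation

# Synthetic series: slight downward trend plus a small level jump at month 24
y = 75 - 0.3 * months + 4 * intervention + rng.normal(0, 2, 36)

# Design matrix: intercept, underlying trend, level change, slope change
X = np.column_stack([np.ones(36), months, intervention, since])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, trend, level_change, slope_change = beta
```

In the study the significance of each coefficient was then assessed, as reported for the trend, level-change, and slope-change estimates in the abstract.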
In a supplemental analysis of data from 9054 of 21,615 visits (41.9%), the median examination time after AI Monshin implementation (6.0 minutes, IQR 5.2) was slightly but significantly longer than that before AI Monshin implementation (5.7 minutes, IQR 5.0) (P=.003). Conclusions: The implementation of an artificial intelligence–based, automated medical history–taking system did not reduce waiting time for patients visiting the general internal medicine outpatient department without an appointment, and there was a slight increase in the examination time after implementation; however, the system may have enhanced the quality of care by supporting the optimization of staff assignments. %M 32865504 %R 10.2196/21056 %U http://medinform.jmir.org/2020/8/e21056/ %U https://doi.org/10.2196/21056 %U http://www.ncbi.nlm.nih.gov/pubmed/32865504 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 8 %N 8 %P e19962 %T Predicting Early Warning Signs of Psychotic Relapse From Passive Sensing Data: An Approach Using Encoder-Decoder Neural Networks %A Adler,Daniel A %A Ben-Zeev,Dror %A Tseng,Vincent W-S %A Kane,John M %A Brian,Rachel %A Campbell,Andrew T %A Hauser,Marta %A Scherer,Emily A %A Choudhury,Tanzeem %+ Cornell Tech, 2 W Loop Rd, New York, NY, 10044, United States, 1 2155953769, daa243@cornell.edu %K psychotic disorders %K schizophrenia %K mHealth %K mental health %K mobile health %K smartphone applications %K machine learning %K passive sensing %K digital biomarkers %K digital phenotyping %K artificial intelligence %K deep learning %K mobile phone %D 2020 %7 31.8.2020 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: Schizophrenia spectrum disorders (SSDs) are chronic conditions, but the severity of symptomatic experiences and functional impairments vacillate over the course of illness. 
Developing unobtrusive remote monitoring systems to detect early warning signs of impending symptomatic relapses would allow clinicians to intervene before the patient’s condition worsens. Objective: In this study, we aim to create the first models, exclusively using passive sensing data from a smartphone, to predict behavioral anomalies that could indicate early warning signs of a psychotic relapse. Methods: Data used to train and test the models were collected during the CrossCheck study. Hourly features derived from smartphone passive sensing data were extracted from 60 patients with SSDs (42 nonrelapse and 18 relapse >1 time throughout the study) and used to train models and test performance. We trained 2 types of encoder-decoder neural network models and a clustering-based local outlier factor model to predict behavioral anomalies that occurred within the 30-day period before a participant's date of relapse (the near relapse period). Models were trained to recreate participant behavior on days of relative health (DRH, outside of the near relapse period), following which a threshold on the recreation error was applied to predict anomalies. The neural network model architecture and the percentage of relapse participant data used to train all models were varied. Results: A total of 20,137 days of collected data were analyzed, with 726 days of data (3.6%) within any 30-day near relapse period. The best performing model used a fully connected neural network autoencoder architecture and achieved a median sensitivity of 0.25 (IQR 0.15-1.00) and specificity of 0.88 (IQR 0.14-0.96; a median 108% increase in behavioral anomalies near relapse). We conducted a post hoc analysis using the best performing model to identify behavioral features that had a medium-to-large effect (Cohen d>0.5) in distinguishing anomalies near relapse from DRH among 4 participants who relapsed multiple times throughout the study. 
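The recreate-and-threshold idea described above (train on days of relative health, then flag days whose behavior reconstructs poorly) can be sketched with a PCA "bottleneck" in scikit-learn. PCA is a simplified, deterministic stand-in for the study's encoder-decoder neural networks, and the data below are synthetic; only the thresholding logic is the point.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Rows = days, columns = hourly behavioral features on days of relative health (DRH)
drh = rng.normal(0, 1, (200, 24))
drh[:, :12] += drh[:, 12:]          # give the features correlated structure to compress

pca = PCA(n_components=5).fit(drh)  # stand-in for the trained encoder-decoder

def recon_error(X):
    """Mean squared reconstruction (recreation) error per day."""
    return ((X - pca.inverse_transform(pca.transform(X))) ** 2).mean(axis=1)

# Threshold chosen from the error distribution on healthy days
threshold = np.percentile(recon_error(drh), 95)

# A day with disrupted behavior does not fit the learned structure,
# reconstructs poorly, and is flagged as an anomaly
anomalous_day = rng.normal(0, 3, (1, 24))
is_anomaly = recon_error(anomalous_day)[0] > threshold
```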
Qualitative validation using clinical notes collected during the original CrossCheck study showed that the identified features from our analysis were presented to clinicians during relapse events. Conclusions: Our proposed method predicted a higher rate of anomalies in patients with SSDs within the 30-day near relapse period and can be used to uncover individual-level behaviors that change before relapse. This approach will enable technologists and clinicians to build unobtrusive digital mental health tools that can predict incipient relapse in SSDs. %M 32865506 %R 10.2196/19962 %U https://mhealth.jmir.org/2020/8/e19962 %U https://doi.org/10.2196/19962 %U http://www.ncbi.nlm.nih.gov/pubmed/32865506 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 8 %P e19870 %T Using Dual Neural Network Architecture to Detect the Risk of Dementia With Community Health Data: Algorithm Development and Validation Study %A Shen,Xiao %A Wang,Guanjin %A Kwan,Rick Yiu-Cho %A Choi,Kup-Sze %+ Centre for Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, 852 3400 3214, hskschoi@polyu.edu.hk %K cognitive screening %K dementia risk %K dual neural network %K predictive models %K primary care %D 2020 %7 31.8.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Recent studies have revealed lifestyle behavioral risk factors that can be modified to reduce the risk of dementia. As modification of lifestyle takes time, early identification of people with high dementia risk is important for timely intervention and support. As cognitive impairment is a diagnostic criterion of dementia, cognitive assessment tools are used in primary care to screen for clinically unevaluated cases. Among them, Mini-Mental State Examination (MMSE) is a very common instrument. However, MMSE is a questionnaire that is administered when symptoms of memory decline have occurred. 
Early administration at the asymptomatic stage and repeated measurements would lead to a practice effect that degrades the effectiveness of MMSE when it is used at later stages. Objective: The aim of this study was to exploit machine learning techniques to assist health care professionals in detecting high-risk individuals by predicting the results of MMSE using elderly health data collected from community-based primary care services. Methods: A health data set of 2299 samples was adopted in the study. The input data were divided into two groups of different characteristics (ie, client profile data and health assessment data). The predictive output was the result of two-class classification of the normal and high-risk cases that were defined based on MMSE. A dual neural network (DNN) model was proposed to obtain the latent representations of the two groups of input data separately, which were then concatenated for the two-class classification. Mean and k-nearest neighbor were used separately to tackle missing data, whereas a cost-sensitive learning (CSL) algorithm was proposed to deal with class imbalance. The performance of the DNN was evaluated by comparing it with that of conventional machine learning methods. Results: A total of 16 predictive models were built using the elderly health data set. Among them, the proposed DNN with CSL outperformed the conventional methods in the detection of high-risk cases. The area under the receiver operating characteristic curve, average precision, sensitivity, and specificity reached 0.84, 0.88, 0.73, and 0.80, respectively. Conclusions: The proposed method has the potential to serve as a tool to screen for elderly people with cognitive impairment and predict high-risk cases of dementia at the asymptomatic stage, providing health care professionals with early signals that can prompt suggestions for a follow-up or a detailed diagnosis.
%M 32865498 %R 10.2196/19870 %U https://medinform.jmir.org/2020/8/e19870 %U https://doi.org/10.2196/19870 %U http://www.ncbi.nlm.nih.gov/pubmed/32865498 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 8 %P e19918 %T Is Artificial Intelligence Better Than Human Clinicians in Predicting Patient Outcomes? %A Lee,Joon %+ Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, 3280 Hospital Dr NW, TRW 5E17, Calgary, AB, T2N 4Z6, Canada, 1 403 220 2968, joonwu.lee@ucalgary.ca %K patient outcome prediction %K artificial intelligence %K machine learning %K human-generated predictions %K human-AI symbiosis %D 2020 %7 26.8.2020 %9 Viewpoint %J J Med Internet Res %G English %X In contrast with medical imaging diagnostics powered by artificial intelligence (AI), in which deep learning has led to breakthroughs in recent years, patient outcome prediction poses an inherently challenging problem because it focuses on events that have not yet occurred. Interestingly, the performance of machine learning–based patient outcome prediction models has rarely been compared with that of human clinicians in the literature. Human intuition and insight may be sources of underused predictive information that AI will not be able to identify in electronic data. Both human and AI predictions should be investigated together with the aim of achieving a human-AI symbiosis that synergistically and complementarily combines AI with the predictive abilities of clinicians. 
%M 32845249 %R 10.2196/19918 %U http://www.jmir.org/2020/8/e19918/ %U https://doi.org/10.2196/19918 %U http://www.ncbi.nlm.nih.gov/pubmed/32845249 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 6 %N 3 %P e20794 %T Big Data, Natural Language Processing, and Deep Learning to Detect and Characterize Illicit COVID-19 Product Sales: Infoveillance Study on Twitter and Instagram %A Mackey,Tim Ken %A Li,Jiawei %A Purushothaman,Vidya %A Nali,Matthew %A Shah,Neal %A Bardier,Cortni %A Cai,Mingxiang %A Liang,Bryan %+ Department of Anesthesiology and Division of Infectious Diseases and Global Public Health, School of Medicine, University of California, San Diego, 8950 Villa La Jolla Drive, A124, La Jolla, CA, 92037, United States, 1 951 491 4161, tmackey@ucsd.edu %K COVID-19 %K coronavirus %K infectious disease %K social media %K surveillance %K infoveillance %K infodemiology %K infodemic %K fraud %K cybercrime %D 2020 %7 25.8.2020 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: The coronavirus disease (COVID-19) pandemic is perhaps the greatest global health challenge of the last century. Accompanying this pandemic is a parallel “infodemic,” including the online marketing and sale of unapproved, illegal, and counterfeit COVID-19 health products including testing kits, treatments, and other questionable “cures.” Enabling the proliferation of this content is the growing ubiquity of internet-based technologies, including popular social media platforms that now have billions of global users. Objective: This study aims to collect, analyze, identify, and enable reporting of suspected fake, counterfeit, and unapproved COVID-19–related health care products from Twitter and Instagram. 
Methods: This study was conducted in two phases, beginning with the collection of COVID-19–related Twitter and Instagram posts using a combination of web scraping on Instagram and filtering the public streaming Twitter application programming interface for keywords associated with suspect marketing and sale of COVID-19 products. The second phase involved data analysis using natural language processing (NLP) and deep learning to identify potential sellers that were then manually annotated for characteristics of interest. We also visualized illegal selling posts on a customized data dashboard to enable public health intelligence. Results: We collected a total of 6,029,323 tweets and 204,597 Instagram posts filtered for terms associated with suspect marketing and sale of COVID-19 health products from March to April for Twitter and February to May for Instagram. After applying our NLP and deep learning approaches, we identified 1271 tweets and 596 Instagram posts associated with questionable sales of COVID-19–related products. Generally, product introduction came in two waves, with the first consisting of questionable immunity-boosting treatments and a second involving suspect testing kits. We also detected a low volume of pharmaceuticals that have not been approved for COVID-19 treatment. Other major themes detected included products offered in different languages, various claims of product credibility, completely unsubstantiated products, unapproved testing modalities, and different payment and seller contact methods. Conclusions: Results from this study provide initial insight into one front of the “infodemic” fight against COVID-19 by characterizing what types of health products, selling claims, and types of sellers were active on two popular social media platforms at earlier stages of the pandemic. This cybercrime challenge is likely to continue as the pandemic progresses and more people seek access to COVID-19 testing and treatment.
This data intelligence can help public health agencies, regulatory authorities, legitimate manufacturers, and technology platforms better remove and prevent this content from harming the public. %M 32750006 %R 10.2196/20794 %U http://publichealth.jmir.org/2020/3/e20794/ %U https://doi.org/10.2196/20794 %U http://www.ncbi.nlm.nih.gov/pubmed/32750006 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 8 %P e18189 %T Artificial Intelligence for Caregivers of Persons With Alzheimer’s Disease and Related Dementias: Systematic Literature Review %A Xie,Bo %A Tao,Cui %A Li,Juan %A Hilsabeck,Robin C %A Aguirre,Alyssa %+ School of Nursing, The University of Texas at Austin, 1710 Red River, Austin, TX, 78712, United States, 1 512 232 5788, boxie@utexas.edu %K Alzheimer disease %K dementia %K caregiving %K technology %K artificial intelligence %D 2020 %7 20.8.2020 %9 Review %J JMIR Med Inform %G English %X Background: Artificial intelligence (AI) has great potential for improving the care of persons with Alzheimer’s disease and related dementias (ADRD) and the quality of life of their family caregivers. To date, however, systematic review of the literature on the impact of AI on ADRD management has been lacking. Objective: This paper aims to (1) identify and examine literature on AI that provides information to facilitate ADRD management by caregivers of individuals diagnosed with ADRD and (2) identify gaps in the literature that suggest future directions for research. Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines for conducting systematic literature reviews, during August and September 2019, we performed 3 rounds of selection. First, we searched predetermined keywords in PubMed, Cumulative Index to Nursing and Allied Health Literature Plus with Full Text, PsycINFO, IEEE Xplore Digital Library, and the ACM Digital Library. This step generated 113 nonduplicate results. 
Next, we screened the titles and abstracts of the 113 papers according to inclusion and exclusion criteria, after which 52 papers were excluded and 61 remained. Finally, we screened the full text of the remaining papers to ensure that they met the inclusion or exclusion criteria; 31 papers were excluded, leaving a final sample of 30 papers for analysis. Results: Of the 30 papers, 20 reported studies that focused on using AI to assist in activities of daily living. A limited number of specific daily activities were targeted. The studies’ aims suggested three major purposes: (1) to test the feasibility, usability, or perceptions of prototype AI technology; (2) to generate preliminary data on the technology’s performance (primarily accuracy in detecting target events, such as falls); and (3) to understand user needs and preferences for the design and functionality of to-be-developed technology. The majority of the studies were qualitative, with interviews, focus groups, and observation being their most common methods. Cross-sectional surveys were also common, but with small convenience samples. Sample sizes ranged from 6 to 106, with the vast majority on the low end. The majority of the studies were descriptive, exploratory, and lacking theoretical guidance. Many studies reported positive outcomes in favor of their AI technology’s feasibility and satisfaction; some studies reported mixed results on these measures. Performance of the technology varied widely across tasks. Conclusions: These findings call for more systematic designs and evaluations of the feasibility and efficacy of AI-based interventions for caregivers of people with ADRD. These gaps in the research would be best addressed through interdisciplinary collaboration, incorporating complementary expertise from the health sciences and computer science/engineering–related fields. 
%M 32663146 %R 10.2196/18189 %U http://medinform.jmir.org/2020/8/e18189/ %U https://doi.org/10.2196/18189 %U http://www.ncbi.nlm.nih.gov/pubmed/32663146 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 8 %P e22590 %T Social Network Analysis of COVID-19 Sentiments: Application of Artificial Intelligence %A Hung,Man %A Lauren,Evelyn %A Hon,Eric S %A Birmingham,Wendy C %A Xu,Julie %A Su,Sharon %A Hon,Shirley D %A Park,Jungweon %A Dang,Peter %A Lipsky,Martin S %+ College of Dental Medicine, Roseman University of Health Sciences, 10894 South River Front Parkway, South Jordan, UT, 84095-3538, United States, 1 801 878 1270, mhung@roseman.edu %K COVID-19 %K coronavirus %K sentiment %K social network %K Twitter %K infodemiology %K infodemic %K pandemic %K crisis %K public health %K business economy %K artificial intelligence %D 2020 %7 18.8.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: The coronavirus disease (COVID-19) pandemic led to substantial public discussion. Understanding these discussions can help institutions, governments, and individuals navigate the pandemic. Objective: The aim of this study is to analyze discussions on Twitter related to COVID-19 and to investigate the sentiments toward COVID-19. Methods: This study applied machine learning methods in the field of artificial intelligence to analyze data collected from Twitter. Using tweets originating exclusively in the United States and written in English during the 1-month period from March 20 to April 19, 2020, the study examined COVID-19–related discussions. Social network and sentiment analyses were also conducted to determine the social network of dominant topics and whether the tweets expressed positive, neutral, or negative sentiments. Geographic analysis of the tweets was also conducted. Results: There were a total of 14,180,603 likes, 863,411 replies, 3,087,812 retweets, and 641,381 mentions in tweets during the study timeframe. 
Out of 902,138 tweets analyzed, sentiment analysis classified 434,254 (48.2%) tweets as having a positive sentiment, 187,042 (20.7%) as neutral, and 280,842 (31.1%) as negative. The study identified 5 dominant themes among COVID-19–related tweets: health care environment, emotional support, business economy, social change, and psychological stress. Alaska, Wyoming, New Mexico, Pennsylvania, and Florida were the states expressing the most negative sentiment while Vermont, North Dakota, Utah, Colorado, Tennessee, and North Carolina conveyed the most positive sentiment. Conclusions: This study identified 5 prevalent themes of COVID-19 discussion with sentiments ranging from positive to negative. These themes and sentiments can clarify the public’s response to COVID-19 and help officials navigate the pandemic. %M 32750001 %R 10.2196/22590 %U http://www.jmir.org/2020/8/e22590/ %U https://doi.org/10.2196/22590 %U http://www.ncbi.nlm.nih.gov/pubmed/32750001 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 8 %P e20007 %T Artificial Intelligence for Rapid Meta-Analysis: Case Study on Ocular Toxicity of Hydroxychloroquine %A Michelson,Matthew %A Chow,Tiffany %A Martin,Neil A %A Ross,Mike %A Tee Qiao Ying,Amelia %A Minton,Steven %+ Evid Science, 2361 Rosencrans Ave Ste 348, El Segundo, CA, 90245-4929, United States, 1 626 765 1903, mmichelson@evidscience.com %K meta-analysis %K rapid meta-analysis %K artificial intelligence %K drug %K analysis %K hydroxychloroquine %K toxic %K COVID-19 %K treatment %K side effect %K ocular %K eye %D 2020 %7 17.8.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Rapid access to evidence is crucial in times of an evolving clinical crisis. To that end, we propose a novel approach to answer clinical queries, termed rapid meta-analysis (RMA). 
Unlike traditional meta-analysis, RMA balances a quick time to production with reasonable data quality assurances, leveraging artificial intelligence (AI) to strike this balance. Objective: We aimed to evaluate whether RMA can generate meaningful clinical insights, but crucially, in a much faster processing time than traditional meta-analysis, using a relevant, real-world example. Methods: The development of our RMA approach was motivated by a currently relevant clinical question: is ocular toxicity and vision compromise a side effect of hydroxychloroquine therapy? At the time of designing this study, hydroxychloroquine was a leading candidate in the treatment of coronavirus disease (COVID-19). We then leveraged AI to pull and screen articles, automatically extract their results, review the studies, and analyze the data with standard statistical methods. Results: By combining AI with human analysis in our RMA, we generated a meaningful, clinical result in less than 30 minutes. The RMA identified 11 studies considering ocular toxicity as a side effect of hydroxychloroquine and estimated the incidence to be 3.4% (95% CI 1.11%-9.96%). The heterogeneity across individual study findings was high, which should be taken into account in interpretation of the result. Conclusions: We demonstrate that a novel approach to meta-analysis using AI can generate meaningful clinical insights in a much shorter time period than traditional meta-analysis. %M 32804086 %R 10.2196/20007 %U http://www.jmir.org/2020/8/e20007/ %U https://doi.org/10.2196/20007 %U http://www.ncbi.nlm.nih.gov/pubmed/32804086 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 8 %P e17211 %T How Can Artificial Intelligence Make Medicine More Preemptive? 
%A Iqbal,Usman %A Celi,Leo Anthony %A Li,Yu-Chuan Jack %+ Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, No 172-1, Sec 2, Keelung Rd, Daan District, Taipei, 10675, Taiwan, 886 6638 2736 ext 7601, jack@tmu.edu.tw %K artificial intelligence %K digital health %K eHealth %K health care technology %K medical innovations %K health information technology %K advanced care systems %D 2020 %7 11.8.2020 %9 Viewpoint %J J Med Internet Res %G English %X In this paper, we propose the idea that artificial intelligence (AI) is ushering in a new era of “Earlier Medicine,” which is a predictive approach for disease prevention based on AI modeling and big data. The flourishing health care technological landscape is showing great potential—from diagnosis and prescription automation to the early detection of disease through efficient and cost-effective patient data screening tools that benefit from the predictive capabilities of AI. Monitoring the trajectories of both in- and outpatients has proven to be a task AI can perform to a reliable degree. Predictions can be a significant advantage to health care if they are accurate, prompt, and can be personalized and acted upon efficiently. This is where AI plays a crucial role in “Earlier Medicine” implementation.
%M 32780024 %R 10.2196/17211 %U https://www.jmir.org/2020/8/e17211 %U https://doi.org/10.2196/17211 %U http://www.ncbi.nlm.nih.gov/pubmed/32780024 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 8 %P e19104 %T Approaches Based on Artificial Intelligence and the Internet of Intelligent Things to Prevent the Spread of COVID-19: Scoping Review %A Adly,Aya Sedky %A Adly,Afnan Sedky %A Adly,Mahmoud Sedky %+ Faculty of Oral and Dental Medicine, Cairo University, Cairo University Road, Cairo, , Egypt, 20 1145559778, dr.mahmoud.sedky@gmail.com %K SARS-CoV-2 %K COVID-19 %K novel coronavirus %K artificial intelligence %K internet of things %K telemedicine %K machine learning %K modeling %K simulation %K robotics %D 2020 %7 10.8.2020 %9 Review %J J Med Internet Res %G English %X Background: Artificial intelligence (AI) and the Internet of Intelligent Things (IIoT) are promising technologies to prevent the concerningly rapid spread of coronavirus disease (COVID-19) and to maximize safety during the pandemic. With the exponential increase in the number of COVID-19 patients, it is highly possible that physicians and health care workers will not be able to treat all cases. Thus, computer scientists can contribute to the fight against COVID-19 by introducing more intelligent solutions to achieve rapid control of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes the disease. Objective: The objectives of this review were to analyze the current literature, discuss the applicability of reported ideas for using AI to prevent and control COVID-19, and build a comprehensive view of how current systems may be useful in particular areas. This may be of great help to many health care administrators, computer scientists, and policy makers worldwide. 
Methods: We conducted an electronic search of articles in the MEDLINE, Google Scholar, Embase, and Web of Knowledge databases to formulate a comprehensive review that summarizes different categories of the most recently reported AI-based approaches to prevent and control the spread of COVID-19. Results: Our search identified the 10 most recent AI approaches that were suggested to provide the best solutions for maximizing safety and preventing the spread of COVID-19. These approaches included detection of suspected cases, large-scale screening, monitoring, interactions with experimental therapies, pneumonia screening, use of the IIoT for data and information gathering and integration, resource allocation, predictions, modeling and simulation, and robotics for medical quarantine. Conclusions: We found few or almost no studies regarding the use of AI to examine COVID-19 interactions with experimental therapies, the use of AI for resource allocation to COVID-19 patients, or the use of AI and the IIoT for COVID-19 data and information gathering/integration. Moreover, the adoption of other approaches, including use of AI for COVID-19 prediction, use of AI for COVID-19 modeling and simulation, and use of AI robotics for medical quarantine, should be further emphasized by researchers because these important approaches lack sufficient numbers of studies. Therefore, we recommend that computer scientists focus on these approaches, which are still not being adequately addressed. 
%M 32584780 %R 10.2196/19104 %U https://www.jmir.org/2020/8/e19104 %U https://doi.org/10.2196/19104 %U http://www.ncbi.nlm.nih.gov/pubmed/32584780 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 8 %P e17158 %T Conversational Agents in Health Care: Scoping Review and Conceptual Analysis %A Tudor Car,Lorainne %A Dhinagaran,Dhakshenya Ardhithy %A Kyaw,Bhone Myint %A Kowatsch,Tobias %A Joty,Shafiq %A Theng,Yin-Leng %A Atun,Rifat %+ Family Medicine and Primary Care, Lee Kong Chian School of Medicine, Nanyang Technological University Singapore, 11 Mandalay Road, Singapore, 65 69041258, lorainne.tudor.car@ntu.edu.sg %K conversational agents %K chatbots %K artificial intelligence %K machine learning %K mobile phone %K health care %K scoping review %D 2020 %7 7.8.2020 %9 Review %J J Med Internet Res %G English %X Background: Conversational agents, also known as chatbots, are computer programs designed to simulate human text or verbal conversations. They are increasingly used in a range of fields, including health care. By enabling better accessibility, personalization, and efficiency, conversational agents have the potential to improve patient care. Objective: This study aimed to review the current applications, gaps, and challenges in the literature on conversational agents in health care and provide recommendations for their future research, design, and application. Methods: We performed a scoping review. A broad literature search was performed in MEDLINE (Medical Literature Analysis and Retrieval System Online; Ovid), EMBASE (Excerpta Medica database; Ovid), PubMed, Scopus, and Cochrane Central with the search terms “conversational agents,” “conversational AI,” “chatbots,” and associated synonyms. We also searched the gray literature using sources such as the OCLC (Online Computer Library Center) WorldCat database and ResearchGate in April 2019. Reference lists of relevant articles were checked for further articles. 
Screening and data extraction were performed in parallel by 2 reviewers. The included evidence was analyzed narratively by employing the principles of thematic analysis. Results: The literature search yielded 47 study reports (45 articles and 2 ongoing clinical trials) that matched the inclusion criteria. The identified conversational agents were largely delivered via smartphone apps (n=23) and used free text only as the main input (n=19) and output (n=30) modality. Case studies describing chatbot development (n=18) were the most prevalent, and only 11 randomized controlled trials were identified. The 3 most commonly reported conversational agent applications in the literature were treatment and monitoring, health care service support, and patient education. Conclusions: The literature on conversational agents in health care is largely descriptive and aimed at treatment and monitoring and health service support. It mostly reports on text-based, artificial intelligence–driven, and smartphone app–delivered conversational agents. There is an urgent need for a robust evaluation of diverse health care conversational agents’ formats, focusing on their acceptability, safety, and effectiveness. %M 32763886 %R 10.2196/17158 %U http://www.jmir.org/2020/8/e17158/ %U https://doi.org/10.2196/17158 %U http://www.ncbi.nlm.nih.gov/pubmed/32763886 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 8 %P e15394 %T Applying Machine Learning Models with An Ensemble Approach for Accurate Real-Time Influenza Forecasting in Taiwan: Development and Validation Study %A Cheng,Hao-Yuan %A Wu,Yu-Chun %A Lin,Min-Hau %A Liu,Yu-Lun %A Tsai,Yue-Yang %A Wu,Jo-Hua %A Pan,Ke-Han %A Ke,Chih-Jung %A Chen,Chiu-Mei %A Liu,Ding-Ping %A Lin,I-Feng %A Chuang,Jen-Hsiang %+ Taiwan Centers for Disease Control, 9F, No. 6, Linsen S. 
Road, Zhong-zheng District, Taipei, 100, Taiwan, 886 2 2395 9825, jhchuang@cdc.gov.tw %K influenza %K Influenza-like illness %K forecasting %K machine learning %K artificial intelligence %K epidemic forecasting %K surveillance %D 2020 %7 5.8.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Changeful seasonal influenza activity in subtropical areas such as Taiwan causes problems in epidemic preparedness. The Taiwan Centers for Disease Control has maintained real-time national influenza surveillance systems since 2004. Except for timely monitoring, epidemic forecasting using the national influenza surveillance data can provide pivotal information for public health response. Objective: We aimed to develop predictive models using machine learning to provide real-time influenza-like illness forecasts. Methods: Using surveillance data of influenza-like illness visits from emergency departments (from the Real-Time Outbreak and Disease Surveillance System), outpatient departments (from the National Health Insurance database), and the records of patients with severe influenza with complications (from the National Notifiable Disease Surveillance System), we developed 4 machine learning models (autoregressive integrated moving average, random forest, support vector regression, and extreme gradient boosting) to produce weekly influenza-like illness predictions for a given week and 3 subsequent weeks. We established a framework of the machine learning models and used an ensemble approach called stacking to integrate these predictions. We trained the models using historical data from 2008-2014. We evaluated their predictive ability during 2015-2017 for each of the 4-week time periods using Pearson correlation, mean absolute percentage error (MAPE), and hit rate of trend prediction. 
A dashboard website was built to visualize the forecasts, and the results of real-world implementation of this forecasting framework in 2018 were evaluated using the same metrics. Results: All models could accurately predict the timing and magnitudes of the seasonal peaks in the then-current week (nowcast) (ρ=0.802-0.965; MAPE: 5.2%-9.2%; hit rate: 0.577-0.756), 1-week (ρ=0.803-0.918; MAPE: 8.3%-11.8%; hit rate: 0.643-0.747), 2-week (ρ=0.783-0.867; MAPE: 10.1%-15.3%; hit rate: 0.669-0.734), and 3-week forecasts (ρ=0.676-0.801; MAPE: 12.0%-18.9%; hit rate: 0.643-0.786), especially the ensemble model. In real-world implementation in 2018, the forecasting performance was still accurate in nowcasts (ρ=0.875-0.969; MAPE: 5.3%-8.0%; hit rate: 0.582-0.782) and remained satisfactory in 3-week forecasts (ρ=0.721-0.908; MAPE: 7.6%-13.5%; hit rate: 0.596-0.904). Conclusions: This machine learning and ensemble approach can make accurate, real-time influenza-like illness forecasts for a 4-week period, and thus, facilitate decision making. %M 32755888 %R 10.2196/15394 %U https://www.jmir.org/2020/8/e15394 %U https://doi.org/10.2196/15394 %U http://www.ncbi.nlm.nih.gov/pubmed/32755888 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 7 %P e18228 %T Artificial Intelligence in Health Care: Bibliometric Analysis %A Guo,Yuqi %A Hao,Zhichao %A Zhao,Shichong %A Gong,Jiaqi %A Yang,Fan %+ Social Welfare Program, School of Public Administration, Dongbei University of Finance and Economics, 217 Jianshan Street, Shahekou District, Dalian, China, 86 411 84710562, fyang10@dufe.edu.cn %K health care %K artificial intelligence %K bibliometric analysis %K telehealth %K neural networks %K machine learning %D 2020 %7 29.7.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: As a critical driving power to promote health care, the health care–related artificial intelligence (AI) literature is growing rapidly. 
Objective: The purpose of this analysis is to provide a dynamic and longitudinal bibliometric analysis of health care–related AI publications. Methods: The Web of Science (Clarivate PLC) was searched to retrieve all existing and highly cited AI-related health care research papers published in English up to December 2019. Based on bibliometric indicators, a search strategy was developed to screen the title for eligibility, using the abstract and full text where needed. The growth rate of publications, characteristics of research activities, publication patterns, and research hotspot tendencies were computed using the HistCite software. Results: The search identified 5235 hits, of which 1473 publications were included in the analyses. Publication output increased an average of 17.02% per year since 1995, but the growth rate of research papers significantly increased to 45.15% from 2014 to 2019. The major health problems studied in AI research are cancer, depression, Alzheimer disease, heart failure, and diabetes. Artificial neural networks, support vector machines, and convolutional neural networks have the highest impact on health care. Nucleosides, convolutional neural networks, and tumor markers have remained research hotspots through 2019. Conclusions: This analysis provides a comprehensive overview of the AI-related research conducted in the field of health care, which helps researchers, policy makers, and practitioners better understand the development of health care–related AI research and possible practice implications. Future AI research should be dedicated to filling in the gaps between AI health care research and clinical applications. 
%M 32723713 %R 10.2196/18228 %U http://www.jmir.org/2020/7/e18228/ %U https://doi.org/10.2196/18228 %U http://www.ncbi.nlm.nih.gov/pubmed/32723713 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 7 %P e18082 %T Automatic Recognition, Segmentation, and Sex Assignment of Nocturnal Asthmatic Coughs and Cough Epochs in Smartphone Audio Recordings: Observational Field Study %A Barata,Filipe %A Tinschert,Peter %A Rassouli,Frank %A Steurer-Stey,Claudia %A Fleisch,Elgar %A Puhan,Milo Alan %A Brutsche,Martin %A Kotz,David %A Kowatsch,Tobias %+ Center for Digital Health Interventions, Department of Management, Technology, and Economics, ETH Zurich, Weinbergstrasse 56/57, Zurich, 8092, Switzerland, 41 446323509, fbarata@ethz.ch %K asthma %K cough recognition %K cough segmentation %K sex assignment %K deep learning %K smartphone %K mobile phone %D 2020 %7 14.7.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Asthma is one of the most prevalent chronic respiratory diseases. Despite increased investment in treatment, little progress has been made in the early recognition and treatment of asthma exacerbations over the last decade. Nocturnal cough monitoring may provide an opportunity to identify patients at risk for imminent exacerbations. Recently developed approaches enable smartphone-based cough monitoring. These approaches, however, have not undergone longitudinal overnight testing nor have they been specifically evaluated in the context of asthma. Also, the problem of distinguishing partner coughs from patient coughs when two or more people are sleeping in the same room using contact-free audio recordings remains unsolved. Objective: The objective of this study was to evaluate the automatic recognition and segmentation of nocturnal asthmatic coughs and cough epochs in smartphone-based audio recordings that were collected in the field. 
We also aimed to distinguish partner coughs from patient coughs in contact-free audio recordings by classifying coughs based on sex. Methods: We used a convolutional neural network model that we had developed in previous work for automated cough recognition. We further used techniques (such as ensemble learning, minibatch balancing, and thresholding) to address the imbalance in the data set. We evaluated the classifier in a classification task and a segmentation task. The cough-recognition classifier served as the basis for the cough-segmentation classifier from continuous audio recordings. We compared automated cough and cough-epoch counts to human-annotated cough and cough-epoch counts. We employed Gaussian mixture models to build a classifier for cough and cough-epoch signals based on sex. Results: We recorded audio data from 94 adults with asthma (overall: mean 43 years; SD 16 years; female: 54/94, 57%; male 40/94, 43%). Audio data were recorded by each participant in their everyday environment using a smartphone placed next to their bed; recordings were made over a period of 28 nights. Out of 704,697 sounds, we identified 30,304 sounds as coughs. A total of 26,166 coughs occurred without a 2-second pause between coughs, yielding 8238 cough epochs. The ensemble classifier performed well with a Matthews correlation coefficient of 92% in a pure classification task and achieved comparable cough counts to that of human annotators in the segmentation of coughing. The count difference between automated and human-annotated coughs was a mean –0.1 (95% CI –12.11, 11.91) coughs. The count difference between automated and human-annotated cough epochs was a mean 0.24 (95% CI –3.67, 4.15) cough epochs. The Gaussian mixture model cough epoch–based sex classification performed best yielding an accuracy of 83%. Conclusions: Our study showed longitudinal nocturnal cough and cough-epoch recognition from nightly recorded smartphone-based audio from adults with asthma. 
The model distinguishes partner cough from patient cough in contact-free recordings by identifying cough and cough-epoch signals that correspond to the sex of the patient. This research represents a step towards enabling passive and scalable cough monitoring for adults with asthma. %M 32459641 %R 10.2196/18082 %U https://www.jmir.org/2020/7/e18082 %U https://doi.org/10.2196/18082 %U http://www.ncbi.nlm.nih.gov/pubmed/32459641 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 7 %P e16649 %T Public Perception of Artificial Intelligence in Medical Care: Content Analysis of Social Media %A Gao,Shuqing %A He,Lingnan %A Chen,Yue %A Li,Dan %A Lai,Kaisheng %+ School of Journalism and Communication, Jinan University, 601 Whampoa Ave W, Guangzhou, , China, 86 020 38374980, kaishenglai@126.com %K artificial intelligence %K public perception %K social media %K content analysis %K medical care %D 2020 %7 13.7.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: High-quality medical resources are in high demand worldwide, and the application of artificial intelligence (AI) in medical care may help alleviate the crisis related to this shortage. The development of the medical AI industry depends to a certain extent on whether industry experts have a comprehensive understanding of the public’s views on medical AI. Currently, the opinions of the general public on this matter remain unclear. Objective: The purpose of this study is to explore the public perception of AI in medical care through a content analysis of social media data, including specific topics that the public is concerned about; public attitudes toward AI in medical care and the reasons for them; and public opinion on whether AI can replace human doctors. Methods: Through an application programming interface, we collected a data set from the Sina Weibo platform comprising more than 16 million users throughout China by crawling all public posts from January to December 2017. 
Based on this data set, we identified 2315 posts related to AI in medical care and classified them through content analysis. Results: Among the 2315 identified posts, we found three types of AI topics discussed on the platform: (1) technology and application (n=987, 42.63%), (2) industry development (n=706, 30.50%), and (3) impact on society (n=622, 26.87%). Out of 956 posts where public attitudes were expressed, 59.4% (n=568), 34.4% (n=329), and 6.2% (n=59) of the posts expressed positive, neutral, and negative attitudes, respectively. The immaturity of AI technology (27/59, 46%) and a distrust of related companies (n=15, 25%) were the two main reasons for the negative attitudes. Across 200 posts that mentioned public attitudes toward replacing human doctors with AI, 47.5% (n=95) and 32.5% (n=65) of the posts expressed that AI would completely or partially replace human doctors, respectively. In comparison, 20.0% (n=40) of the posts expressed that AI would not replace human doctors. Conclusions: Our findings indicate that people are most concerned about AI technology and applications. Generally, the majority of people held positive attitudes and believed that AI doctors would completely or partially replace human ones. Compared with previous studies on medical doctors, the general public has a more positive attitude toward medical AI. Lack of trust in AI and the absence of the humanistic care factor are essential reasons why some people still have a negative attitude toward medical AI. We suggest that practitioners may need to pay more attention to promoting the credibility of technology companies and meeting patients’ emotional needs instead of focusing merely on technical issues. 
%M 32673231 %R 10.2196/16649 %U http://www.jmir.org/2020/7/e16649/ %U https://doi.org/10.2196/16649 %U http://www.ncbi.nlm.nih.gov/pubmed/32673231 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 7 %P e16021 %T Effectiveness and Safety of Using Chatbots to Improve Mental Health: Systematic Review and Meta-Analysis %A Abd-Alrazaq,Alaa Ali %A Rababeh,Asma %A Alajlani,Mohannad %A Bewick,Bridgette M %A Househ,Mowafa %+ College of Science and Engineering, Hamad Bin Khalifa University, Liberal Arts and Sciences Building, Education City, Ar Rayyan, Doha, Qatar, 974 55708549, mhouseh@hbku.edu.qa %K chatbots %K conversational agents %K mental health %K mental disorders %K depression %K anxiety %K effectiveness %K safety %D 2020 %7 13.7.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: The global shortage of mental health workers has prompted the utilization of technological advancements, such as chatbots, to meet the needs of people with mental health conditions. Chatbots are systems that are able to converse and interact with human users using spoken, written, and visual language. While numerous studies have assessed the effectiveness and safety of using chatbots in mental health, no reviews have pooled the results of those studies. Objective: This study aimed to assess the effectiveness and safety of using chatbots to improve mental health through summarizing and pooling the results of previous studies. Methods: A systematic review was carried out to achieve this objective. The search sources were 7 bibliographic databases (eg, MEDLINE, EMBASE, PsycINFO), the search engine “Google Scholar,” and backward and forward reference list checking of the included studies and relevant reviews. Two reviewers independently selected the studies, extracted data from the included studies, and assessed the risk of bias. Data extracted from studies were synthesized using narrative and statistical methods, as appropriate. 
Results: Of 1048 citations retrieved, we identified 12 studies examining the effect of using chatbots on 8 outcomes. Weak evidence demonstrated that chatbots were effective in improving depression, distress, stress, and acrophobia. In contrast, according to similar evidence, there was no statistically significant effect of using chatbots on subjective psychological wellbeing. Results were conflicting regarding the effect of chatbots on the severity of anxiety and positive and negative affect. Only two studies assessed the safety of chatbots and concluded that they are safe in mental health, as no adverse events or harms were reported. Conclusions: Chatbots have the potential to improve mental health. However, the evidence in this review was not sufficient to definitely conclude this due to lack of evidence that their effect is clinically important, a lack of studies assessing each outcome, high risk of bias in those studies, and conflicting results for some outcomes. Further studies are required to draw solid conclusions about the effectiveness and safety of chatbots. 
Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42019141219; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42019141219 %M 32673216 %R 10.2196/16021 %U http://www.jmir.org/2020/7/e16021/ %U https://doi.org/10.2196/16021 %U http://www.ncbi.nlm.nih.gov/pubmed/32673216 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 7 %P e18697 %T Diagnosing Parkinson Disease Through Facial Expression Recognition: Video Analysis %A Jin,Bo %A Qu,Yue %A Zhang,Liang %A Gao,Zhan %+ Dongbei University of Finance and Economics, 217 Jianshan St, Shahekou District, Dalian, China, 86 15524709655, liang.zhang@dufe.edu.cn %K Parkinson disease %K face landmarks %K machine learning %K artificial intelligence %D 2020 %7 10.7.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: The number of patients with neurological diseases is currently increasing annually, which presents tremendous challenges for both patients and doctors. With the advent of advanced information technology, digital medical care is gradually changing the medical ecology. Numerous people are exploring new ways to receive a consultation, track their diseases, and receive rehabilitation training in more convenient and efficient ways. In this paper, we explore the use of facial expression recognition via artificial intelligence to diagnose a typical neurological system disease, Parkinson disease (PD). Objective: This study proposes methods to diagnose PD through facial expression recognition. Methods: We collected videos of facial expressions of people with PD and matched controls. We used relative coordinates and positional jitter to extract facial expression features (facial expression amplitude and shaking of small facial muscle groups) from the key points returned by Face++. Algorithms from traditional machine learning and advanced deep learning were utilized to diagnose PD. 
Results: The experimental results showed our models can achieve outstanding facial expression recognition ability for PD diagnosis. Applying a long short-term model neural network to the positions of the key features, precision and F1 values of 86% and 75%, respectively, can be reached. Further, utilizing a support vector machine algorithm for the facial expression amplitude features and shaking of the small facial muscle groups, an F1 value of 99% can be achieved. Conclusions: This study contributes to the digital diagnosis of PD based on facial expression recognition. The disease diagnosis model was validated through our experiment. The results can help doctors understand the real-time dynamics of the disease and even conduct remote diagnosis. %M 32673247 %R 10.2196/18697 %U https://www.jmir.org/2020/7/e18697 %U https://doi.org/10.2196/18697 %U http://www.ncbi.nlm.nih.gov/pubmed/32673247 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 8 %N 7 %P e17558 %T A Physical Activity and Diet Program Delivered by Artificially Intelligent Virtual Health Coach: Proof-of-Concept Study %A Maher,Carol Ann %A Davis,Courtney Rose %A Curtis,Rachel Grace %A Short,Camille Elizabeth %A Murphy,Karen Joy %+ Alliance for Research in Exercise, Nutrition and Activity, Allied Health and Human Performance, University of South Australia, GPO Box 2471, Adelaide, 5001, Australia, 61 883022315, carol.maher@unisa.edu.au %K virtual assistant %K chatbot %K Mediterranean diet %K physical activity %K lifestyle %D 2020 %7 10.7.2020 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: Poor diet and physical inactivity are leading modifiable causes of death and disease. Advances in artificial intelligence technology present tantalizing opportunities for creating virtual health coaches capable of providing personalized support at scale. 
Objective: This proof of concept study aimed to test the feasibility (recruitment and retention) and preliminary efficacy of physical activity and Mediterranean-style dietary intervention (MedLiPal) delivered via artificially intelligent virtual health coach. Methods: This 12-week single-arm pre-post study took place in Adelaide, Australia, from March to August 2019. Participants were inactive community-dwelling adults aged 45 to 75 years, recruited through news stories, social media posts, and flyers. The program included access to an artificially intelligent chatbot, Paola, who guided participants through a computer-based individualized introductory session, weekly check-ins, and goal setting, and was available 24/7 to answer questions. Participants used a Garmin Vivofit4 tracker to monitor daily steps, a website with educational materials and recipes, and a printed diet and activity log sheet. Primary outcomes included feasibility (based on recruitment and retention) and preliminary efficacy for changing physical activity and diet. Secondary outcomes were body composition (based on height, weight, and waist circumference) and blood pressure. Results: Over 4 weeks, 99 potential participants registered expressions of interest, with 81 of those screened meeting eligibility criteria. Participants completed a mean of 109.8 (95% CI 1.9-217.7) more minutes of physical activity at week 12 compared with baseline. Mediterranean diet scores increased from a mean of 3.8 out of 14 at baseline, to 9.6 at 12 weeks (mean improvement 5.7 points, 95% CI 4.2-7.3). After 12 weeks, participants lost an average 1.3 kg (95% CI –0.1 to –2.5 kg) and 2.1 cm from their waist circumference (95% CI –3.5 to –0.7 cm). There were no significant changes in blood pressure. Feasibility was excellent in terms of recruitment, retention (90% at 12 weeks), and safety (no adverse events). 
Conclusions: An artificially intelligent virtual assistant-led lifestyle-modification intervention was feasible and achieved measurable improvements in physical activity, diet, and body composition at 12 weeks. Future research examining artificially intelligent interventions at scale, and for other health purposes, is warranted. %M 32673246 %R 10.2196/17558 %U https://mhealth.jmir.org/2020/7/e17558 %U https://doi.org/10.2196/17558 %U http://www.ncbi.nlm.nih.gov/pubmed/32673246 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 8 %N 7 %P e17216 %T Development and Clinical Evaluation of a Web-Based Upper Limb Home Rehabilitation System Using a Smartwatch and Machine Learning Model for Chronic Stroke Survivors: Prospective Comparative Study %A Chae,Sang Hoon %A Kim,Yushin %A Lee,Kyoung-Soub %A Park,Hyung-Soon %+ Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea, 82 42 350 3038, hyungspark@kaist.ac.kr %K home-based rehabilitation %K artificial intelligence %K machine learning %K wearable device %K smartwatch %K chronic stroke %D 2020 %7 9.7.2020 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: Recent advancements in wearable sensor technology have shown the feasibility of remote physical therapy at home. In particular, the current COVID-19 pandemic has revealed the need and opportunity of internet-based wearable technology in future health care systems. Previous research has shown the feasibility of human activity recognition technologies for monitoring rehabilitation activities in home environments; however, few comprehensive studies ranging from development to clinical evaluation exist. 
Objective: This study aimed to (1) develop a home-based rehabilitation (HBR) system that can recognize and record the type and frequency of rehabilitation exercises conducted by the user using a smartwatch and smartphone app equipped with a machine learning (ML) algorithm and (2) evaluate the efficacy of the home-based rehabilitation system through a prospective comparative study with chronic stroke survivors. Methods: The HBR system involves an off-the-shelf smartwatch, a smartphone, and custom-developed apps. A convolutional neural network was used to train the ML algorithm for detecting home exercises. To determine the most accurate way for detecting the type of home exercise, we compared accuracy results with the data sets of personal or total data and accelerometer, gyroscope, or accelerometer combined with gyroscope data. From March 2018 to February 2019, we conducted a clinical study with two groups of stroke survivors. In total, 17 and 6 participants were enrolled for statistical analysis in the HBR group and control group, respectively. To measure clinical outcomes, we performed the Wolf Motor Function Test (WMFT), Fugl-Meyer Assessment of Upper Extremity, grip power test, Beck Depression Inventory, and range of motion (ROM) assessment of the shoulder joint at 0, 6, and 12 weeks, and at a follow-up assessment 6 weeks after retrieving the HBR system. Results: The ML model created with personal data involving accelerometer combined with gyroscope data (5590/5601, 99.80%) was the most accurate compared with accelerometer (5496/5601, 98.13%) or gyroscope data (5381/5601, 96.07%). In the comparative study, the drop-out rates in the control and HBR groups were 40% (4/10) and 22% (5/22) at 12 weeks and 100% (10/10) and 45% (10/22) at 18 weeks, respectively. The HBR group (n=17) showed a significant improvement in the mean WMFT score (P=.02) and ROM of flexion (P=.004) and internal rotation (P=.001). 
The control group (n=6) showed a significant change only in shoulder internal rotation (P=.03). Conclusions: This study found that a home care system using a commercial smartwatch and ML model can facilitate participation in home training and improve the functional score of the WMFT and shoulder ROM of flexion and internal rotation in the treatment of patients with chronic stroke. This strategy can possibly be a cost-effective tool for the home care treatment of stroke survivors in the future. Trial Registration: Clinical Research Information Service KCT0004818; https://tinyurl.com/y92w978t %M 32480361 %R 10.2196/17216 %U http://mhealth.jmir.org/2020/7/e17216/ %U https://doi.org/10.2196/17216 %U http://www.ncbi.nlm.nih.gov/pubmed/32480361 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 7 %P e14500 %T Identifying the Medical Lethality of Suicide Attempts Using Network Analysis and Deep Learning: Nationwide Study %A Kim,Bora %A Kim,Younghoon %A Park,C Hyung Keun %A Rhee,Sang Jin %A Kim,Young Shin %A Leventhal,Bennett L %A Ahn,Yong Min %A Paik,Hyojung %+ Center for Supercomputing Applications, Division of Supercomputing, Korea Institute of Science and Technology Information (KISTI), 245 Daehak-ro, Yuseong-gu, Daejeon, 305-806, Republic of Korea, 1 82 42 869 1004, hyojungpaik@kisti.re.kr %K suicide %K deep learning %K network %K antecedent behaviors %D 2020 %7 9.7.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Suicide is one of the leading causes of death among young and middle-aged people. However, little is understood about the behaviors leading up to actual suicide attempts and whether these behaviors are specific to the nature of suicide attempts. Objective: The goal of this study was to examine the clusters of behaviors antecedent to suicide attempts to determine if they could be used to assess the potential lethality of the attempt. 
To accomplish this goal, we developed a deep learning model using the relationships among behaviors antecedent to suicide attempts and the attempts themselves. Methods: This study used data from the Korea National Suicide Survey. We identified 1112 individuals who attempted suicide and completed a psychiatric evaluation in the emergency room. The 15-item Beck Suicide Intent Scale (SIS) was used for assessing antecedent behaviors, and the medical outcomes of the suicide attempts were measured by assessing lethality with the Columbia Suicide Severity Rating Scale (C-SSRS; lethal suicide attempt >3 and nonlethal attempt ≤3). Results: Using scores from the SIS, individuals who had lethal and nonlethal attempts comprised two different network nodes with the edges representing the relationships among nodes. Among the antecedent behaviors, the conception of a method’s lethality predicted suicidal behaviors with severe medical outcomes. The vectorized relationship values among the elements of antecedent behaviors in our deep learning model (E-GONet) increased performances, such as F1 and area under the precision-recall gain curve (AUPRG), for identifying lethal attempts (up to 3% for F1 and 32% for AUPRG), as compared with other models (mean F1: 0.81 for E-GONet, 0.78 for linear regression, and 0.80 for random forest; mean AUPRG: 0.73 for E-GONet, 0.41 for linear regression, and 0.69 for random forest). Conclusions: The relationships among behaviors antecedent to suicide attempts can be used to understand the suicidal intent of individuals and help identify the lethality of potential suicide attempts. Such a model may be useful in prioritizing cases for preventive intervention. 
%M 32673253 %R 10.2196/14500 %U http://medinform.jmir.org/2020/7/e14500/ %U https://doi.org/10.2196/14500 %U http://www.ncbi.nlm.nih.gov/pubmed/32673253 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 7 %P e17707 %T Artificial Intelligence and Health Technology Assessment: Anticipating a New Level of Complexity %A Alami,Hassane %A Lehoux,Pascale %A Auclair,Yannick %A de Guise,Michèle %A Gagnon,Marie-Pierre %A Shaw,James %A Roy,Denis %A Fleet,Richard %A Ag Ahmed,Mohamed Ali %A Fortin,Jean-Paul %+ Institut national d'excellence en santé et services sociaux, 2021, Avenue Union, Montréal, QC, H3A 2S9, Canada, 1 514 873 2563 ext 24404, hassane.alami@umontreal.ca %K artificial intelligence %K health technology assessment %K eHealth %K health care %K medical device %K patient %K health services %D 2020 %7 7.7.2020 %9 Viewpoint %J J Med Internet Res %G English %X Artificial intelligence (AI) is seen as a strategic lever to improve access, quality, and efficiency of care and services and to build learning and value-based health systems. Many studies have examined the technical performance of AI within an experimental context. These studies provide limited insights into the issues that its use in a real-world context of care and services raises. To help decision makers address these issues in a systemic and holistic manner, this viewpoint paper relies on the health technology assessment core model to contrast the expectations of the health sector toward the use of AI with the risks that should be mitigated for its responsible deployment. The analysis adopts the perspective of payers (ie, health system organizations and agencies) because of their central role in regulating, financing, and reimbursing novel technologies. This paper suggests that AI-based systems should be seen as a health system transformation lever, rather than a discrete set of technological devices. 
Their use could bring significant changes and impacts at several levels: technological, clinical, human and cognitive (patient and clinician), professional and organizational, economic, legal, and ethical. The assessment of AI’s value proposition should thus go beyond technical performance and cost logic by performing a holistic analysis of its value in a real-world context of care and services. To guide AI development, generate knowledge, and draw lessons that can be translated into action, the right political, regulatory, organizational, clinical, and technological conditions for innovation should be created as a first step. %M 32406850 %R 10.2196/17707 %U https://www.jmir.org/2020/7/e17707 %U https://doi.org/10.2196/17707 %U http://www.ncbi.nlm.nih.gov/pubmed/32406850 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 6 %N 1 %P e19285 %T Artificial Intelligence Education and Tools for Medical and Health Informatics Students: Systematic Review %A Sapci,A Hasan %A Sapci,H Aylin %+ Adelphi University, Nexus Building, 1 South Avenue, Garden City, NY, 11530, United States, 1 5168338156, sapci@adelphi.edu %K artificial intelligence %K education %K machine learning %K deep learning %K medical education %K health informatics %K systematic review %D 2020 %7 30.6.2020 %9 Review %J JMIR Med Educ %G English %X Background: The use of artificial intelligence (AI) in medicine will generate numerous application possibilities to improve patient care, provide real-time data analytics, and enable continuous patient monitoring. Clinicians and health informaticians should become familiar with machine learning and deep learning. Additionally, they should have a strong background in data analytics and data visualization to use, evaluate, and develop AI applications in clinical practice. Objective: The main objective of this study was to evaluate the current state of AI training and the use of AI tools to enhance the learning experience. 
Methods: A comprehensive systematic review was conducted to analyze the use of AI in medical and health informatics education, and to evaluate existing AI training practices. PRISMA-P (Preferred Reporting Items for Systematic Reviews and Meta-Analysis Protocols) guidelines were followed. The studies that focused on the use of AI tools to enhance medical education and the studies that investigated teaching AI as a new competency were categorized separately to evaluate recent developments. Results: This systematic review revealed that recent publications recommend the integration of AI training into medical and health informatics curricula. Conclusions: To the best of our knowledge, this is the first systematic review exploring the current state of AI education in both medicine and health informatics. Since AI curricula have not been standardized and competencies have not been determined, a framework for specialized AI training in medical and health informatics education is proposed. %M 32602844 %R 10.2196/19285 %U http://mededu.jmir.org/2020/1/e19285/ %U https://doi.org/10.2196/19285 %U http://www.ncbi.nlm.nih.gov/pubmed/32602844 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 6 %P e19202 %T Medical Emergency Resource Allocation Model in Large-Scale Emergencies Based on Artificial Intelligence: Algorithm Development %A Du,Lin %+ School of Information Science and Engineering, Qilu Normal University, No 33, Shanshi East Road, Jinan, China, 86 13793161610, dul1028@163.com %K medical emergency %K resource allocation model %K distribution model %K large-scale emergencies %K artificial intelligence %D 2020 %7 25.6.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Before major emergencies occur, the government needs to prepare various emergency supplies in advance. To do this, it should consider the coordinated storage of different types of materials while ensuring that emergency materials are not missed or superfluous. 
Objective: This paper aims to improve the dispatch and transportation efficiency of emergency materials under a model in which the government makes full use of Internet of Things technology and artificial intelligence technology. Methods: The paper established a model for emergency material preparation and dispatch based on queueing theory and further established a workflow system for emergency material preparation, dispatch, and transportation based on a Petri net, resulting in a highly efficient emergency material preparation and dispatch simulation system framework. Results: A decision support platform was designed to integrate all the algorithms and principles proposed. Conclusions: The resulting framework can effectively coordinate the workflow of emergency material preparation and dispatch, helping to shorten the total time of emergency material preparation, dispatch, and transportation. %M 32584262 %R 10.2196/19202 %U http://medinform.jmir.org/2020/6/e19202/ %U https://doi.org/10.2196/19202 %U http://www.ncbi.nlm.nih.gov/pubmed/32584262 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 4 %N 6 %P e18890 %T Adherence of the #Here4U App – Military Version to Criteria for the Development of Rigorous Mental Health Apps %A Linden,Brooke %A Tam-Seto,Linna %A Stuart,Heather %+ Health Services and Policy Research Institute, Queen's University, 21 Arch Street, Kingston, ON, K7L 3L3, Canada, 1 613 533 6387, brooke.linden@queensu.ca %K mental health services %K telemedicine %K mHealth %K chatbot %K e-solutions %K Canadian Armed Forces %K military health %K mobile phone %D 2020 %7 17.6.2020 %9 Original Paper %J JMIR Form Res %G English %X Background: Over the past several years, the emergence of mobile mental health apps has increased as a potential solution for populations who may face logistical and social barriers to traditional service delivery, including individuals connected to the military. 
Objective: The goal of the #Here4U App – Military Version is to provide evidence-informed mental health support to members of Canada’s military community, leveraging artificial intelligence in the form of IBM Canada’s Watson Assistant to carry on unique text-based conversations with users, identify presenting mental health concerns, and refer users to self-help resources or recommend professional health care where appropriate. Methods: As the availability and use of mental health apps has increased, so too has the list of recommendations and guidelines for efficacious development. We describe the development and testing conducted between 2018 and 2020 and assess the quality of the #Here4U App against 16 criteria for rigorous mental health app development, as identified by Bakker and colleagues in 2016. Results: The #Here4U App – Military Version met the majority of Bakker and colleagues’ criteria, with those unmet considered not applicable to this particular product or out of scope for research conducted to date. Notably, a formal evaluation of the efficacy of the app is a major priority moving forward. Conclusions: The #Here4U App – Military Version is a promising new mental health e-solution for members of the Canadian Armed Forces community, filling many of the gaps left by traditional service delivery. 
%M 32554374 %R 10.2196/18890 %U https://formative.jmir.org/2020/6/e18890 %U https://doi.org/10.2196/18890 %U http://www.ncbi.nlm.nih.gov/pubmed/32554374 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 6 %P e18301 %T Technical Metrics Used to Evaluate Health Care Chatbots: Scoping Review %A Abd-Alrazaq,Alaa %A Safi,Zeineb %A Alajlani,Mohannad %A Warren,Jim %A Househ,Mowafa %A Denecke,Kerstin %+ Institute for Medical Informatics, Bern University of Applied Sciences, Quellgasse 21, 2502 Biel, Bern, Switzerland, 41 76 409 97 61, kerstin.denecke@bfh.ch %K chatbots %K conversational agents %K health care %K evaluation %K metrics %D 2020 %7 5.6.2020 %9 Review %J J Med Internet Res %G English %X Background: Dialog agents (chatbots) have a long history of application in health care, where they have been used for tasks such as supporting patient self-management and providing counseling. Their use is expected to grow with increasing demands on health systems and improving artificial intelligence (AI) capability. Approaches to the evaluation of health care chatbots, however, appear to be diverse and haphazard, resulting in a potential barrier to the advancement of the field. Objective: This study aims to identify the technical (nonclinical) metrics used by previous studies to evaluate health care chatbots. Methods: Studies were identified by searching 7 bibliographic databases (eg, MEDLINE and PsycINFO) in addition to conducting backward and forward reference list checking of the included studies and relevant reviews. The studies were independently selected by two reviewers who then extracted data from the included studies. Extracted data were synthesized narratively by grouping the identified metrics into categories based on the aspect of chatbots that the metrics evaluated. Results: Of the 1498 citations retrieved, 65 studies were included in this review. 
Chatbots were evaluated using 27 technical metrics, which were related to chatbots as a whole (eg, usability, classifier performance, speed), response generation (eg, comprehensibility, realism, repetitiveness), response understanding (eg, chatbot understanding as assessed by users, word error rate, concept error rate), and esthetics (eg, appearance of the virtual agent, background color, and content). Conclusions: The technical metrics of health chatbot studies were diverse, with survey designs and global usability metrics dominating. The lack of standardization and paucity of objective measures make it difficult to compare the performance of health chatbots and could inhibit advancement of the field. We suggest that researchers more frequently include metrics computed from conversation logs. In addition, we recommend the development of a framework of technical metrics with recommendations for specific circumstances for their inclusion in chatbot studies. %M 32442157 %R 10.2196/18301 %U http://www.jmir.org/2020/6/e18301/ %U https://doi.org/10.2196/18301 %U http://www.ncbi.nlm.nih.gov/pubmed/32442157 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 4 %N 6 %P e16670 %T Patient Perception of Plain-Language Medical Notes Generated Using Artificial Intelligence Software: Pilot Mixed-Methods Study %A Bala,Sandeep %A Keniston,Angela %A Burden,Marisha %+ College of Medicine, University of Central Florida, 6850 Lake Nona Blvd, Orlando, FL, , United States, 1 321 299 8429, ucfbala@knights.ucf.edu %K artificial intelligence %K patient education %K natural language processing %K OpenNotes %K Open Notes %K patient-physician relationship %K simplified notes %K plain-language notes %D 2020 %7 5.6.2020 %9 Original Paper %J JMIR Form Res %G English %X Background: Clinicians’ time with patients has become increasingly limited due to regulatory burden, documentation and billing, administrative responsibilities, and market forces. 
These factors limit clinicians’ time to deliver thorough explanations to patients. OpenNotes began as a research initiative exploring whether sharing medical notes with patients helps them understand their health care. Providing patients access to their medical notes has been shown to have many benefits, including improved patient satisfaction and clinical outcomes. OpenNotes has since evolved into a national movement that helps clinicians share notes with patients. However, a significant barrier to the widespread adoption of OpenNotes has been clinicians’ concerns that OpenNotes may require additional time to correct patient confusion over medical language. Recent advances in artificial intelligence (AI) technology may help resolve this concern by converting medical notes to plain language with minimal time required of clinicians. Objective: This pilot study assesses patient comprehension and perceived benefits, concerns, and insights regarding an AI-simplified note through comprehension questions and a guided interview. Methods: Synthea, a synthetic patient generator, was used to generate a standardized medical-language patient note, which was then simplified using AI software. A multiple-choice comprehension assessment questionnaire was drafted with physician input. Study participants were recruited from inpatients at the University of Colorado Hospital. Participants were randomly assigned to be tested for their comprehension of either the standardized medical-language version or the AI-generated plain-language version of the patient note. Following this, participants reviewed the opposite version of the note and participated in a guided interview. A Student t test was performed to assess for differences in comprehension assessment scores between the plain-language and medical-language note groups. Multivariate modeling was performed to assess the impact of demographic variables on comprehension. Interview responses were thematically analyzed. 
Results: Twenty patients agreed to participate. The mean number of comprehension assessment questions answered correctly was higher in the plain-language group than in the medical-language group; however, the Student t test was underpowered to determine whether this difference was significant. In multivariate modeling, age, ethnicity, and health literacy had a significant impact on comprehension scores. Thematic analysis of guided interviews highlighted patients’ perceived benefits, concerns, and suggestions regarding such notes. Major themes of benefits were that simplified plain-language notes may (1) be more usable than unsimplified medical-language notes, (2) improve the patient-clinician relationship, and (3) empower patients through an enhanced understanding of their health care. Conclusions: AI software may translate medical notes into plain-language notes that are perceived as beneficial by patients. Limitations included the sample size, inpatient-only setting, and possible confounding factors. Larger studies are needed to assess comprehension. Insight from patient responses to guided interviews can guide the future study and development of this technology. 
%M 32442148 %R 10.2196/16670 %U https://formative.jmir.org/2020/6/e16670 %U https://doi.org/10.2196/16670 %U http://www.ncbi.nlm.nih.gov/pubmed/32442148 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 6 %P e18677 %T Application of an Isolated Word Speech Recognition System in the Field of Mental Health Consultation: Development and Usability Study %A Fu,Weifeng %+ Liberal Arts College, Hunan Normal University, 36 Lushan Road, Changsha, 410081, China, 86 18973101748, fwf1126@hunnu.edu.cn %K speech recognition %K isolated words %K mental health %K small vocabulary %K HMM %K hidden Markov model %K programming %D 2020 %7 3.6.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Speech recognition is a technology that enables machines to understand human language. Objective: In this study, speech recognition of isolated words from a small vocabulary was applied to the field of mental health counseling. Methods: A software platform was used to establish a human-machine chat system for psychological counseling. The software uses speech recognition technology to decode the user's voice input, analyzes and processes it against several internal databases, and then gives the user accurate feedback. For users who need psychological treatment, the system provides psychological education. Results: The speech recognition system included features such as speech extraction, endpoint detection, feature value extraction, training data, and speech recognition. Conclusions: A hidden Markov model was adopted and, through multithreaded programming in a VC2005 compilation environment, the algorithm was run in parallel to improve the efficiency of speech recognition. After the design was completed, simulation debugging was performed in the laboratory. The experimental results showed that the designed program met the basic requirements of a speech recognition system. 
%M 32384054 %R 10.2196/18677 %U https://medinform.jmir.org/2020/6/e18677 %U https://doi.org/10.2196/18677 %U http://www.ncbi.nlm.nih.gov/pubmed/32384054 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 5 %P e16896 %T Artificial Intelligence–Assisted System in Postoperative Follow-up of Orthopedic Patients: Exploratory Quantitative and Qualitative Study %A Bian,Yanyan %A Xiang,Yongbo %A Tong,Bingdu %A Feng,Bin %A Weng,Xisheng %+ Department of Orthopedic Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, No 1 Shuaifuyuan, Dongcheng District, Beijing, 100073, China, 86 13021159994, doctorwxs@163.com %K artificial intelligence %K conversational agent %K follow-up %K cost-effectiveness %D 2020 %7 26.5.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Patient follow-up is an essential part of hospital ward management. With the development of deep learning algorithms, individual follow-up assignments might be completed by artificial intelligence (AI). We developed an AI-assisted follow-up conversational agent that can simulate the human voice and select an appropriate follow-up time for quantitative, automatic, and personalized patient follow-up. Patient feedback and voice information could be collected and converted into text data automatically. Objective: The primary objective of this study was to compare the cost-effectiveness of AI-assisted follow-up to manual follow-up of patients after surgery. The secondary objective was to compare the feedback from AI-assisted follow-up to feedback from manual follow-up. Methods: The AI-assisted follow-up system was adopted in the Orthopedic Department of Peking Union Medical College Hospital in April 2019. A total of 270 patients were followed up through this system. Prior to that, 2656 patients were followed up by phone calls manually. 
Patient characteristics, telephone connection rate, follow-up rate, feedback collection rate, time spent, and feedback composition were compared between the two groups of patients. Results: There was no statistically significant difference in age, gender, or disease between the two groups. There was no significant difference in telephone connection rate (manual: 2478/2656, 93.3%; AI-assisted: 249/270, 92.2%; P=.50) or successful follow-up rate (manual: 2301/2478, 92.9%; AI-assisted: 231/249, 92.8%; P=.96) between the two groups. The time spent on 100 patients in the manual follow-up group was about 9.3 hours. In contrast, the time spent on the AI-assisted follow-up was close to 0 hours. The feedback rate in the AI-assisted follow-up group was higher than that in the manual follow-up group (manual: 68/2656, 2.5%; AI-assisted: 28/270, 10.3%; P<.001). The composition of feedback was different in the two groups. Feedback from the AI-assisted follow-up group mainly included nursing, health education, and hospital environment content, while feedback from the manual follow-up group mostly included medical consultation content. Conclusions: The effectiveness of AI-assisted follow-up was not inferior to that of manual follow-up. Human resource costs are saved by AI. AI can help obtain comprehensive feedback from patients, although its depth and pertinence of communication need to be improved. 
%M 32452807 %R 10.2196/16896 %U http://www.jmir.org/2020/5/e16896/ %U https://doi.org/10.2196/16896 %U http://www.ncbi.nlm.nih.gov/pubmed/32452807 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 6 %N 1 %P e15859 %T Assessing Breast Cancer Survivors’ Perceptions of Using Voice-Activated Technology to Address Insomnia: Feasibility Study Featuring Focus Groups and In-Depth Interviews %A Arem,Hannah %A Scott,Remle %A Greenberg,Daniel %A Kaltman,Rebecca %A Lieberman,Daniel %A Lewin,Daniel %+ Department of Epidemiology, Milken Institute School of Public Health, George Washington University, 950 New Hampshire Ave NW, Rm 514, Washington, DC, 20052, United States, 1 2029944676, hannaharem@gwu.edu %K artificial intelligence %K breast neoplasms %K survivors %K insomnia %K cognitive behavioral therapy %K mobile phones %D 2020 %7 26.5.2020 %9 Original Paper %J JMIR Cancer %G English %X Background: Breast cancer survivors (BCSs) are a growing population with a higher prevalence of insomnia than women of the same age without a history of cancer. Cognitive behavioral therapy for insomnia (CBT-I) has been shown to be effective in this population, but it is not widely available to those who need it. Objective: This study aimed to better understand BCSs’ experiences with insomnia and to explore the feasibility and acceptability of delivering CBT-I using a virtual assistant (Amazon Alexa). Methods: We first conducted a formative phase with 2 focus groups and 3 in-depth interviews to understand BCSs’ perceptions of insomnia as well as their interest in and comfort with using a virtual assistant to learn about CBT-I. We then developed a prototype incorporating participant preferences and CBT-I components and demonstrated it in group and individual settings to BCSs to evaluate acceptability, interest, perceived feasibility, educational potential, and usability of the prototype. 
We also collected open-ended feedback on the content and used frequencies to describe the quantitative data. Results: We recruited 11 BCSs with insomnia in the formative phase and 14 BCSs in the prototype demonstration. In formative work, anxiety, fear, and hot flashes were identified as causes of insomnia. After prototype demonstration, nearly 79% (11/14) of participants reported an interest in and perceived feasibility of using the virtual assistant to record sleep patterns. Approximately two-thirds of the participants thought lifestyle modification (9/14, 64%) and sleep restriction (9/14, 64%) would be feasible and were interested in this feature of the program (10/14, 71% and 9/14, 64%, respectively). Relaxation exercises were rated as interesting and feasible using the virtual assistant by 71% (10/14) of the participants. Usability was rated as better than average, and all women reported that they would recommend the program to friends and family. Conclusions: This virtual assistant prototype delivering CBT-I components by using a smart speaker was rated as feasible and acceptable, suggesting that this prototype should be fully developed and tested for efficacy in the BCS population. If efficacy is shown in this population, the prototype should also be adapted for other high-risk populations. 
%M 32348274 %R 10.2196/15859 %U http://cancer.jmir.org/2020/1/e15859/ %U https://doi.org/10.2196/15859 %U http://www.ncbi.nlm.nih.gov/pubmed/32348274 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 5 %P e17647 %T Clinical Desire for an Artificial Intelligence–Based Surgical Assistant System: Electronic Survey–Based Study %A Park,Soo Jin %A Lee,Eun Ji %A Kim,Se Ik %A Kong,Seong-Ho %A Jeong,Chang Wook %A Kim,Hee Seung %+ Department of Obstetrics and Gynecology, Seoul National University College of Medicine, 101 Daehak-Ro, Jongno-Gu, Seoul, 03080, Republic of Korea, 82 02 2072 4863, bboddi0311@gmail.com %K artificial intelligence %K solo surgery %K laparoscopic surgery %D 2020 %7 15.5.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Techniques utilizing artificial intelligence (AI) are rapidly growing in medical research and development, especially in the operating room. However, the application of AI in the operating room has been limited to small tasks or software, such as clinical decision systems. It still largely depends on human resources and technology involving the surgeons’ hands. Therefore, we conceptualized AI-based solo surgery (AISS) defined as laparoscopic surgery conducted by only one surgeon with support from an AI-based surgical assistant system, and we performed an electronic survey on the clinical desire for such a system. Objective: This study aimed to evaluate the experiences of surgeons who have performed laparoscopic surgery, the limitations of conventional laparoscopic surgical systems, and the desire for an AI-based surgical assistant system for AISS. Methods: We performed an online survey for gynecologists, urologists, and general surgeons from June to August 2017. The questionnaire consisted of six items about experience, two about limitations, and five about the clinical desire for an AI-based surgical assistant system for AISS. 
Results: A total of 508 surgeons who have performed laparoscopic surgery responded to the survey. Most of the surgeons needed two or more assistants during laparoscopic surgery, and the rate was higher among gynecologists (251/278, 90.3%) than among general surgeons (123/173, 71.1%) and urologists (35/57, 61.4%). The majority of responders answered that the skillfulness of surgical assistants was “very important” or “important.” The most uncomfortable aspect of laparoscopic surgery was unskilled movement of the camera (431/508, 84.8%) and instruments (303/508, 59.6%). About 40% (199/508, 39.1%) of responders answered that the AI-based surgical assistant system could substitute 41%-60% of the current workforce, and 83.3% (423/508) showed willingness to buy the system. Furthermore, the most reasonable price was US $30,000-50,000. Conclusions: Surgeons who perform laparoscopic surgery may feel discomfort with the conventional laparoscopic surgical system in terms of assistant skillfulness, and they may think that the skillfulness of surgical assistants is essential. They desire to alleviate present inconveniences with the conventional laparoscopic surgical system and to perform a safe and comfortable operation by using an AI-based surgical assistant system for AISS. 
%M 32412421 %R 10.2196/17647 %U http://medinform.jmir.org/2020/5/e17647/ %U https://doi.org/10.2196/17647 %U http://www.ncbi.nlm.nih.gov/pubmed/32412421 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 5 %P e17620 %T Health Care Employees’ Perceptions of the Use of Artificial Intelligence Applications: Survey Study %A Abdullah,Rana %A Fakieh,Bahjat %+ Information Systems Department, King Abdulaziz University, Al-Solaimaniah District, Jeddah, 21589, Saudi Arabia, 966 126952000 ext 67438, bfakieh@kau.edu.sa %K artificial intelligence %K employees %K healthcare sector %K perception %K Saudi Arabia %D 2020 %7 14.5.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: The advancement of health care information technology and the emergence of artificial intelligence have yielded tools to improve the quality of various health care processes. Few studies have investigated employee perceptions of artificial intelligence implementation in Saudi Arabia and the Arab world, and few have examined the effect of employee knowledge and job title on perceptions of artificial intelligence implementation in the workplace. Objective: The aim of this study was to explore health care employee perceptions and attitudes toward the implementation of artificial intelligence technologies in health care institutions in Saudi Arabia. Methods: An online questionnaire was published, and responses were collected from 250 employees, including doctors, nurses, and technicians, at 4 of the largest hospitals in Riyadh, Saudi Arabia. Results: Respondents feared that artificial intelligence would replace employees (mean score 3.11 of 4) and showed a general lack of knowledge regarding artificial intelligence. In addition, most respondents were unaware of the advantages of, and the most common challenges to, artificial intelligence applications in the health sector, indicating a need for training. 
The results also showed that technicians were the most frequently impacted by artificial intelligence applications due to the nature of their jobs, which do not require much direct human interaction. Conclusions: The Saudi health care sector presents an advantageous market potential that should be attractive to researchers and developers of artificial intelligence solutions. %M 32406857 %R 10.2196/17620 %U http://www.jmir.org/2020/5/e17620/ %U https://doi.org/10.2196/17620 %U http://www.ncbi.nlm.nih.gov/pubmed/32406857 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 4 %P e17234 %T Identification of the Facial Features of Patients With Cancer: A Deep Learning–Based Pilot Study %A Liang,Bin %A Yang,Na %A He,Guosheng %A Huang,Peng %A Yang,Yong %+ Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 17 Panjiayuannanli Rd, Chaoyang District, Beijing, 100021, China, 86 1087788663, leangbin@gmail.com %K convolutional neural network %K facial features %K cancer patient %K deep learning %K cancer %D 2020 %7 29.4.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Cancer has become the second leading cause of death globally. Most cancer cases are due to genetic mutations, which affect metabolism and result in facial changes. Objective: In this study, we aimed to identify the facial features of patients with cancer using the deep learning technique. Methods: Images of faces of patients with cancer were collected to build the cancer face image data set. A face image data set of people without cancer was built by randomly selecting images from the publicly available MegaAge data set according to the sex and age distribution of the cancer face image data set. 
Each face image was preprocessed to obtain an upright centered face chip, following which the background was filtered out to exclude the effects of nonrelative factors. A residual neural network was constructed to classify cancer and noncancer cases. Transfer learning, minibatches, few epochs, L2 regulation, and random dropout training strategies were used to prevent overfitting. Moreover, guided gradient-weighted class activation mapping was used to reveal the relevant features. Results: A total of 8124 face images of patients with cancer (men: n=3851, 47.4%; women: n=4273, 52.6%) were collected from January 2018 to January 2019. The ages of the patients ranged from 1 year to 70 years (median age 52 years). The average faces of both male and female patients with cancer displayed more obvious facial adiposity than the average faces of people without cancer, which was supported by a landmark comparison. When testing the data set, the training process was terminated after 5 epochs. The area under the receiver operating characteristic curve was 0.94, and the accuracy rate was 0.82. The main relative feature of cancer cases was facial skin, while the relative features of noncancer cases were extracted from the complementary face region. Conclusions: In this study, we built a face data set of patients with cancer and constructed a deep learning model to classify the faces of people with and those without cancer. We found that facial skin and adiposity were closely related to the presence of cancer. 
%M 32347802 %R 10.2196/17234 %U http://www.jmir.org/2020/4/e17234/ %U https://doi.org/10.2196/17234 %U http://www.ncbi.nlm.nih.gov/pubmed/32347802 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 4 %P e17125 %T A Deep Artificial Neural Network−Based Model for Prediction of Underlying Cause of Death From Death Certificates: Algorithm Development and Validation %A Falissard,Louis %A Morgand,Claire %A Roussel,Sylvie %A Imbaud,Claire %A Ghosn,Walid %A Bounebache,Karim %A Rey,Grégoire %+ Inserm (Institut National de la Santé et de la Recherche Médicale) - CépiDc (Centre d'epidémiologie sur les causes médicales de Décès), 80 Rue du Général Leclerc, Le Kremlin Bicêtre, 94270, France, 33 679649178, louis.falissard@gmail.com %K machine learning %K deep learning %K mortality statistics %K underlying cause of death %D 2020 %7 28.4.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Coding of underlying causes of death from death certificates is a process that is nowadays undertaken mostly by humans with potential assistance from expert systems, such as the Iris software. It is, consequently, an expensive process that can, in addition, suffer from geospatial discrepancies, thus severely impairing the comparability of death statistics at the international level. The recent advances in artificial intelligence, specifically the rise of deep learning methods, has enabled computers to make efficient decisions on a number of complex problems that were typically considered out of reach without human assistance; they require a considerable amount of data to learn from, which is typically their main limiting factor. However, the CépiDc (Centre d’épidémiologie sur les causes médicales de Décès) stores an exhaustive database of death certificates at the French national scale, amounting to several millions of training examples available for the machine learning practitioner. 
Objective: This article investigates the application of deep neural network methods to the coding of underlying causes of death. Methods: The dataset was based on data from every French death certificate from 2000 to 2015, containing information such as the subject’s age and gender, as well as the chain of events leading to his or her death, for a total of around 8 million observations. The task of automatically coding the subject’s underlying cause of death was then formulated as a predictive modelling problem. A deep neural network-based model was designed and fit to the dataset. Its error rate was then assessed on an external test dataset and compared to the current state of the art (ie, the Iris software). Statistical significance of the proposed approach’s superiority was assessed via bootstrap. Results: The proposed approach resulted in a test accuracy of 97.8% (95% CI 97.7-97.9), which constitutes a significant improvement over the current state of the art and its accuracy of 74.5% (95% CI 74.0-75.0) assessed on the same test examples. Such an improvement opens up a whole field of new applications, from nosologist-level batch-automated coding to international and temporal harmonization of cause of death statistics. A typical example of such an application is demonstrated by recoding French overdose-related deaths from 2000 to 2010. Conclusions: This article shows that deep artificial neural networks are well suited to the analysis of electronic health records and can learn a complex set of medical rules directly from voluminous datasets, without any explicit prior knowledge. Although not entirely free from mistakes, the derived algorithm constitutes a powerful decision-making tool that can handle structured medical data with unprecedented performance. 
We strongly believe that the methods developed in this article are highly reusable in a variety of settings related to epidemiology, biostatistics, and the medical sciences in general. %M 32343252 %R 10.2196/17125 %U http://medinform.jmir.org/2020/4/e17125/ %U https://doi.org/10.2196/17125 %U http://www.ncbi.nlm.nih.gov/pubmed/32343252 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 9 %N 4 %P e17490 %T Nursing in the Age of Artificial Intelligence: Protocol for a Scoping Review %A Buchanan,Christine %A Howitt,M Lyndsay %A Wilson,Rita %A Booth,Richard G %A Risling,Tracie %A Bamford,Megan %+ Registered Nurses' Association of Ontario, 158 Pearl Street, Toronto, ON, M5H 1L3, Canada, 1 800 268 7199 ext 281, cbuchanan@rnao.ca %K nursing %K artificial intelligence %K machine learning %K robotics %K compassionate care %K scoping review %D 2020 %7 16.4.2020 %9 Protocol %J JMIR Res Protoc %G English %X Background: It is predicted that digital health technologies that incorporate artificial intelligence will transform health care delivery in the next decade. Little research has explored how emerging trends in artificial intelligence–driven digital health technologies may influence the relationship between nurses and patients. Objective: The purpose of this scoping review is to summarize the findings from 4 research questions regarding emerging trends in artificial intelligence–driven digital health technologies and their influence on nursing practice across the 5 domains outlined by the Canadian Nurses Association framework: administration, clinical care, education, policy, and research. Specifically, this scoping review will examine how emerging trends will transform the roles and functions of nurses over the next 10 years and beyond. 
Methods: Using an established scoping review methodology, MEDLINE, Cumulative Index to Nursing and Allied Health Literature, Embase, PsycINFO, Cochrane Database of Systematic Reviews, Cochrane Central, Education Resources Information Centre, Scopus, Web of Science, and Proquest databases were searched. In addition to the electronic database searches, a targeted website search will be performed to access relevant grey literature. Abstracts and full-text studies will be independently screened by 2 reviewers using prespecified inclusion and exclusion criteria. Included literature will focus on nursing and digital health technologies that incorporate artificial intelligence. Data will be charted using a structured form and narratively summarized. Results: Electronic database searches have retrieved 10,318 results. The scoping review and subsequent briefing paper will be completed by the fall of 2020. Conclusions: A symposium will be held to share insights gained from this scoping review with key thought leaders and a cross section of stakeholders from administration, clinical care, education, policy, and research as well as patient advocates. The symposium will provide a forum to explore opportunities for action to advance the future of nursing in a technological world and, more specifically, nurses’ delivery of compassionate care in the age of artificial intelligence. Results from the symposium will be summarized in the form of a briefing paper and widely disseminated to relevant stakeholders. 
International Registered Report Identifier (IRRID): DERR1-10.2196/17490 %M 32297873 %R 10.2196/17490 %U http://www.researchprotocols.org/2020/4/e17490/ %U https://doi.org/10.2196/17490 %U http://www.ncbi.nlm.nih.gov/pubmed/32297873 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 4 %P e15876 %T Leveraging Eye Tracking to Prioritize Relevant Medical Record Data: Comparative Machine Learning Study %A King,Andrew J %A Cooper,Gregory F %A Clermont,Gilles %A Hochheiser,Harry %A Hauskrecht,Milos %A Sittig,Dean F %A Visweswaran,Shyam %+ Department of Biomedical Informatics, University of Pittsburgh, The Offices at Baum, 5607 Baum Blvd., Suite 523, Pittsburgh, PA, United States, 1 412 648 7119, shv3@pitt.edu %K electronic medical record system %K eye tracking %K machine learning %K intensive care unit %K information-seeking behavior %D 2020 %7 2.4.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Electronic medical record (EMR) systems capture large amounts of data per patient and present that data to physicians with little prioritization. Without prioritization, physicians must mentally identify and collate relevant data, an activity that can lead to cognitive overload. To mitigate cognitive overload, a Learning EMR (LEMR) system prioritizes the display of relevant medical record data. Relevant data are those that are pertinent to a context—defined as the combination of the user, clinical task, and patient case. To determine which data are relevant in a specific context, a LEMR system uses supervised machine learning models of physician information-seeking behavior. Since obtaining information-seeking behavior data via manual annotation is slow and expensive, automatic methods for capturing such data are needed. Objective: The goal of the research was to propose and evaluate eye tracking as a high-throughput method to automatically acquire physician information-seeking behavior useful for training models for a LEMR system. 
Methods: Critical care medicine physicians reviewed intensive care unit patient cases in an EMR interface developed for the study. Participants manually identified patient data that were relevant in the context of a clinical task: preparing a patient summary to present at morning rounds. We used eye tracking to capture each physician’s gaze dwell time on each data item (eg, blood glucose measurements). Manual annotations and gaze dwell times were used to define target variables for developing supervised machine learning models of physician information-seeking behavior. We compared the performance of manual selection and gaze-derived models on an independent set of patient cases. Results: A total of 68 pairs of manual selection and gaze-derived machine learning models were developed from training data and evaluated on an independent evaluation data set. A paired Wilcoxon signed-rank test showed similar performance of manual selection and gaze-derived models on area under the receiver operating characteristic curve (P=.40). Conclusions: We used eye tracking to automatically capture physician information-seeking behavior and used it to train models for a LEMR system. The models that were trained using eye tracking performed like models that were trained using manual annotations. These results support further development of eye tracking as a high-throughput method for training clinical decision support systems that prioritize the display of relevant medical record data. 
%M 32238342 %R 10.2196/15876 %U https://www.jmir.org/2020/4/e15876 %U https://doi.org/10.2196/15876 %U http://www.ncbi.nlm.nih.gov/pubmed/32238342 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 9 %N 1 %P e16606 %T Use of Artificial Intelligence for Medical Literature Search: Randomized Controlled Trial Using the Hackathon Format %A Schoeb,Dominik %A Suarez-Ibarrola,Rodrigo %A Hein,Simon %A Dressler,Franz Friedrich %A Adams,Fabian %A Schlager,Daniel %A Miernik,Arkadiusz %+ Medical Center – Department of Urology, Faculty of Medicine, University of Freiburg, , Freiburg, , Germany, 49 076127025823, dominik.stefan.schoeb@uniklinik-freiburg.de %K artificial intelligence %K literature review %K medical information technology %D 2020 %7 30.3.2020 %9 Original Paper %J Interact J Med Res %G English %X Background: Mapping out the research landscape around a project is often time consuming and difficult. Objective: This study evaluates a commercial artificial intelligence (AI) search engine (IRIS.AI) for its applicability in an automated literature search on a specific medical topic. Methods: To evaluate the AI search engine in a standardized manner, the concept of a science hackathon was applied. Three groups of researchers were tasked with performing a literature search on a clearly defined scientific project. All participants had a high level of expertise for this specific field of research. Two groups were given access to the AI search engine IRIS.AI. All groups were given the same amount of time for their search and were instructed to document their results. Search results were summarized and ranked according to a predetermined scoring system. Results: The final scoring awarded 49 and 39 points out of 60 to AI groups 1 and 2, respectively, and the control group received 46 points. A total of 20 scientific studies with high relevance were identified, and 5 highly relevant studies (“spot on”) were reported by each group. 
Conclusions: AI technology is a promising approach to facilitate literature searches and the management of medical libraries. In this study, however, the application of AI technology led to a more focused literature search without a significant improvement in the number of results. %M 32224481 %R 10.2196/16606 %U http://www.i-jmr.org/2020/1/e16606/ %U https://doi.org/10.2196/16606 %U http://www.ncbi.nlm.nih.gov/pubmed/32224481 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 3 %P e16235 %T User Experiences of Social Support From Companion Chatbots in Everyday Contexts: Thematic Analysis %A Ta,Vivian %A Griffith,Caroline %A Boatfield,Carolynn %A Wang,Xinyu %A Civitello,Maria %A Bader,Haley %A DeCero,Esther %A Loggarakis,Alexia %+ Lake Forest College, 555 N Sheridan Rd, Lake Forest, IL, 60045, United States, 1 682 203 0820, vpta538@gmail.com %K artificial intelligence %K social support %K artificial agents %K chatbots %K interpersonal relations %D 2020 %7 6.3.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Previous research suggests that artificial agents may be a promising source of social support for humans. However, the bulk of this research has been conducted in the context of social support interventions that specifically address stressful situations or health improvements. Little research has examined social support received from artificial agents in everyday contexts. Objective: Considering that social support manifests in not only crises but also everyday situations and that everyday social support forms the basis of support received during more stressful events, we aimed to investigate the types of everyday social support that can be received from artificial agents. Methods: In Study 1, we examined publicly available user reviews (N=1854) of Replika, a popular companion chatbot. In Study 2, a sample (n=66) of Replika users provided detailed open-ended responses regarding their experiences of using Replika. 
We conducted thematic analysis on both datasets to gain insight into the kind of everyday social support that users receive through interactions with Replika. Results: Replika provides some level of companionship that can help curtail loneliness, provide a “safe space” in which users can discuss any topic without the fear of judgment or retaliation, increase positive affect through uplifting and nurturing messages, and provide helpful information/advice when normal sources of informational support are not available. Conclusions: Artificial agents may be a promising source of everyday social support, particularly companionship, emotional, informational, and appraisal support, but not as tangible support. Future studies are needed to determine who might benefit from these types of everyday social support the most and why. These results could potentially be used to help address global health issues or other crises early on in everyday situations before they potentially manifest into larger issues. %M 32141837 %R 10.2196/16235 %U http://www.jmir.org/2020/2/e16235/ %U https://doi.org/10.2196/16235 %U http://www.ncbi.nlm.nih.gov/pubmed/32141837 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 2 %P e16866 %T The Economic Impact of Artificial Intelligence in Health Care: Systematic Review %A Wolff,Justus %A Pauling,Josch %A Keck,Andreas %A Baumbach,Jan %+ TUM School of Life Sciences Weihenstephan, Technical University of Munich, Maximus-von-Imhof-Forum 3, Freising, 85354, Germany, 49 40329012 0, justus.wolff@syte-institute.com %K telemedicine %K artificial intelligence %K machine learning %K cost-benefit analysis %D 2020 %7 20.2.2020 %9 Review %J J Med Internet Res %G English %X Background: Positive economic impact is a key decision factor in making the case for or against investing in an artificial intelligence (AI) solution in the health care industry. 
It is most relevant for the care provider and insurer as well as for the pharmaceutical and medical technology sectors. Although the broad economic impact of digital health solutions in general has been assessed many times in the literature and the benefit for patients and society has also been analyzed, the specific economic impact of AI in health care has been addressed only sporadically. Objective: This study aimed to systematically review and summarize the cost-effectiveness studies dedicated to AI in health care and to assess whether they meet the established quality criteria. Methods: In a first step, the quality criteria for economic impact studies were defined based on the established and adapted criteria schemes for cost impact assessments. In a second step, a systematic literature review based on qualitative and quantitative inclusion and exclusion criteria was conducted to identify relevant publications for an in-depth analysis of the economic impact assessment. In a final step, the quality of the identified economic impact studies was evaluated based on the defined quality criteria for cost-effectiveness studies. Results: Very few publications have thoroughly addressed the economic impact assessment, and the economic assessment quality of the reviewed publications on AI shows severe methodological deficits. Only 6 out of 66 publications could be included in the second step of the analysis based on the inclusion criteria. Out of these 6 studies, none comprised a methodologically complete cost impact analysis. There are two areas for improvement in future studies. First, the initial investment and operational costs for the AI infrastructure and service need to be included. Second, alternatives to achieve similar impact must be evaluated to provide a comprehensive comparison. 
Conclusions: This systematic literature analysis proved that the existing impact assessments show methodological deficits and that upcoming evaluations require more comprehensive economic analyses to enable economic decisions for or against implementing AI technology in health care. %M 32130134 %R 10.2196/16866 %U http://www.jmir.org/2020/2/e16866/ %U https://doi.org/10.2196/16866 %U http://www.ncbi.nlm.nih.gov/pubmed/32130134 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 2 %P e17061 %T Detection of Postictal Generalized Electroencephalogram Suppression: Random Forest Approach %A Li,Xiaojin %A Tao,Shiqiang %A Jamal-Omidi,Shirin %A Huang,Yan %A Lhatoo,Samden D %A Zhang,Guo-Qiang %A Cui,Licong %+ School of Biomedical Informatics, University of Texas Health Science Center, 7000 Fannin St, Houston, TX, 77030, United States, 1 7135003791, licong.cui@uth.tmc.edu %K epilepsy %K generalized tonic-clonic seizure %K postictal generalized EEG suppression %K EEG %K random forest %D 2020 %7 14.2.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Sudden unexpected death in epilepsy (SUDEP) is second only to stroke in neurological events resulting in years of potential life lost. Postictal generalized electroencephalogram (EEG) suppression (PGES) is a period of suppressed brain activity often occurring after generalized tonic-clonic seizure, a most significant risk factor for SUDEP. Therefore, PGES has been considered as a potential biomarker for SUDEP risk. Automatic PGES detection tools can address the limitations of labor-intensive, and sometimes inconsistent, visual analysis. A successful approach to automatic PGES detection must overcome computational challenges involved in the detection of subtle amplitude changes in EEG recordings, which may contain physiological and acquisition artifacts. 
Objective: This study aimed to present a random forest approach for automatic PGES detection using multichannel human EEG recordings acquired in epilepsy monitoring units. Methods: We used a combination of temporal, frequency, wavelet, and interchannel correlation features derived from EEG signals to train a random forest classifier. We also constructed and applied confidence-based correction rules based on PGES state changes. Motivated by practical utility, we introduced a new, time distance–based evaluation method for assessing the performance of PGES detection algorithms. Results: The time distance–based evaluation showed that our approach achieved a 5-second tolerance-based positive prediction rate of 0.95 for artifact-free signals. For signals with different artifact levels, our prediction rates varied from 0.68 to 0.81. Conclusions: We introduced a feature-based, random forest approach for automatic PGES detection using multichannel EEG recordings. Our approach achieved increasingly better time distance–based performance with reduced signal artifact levels. Further study is needed for PGES detection algorithms to perform well irrespective of the levels of signal artifacts. 
%M 32130173 %R 10.2196/17061 %U https://medinform.jmir.org/2020/2/e17061 %U https://doi.org/10.2196/17061 %U http://www.ncbi.nlm.nih.gov/pubmed/32130173 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 1 %P e15510 %T Longitudinal Risk Prediction of Chronic Kidney Disease in Diabetic Patients Using a Temporal-Enhanced Gradient Boosting Machine: Retrospective Cohort Study %A Song,Xing %A Waitman,Lemuel R %A Yu,Alan SL %A Robbins,David C %A Hu,Yong %A Liu,Mei %+ University of Kansas Medical Center, Department of Internal Medicine, Division of Medical Informatics, 3901 Rainbow Boulevard, Kansas City, KS, 66160, United States, 1 9139456446, meiliu@kumc.edu %K diabetic kidney disease %K diabetic nephropathy %K chronic kidney disease %K machine learning %D 2020 %7 31.1.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Artificial intelligence–enabled electronic health record (EHR) analysis can revolutionize medical practice from the diagnosis and prediction of complex diseases to making recommendations in patient care, especially for chronic conditions such as chronic kidney disease (CKD), which is one of the most frequent complications in patients with diabetes and is associated with substantial morbidity and mortality. Objective: The longitudinal prediction of health outcomes requires effective representation of temporal data in the EHR. In this study, we proposed a novel temporal-enhanced gradient boosting machine (GBM) model that dynamically updates and ensembles learners based on new events in patient timelines to improve the prediction accuracy of CKD among patients with diabetes. Methods: Using a broad spectrum of deidentified EHR data on a retrospective cohort of 14,039 adult patients with type 2 diabetes and GBM as the base learner, we validated our proposed Landmark-Boosting model against three state-of-the-art temporal models for rolling predictions of 1-year CKD risk. 
Results: The proposed model uniformly outperformed other models, achieving an area under the receiver operating characteristic curve of 0.83 (95% CI 0.76-0.85), 0.78 (95% CI 0.75-0.82), and 0.82 (95% CI 0.78-0.86) in predicting CKD risk with automatic accumulation of new data in later years (years 2, 3, and 4 since diabetes mellitus onset, respectively). The Landmark-Boosting model also maintained the best calibration across moderate- and high-risk groups and over time. The experimental results demonstrated that the proposed temporal model can not only accurately predict 1-year CKD risk but also improve performance over time with additionally accumulated data, which is essential for clinical use to improve renal management of patients with diabetes. Conclusions: Incorporation of temporal information in EHR data can significantly improve predictive model performance and will particularly benefit patients who follow up with their physicians as recommended. %M 32012067 %R 10.2196/15510 %U http://medinform.jmir.org/2020/1/e15510/ %U https://doi.org/10.2196/15510 %U http://www.ncbi.nlm.nih.gov/pubmed/32012067 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 1 %P e14679 %T Patient Perspectives on the Usefulness of an Artificial Intelligence–Assisted Symptom Checker: Cross-Sectional Survey Study %A Meyer,Ashley N D %A Giardina,Traber D %A Spitzmueller,Christiane %A Shahid,Umber %A Scott,Taylor M T %A Singh,Hardeep %+ Center for Innovations in Quality, Effectiveness and Safety, Michael E DeBakey Veterans Affairs Medical Center and Baylor College of Medicine, 2002 Holcombe Blvd #152, Houston, TX, United States, 1 7134404660, ameyer@bcm.edu %K clinical decision support systems %K technology %K diagnosis %K patient safety %K symptom checker %K computer-assisted diagnosis %D 2020 %7 30.1.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Patients are increasingly seeking Web-based symptom checkers to obtain diagnoses. 
However, little is known about the characteristics of the patients who use these resources, their rationale for use, and whether they find them accurate and useful. Objective: The study aimed to examine patients’ experiences using an artificial intelligence (AI)–assisted online symptom checker. Methods: An online survey was administered from March 2, 2018, through March 15, 2018, to US users of the Isabel Symptom Checker within 6 months of their use. User characteristics, experiences of symptom checker use, experiences discussing results with physicians, and prior personal history of experiencing a diagnostic error were collected. Results: A total of 329 usable responses was obtained. The mean respondent age was 48.0 (SD 16.7) years; most were women (230/304, 75.7%) and white (271/304, 89.1%). Patients most commonly used the symptom checker to better understand the causes of their symptoms (232/304, 76.3%), followed by deciding whether to seek care (101/304, 33.2%) or where (eg, primary or urgent care: 63/304, 20.7%), obtaining medical advice without going to a doctor (48/304, 15.8%), and understanding their diagnoses better (39/304, 12.8%). Most patients reported receiving useful information for their health problems (274/304, 90.1%), with half reporting positive health effects (154/302, 51.0%). Most patients perceived it to be useful as a diagnostic tool (253/301, 84.1%), as a tool providing insights leading them closer to correct diagnoses (231/303, 76.2%), and reported they would use it again (278/304, 91.4%). Patients who discussed findings with their physicians (103/213, 48.4%) more often felt physicians were interested (42/103, 40.8%) than not interested in learning about the tool’s results (24/103, 23.3%) and more often felt physicians were open (62/103, 60.2%) than not open (21/103, 20.4%) to discussing the results. 
Compared with patients who had not previously experienced diagnostic errors (missed or delayed diagnoses: 123/304, 40.5%), patients who had previously experienced diagnostic errors (181/304, 59.5%) were more likely to use the symptom checker to determine where they should seek care (15/123, 12.2% vs 48/181, 26.5%; P=.002), but they less often felt that physicians were interested in discussing the tool’s results (20/34, 59% vs 22/69, 32%; P=.04). Conclusions: Despite ongoing concerns about symptom checker accuracy, a large patient-user group perceived an AI-assisted symptom checker as useful for diagnosis. Formal validation studies evaluating symptom checker accuracy and effectiveness in real-world practice could provide additional useful information about their benefit. %M 32012052 %R 10.2196/14679 %U http://www.jmir.org/2020/1/e14679/ %U https://doi.org/10.2196/14679 %U http://www.ncbi.nlm.nih.gov/pubmed/32012052 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 1 %P e15645 %T The Detection of Opioid Misuse and Heroin Use From Paramedic Response Documentation: Machine Learning for Improved Surveillance %A Prieto,José Tomás %A Scott,Kenneth %A McEwen,Dean %A Podewils,Laura J %A Al-Tayyib,Alia %A Robinson,James %A Edwards,David %A Foldy,Seth %A Shlay,Judith C %A Davidson,Arthur J %+ Division of Scientific Education and Professional Development, Centers for Disease Control and Prevention, 1600 Clifton Rd, Atlanta, GA, 30333, United States, 1 3036024487, josetomasprieto@gmail.com %K naloxone %K emergency medical services %K natural language processing %K heroin %K substance-related disorders %K opioid crisis %K artificial intelligence %D 2020 %7 3.1.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Timely, precise, and localized surveillance of nonfatal events is needed to improve response and prevention of opioid-related problems in an evolving opioid crisis in the United States. 
Records of naloxone administration found in prehospital emergency medical services (EMS) data have helped estimate opioid overdose incidence, including nonhospital, field-treated cases. However, as naloxone is often used by EMS personnel in unconsciousness of unknown cause, attributing naloxone administration to opioid misuse and heroin use (OM) may misclassify events. Better methods are needed to identify OM. Objective: This study aimed to develop and test a natural language processing method that would improve identification of potential OM from paramedic documentation. Methods: First, we searched Denver Health paramedic trip reports from August 2017 to April 2018 for keywords naloxone, heroin, and both combined, and we reviewed narratives of identified reports to determine whether they constituted true cases of OM. Then, we used this human classification as reference standard and trained 4 machine learning models (random forest, k-nearest neighbors, support vector machines, and L1-regularized logistic regression). We selected the algorithm that produced the highest area under the receiver operating curve (AUC) for model assessment. Finally, we compared positive predictive value (PPV) of the highest performing machine learning algorithm with PPV of searches of keywords naloxone, heroin, and combination of both in the binary classification of OM in unseen September 2018 data. Results: In total, 54,359 trip reports were filed from August 2017 to April 2018. Approximately 1.09% (594/54,359) indicated naloxone administration. Among trip reports with reviewer agreement regarding OM in the narrative, 57.6% (292/516) were considered to include information revealing OM. Approximately 1.63% (884/54,359) of all trip reports mentioned heroin in the narrative. Among trip reports with reviewer agreement, 95.5% (784/821) were considered to include information revealing OM. Combined results accounted for 2.39% (1298/54,359) of trip reports. 
Among trip reports with reviewer agreement, 77.79% (907/1166) were considered to include information consistent with OM. The reference standard used to train and test machine learning models included details of 1166 trip reports. L1-regularized logistic regression was the highest performing algorithm (AUC=0.94; 95% CI 0.91-0.97) in identifying OM. Tested on 5983 unseen reports from September 2018, the keyword naloxone inaccurately identified and underestimated probable OM trip report cases (63 cases; PPV=0.68). The keyword heroin yielded more cases with improved performance (129 cases; PPV=0.99). Combined keyword and L1-regularized logistic regression classifier further improved performance (146 cases; PPV=0.99). Conclusions: A machine learning application enhanced the effectiveness of finding OM among documented paramedic field responses. This approach to refining OM surveillance may lead to improved first-responder and public health responses toward prevention of overdoses and other opioid-related problems in US communities. %M 31899451 %R 10.2196/15645 %U https://www.jmir.org/2020/1/e15645 %U https://doi.org/10.2196/15645 %U http://www.ncbi.nlm.nih.gov/pubmed/31899451 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 8 %N 1 %P e13244 %T Applicability of the User Engagement Scale to Mobile Health: A Survey-Based Quantitative Study %A Holdener,Marianne %A Gut,Alain %A Angerer,Alfred %+ Winterthur Institute of Health Economics, School of Management and Law, Zurich University of Applied Sciences, Gertrudstrasse 15, Winterthur, 8401, Switzerland, 41 798145158, marianneholdener@bluemail.ch %K mobile health %K mhealth %K mobile apps %K user engagement %K measurement %K user engagement scale %K chatbot %D 2020 %7 3.1.2020 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: There has recently been exponential growth in the development and use of health apps on mobile phones. 
As with most mobile apps, however, the majority of users abandon them quickly and after minimal use. One of the most critical factors for the success of a health app is how to support users’ commitment to their health. Despite increased interest from researchers in mobile health, few studies have examined the measurement of user engagement with health apps. Objective: User engagement is a multidimensional, complex phenomenon. The aim of this study was to understand the concept of user engagement and, in particular, to demonstrate the applicability of a user engagement scale (UES) to mobile health apps. Methods: To determine the measurability of user engagement in a mobile health context, a UES was employed, which is a psychometric tool to measure user engagement with a digital system. This was adapted to Ada, developed by Ada Health, an artificial intelligence–powered personalized health guide that helps people understand their health. A principal component analysis (PCA) with varimax rotation was conducted on 30 items. In addition, sum scores as means of each subscale were calculated. Results: Survey data from 73 Ada users were analyzed. PCA was determined to be suitable, as verified by the sampling adequacy of Kaiser-Meyer-Olkin=0.858, a significant Bartlett test of sphericity (χ²(300)=1127.1; P<.001), and communalities mostly within the 0.7 range. Although 5 items had to be removed because of low factor loadings, the results of the remaining 25 items revealed 4 attributes: perceived usability, aesthetic appeal, reward, and focused attention. Ada users showed the highest engagement level with perceived usability, with a value of 294, followed by aesthetic appeal, reward, and focused attention. Conclusions: Although the UES was deployed in German and adapted to another digital domain, PCA yielded consistent subscales and a 4-factor structure. This indicates that user engagement with health apps can be assessed with the German version of the UES. 
These results can benefit related mobile health app engagement research and may be of importance to marketers and app developers. %M 31899454 %R 10.2196/13244 %U https://mhealth.jmir.org/2020/1/e13244 %U https://doi.org/10.2196/13244 %U http://www.ncbi.nlm.nih.gov/pubmed/31899454 %0 Journal Article %@ 2561-7605 %I JMIR Publications %V 2 %N 2 %P e15381 %T Exploring Older Adults’ Beliefs About the Use of Intelligent Assistants for Consumer Health Information Management: A Participatory Design Study %A Martin-Hammond,Aqueasha %A Vemireddy,Sravani %A Rao,Kartik %+ Department of Human-Centered Computing, School of Informatics and Computing, Indiana University-Purdue University Indianapolis, 535 West Michigan St, Indianapolis, IN, 46202, United States, 1 3172787686, aqumarti@iupui.edu %K intelligent assistants %K artificial intelligence %K chatbots %K conversational agents %K digital health %K elderly %K aging in place %K participatory design %K co-design %K health information seeking %D 2019 %7 11.12.2019 %9 Original Paper %J JMIR Aging %G English %X Background: Intelligent assistants (IAs), also known as intelligent agents, use artificial intelligence to help users achieve a goal or complete a task. IAs represent a potential solution for providing older adults with individualized assistance at home, for example, to reduce social isolation, serve as memory aids, or help with disease management. However, to design IAs for health that are beneficial and accepted by older adults, it is important to understand their beliefs about IAs, how they would like to interact with IAs for consumer health, and how they desire to integrate IAs into their homes. Objective: We explore older adults’ mental models and beliefs about IAs, the tasks they want IAs to support, and how they would like to interact with IAs for consumer health. For the purpose of this study, we focus on IAs in the context of consumer health information management and search. 
Methods: We present findings from an exploratory, qualitative study that investigated older adults’ perspectives of IAs that aid with consumer health information search and management tasks. Eighteen older adults participated in a multiphase, participatory design workshop in which we engaged them in discussion, brainstorming, and design activities that helped us identify their current challenges managing and finding health information at home. We also explored their beliefs and ideas for an IA to assist them with consumer health tasks. We used participatory design activities to identify areas in which they felt IAs might be useful, but also to uncover the reasoning behind the ideas they presented. Discussions were audio-recorded and later transcribed. We compiled design artifacts collected during the study to supplement researcher transcripts and notes. Thematic analysis was used to analyze data. Results: We found that participants saw IAs as potentially useful for providing recommendations, facilitating collaboration between themselves and other caregivers, and for alerts of serious illness. However, they also desired familiar and natural interactions with IAs (eg, using voice) that could, if need be, provide fluid and unconstrained interactions, reason about their symptoms, and provide information or advice. Other participants discussed the need for flexible IAs that could be used by those with low technical resources or skills. Conclusions: From our findings, we present a discussion of three key components of participants’ mental models, including the people, behaviors, and interactions they described that were important for IAs for consumer health information management and seeking. We then discuss the role of access, transparency, caregivers, and autonomy in design for addressing participants’ concerns about privacy and trust as well as its role in assisting others that may interact with an IA on the older adults’ behalf. 
International Registered Report Identifier (IRRID): RR2-10.1145/3240925.3240972 %M 31825322 %R 10.2196/15381 %U http://aging.jmir.org/2019/2/e15381/ %U https://doi.org/10.2196/15381 %U http://www.ncbi.nlm.nih.gov/pubmed/31825322 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 7 %N 4 %P e13430 %T Impact of Automatic Query Generation and Quality Recognition Using Deep Learning to Curate Evidence From Biomedical Literature: Empirical Study %A Afzal,Muhammad %A Hussain,Maqbool %A Malik,Khalid Mahmood %A Lee,Sungyoung %+ Department of Computer Science and Engineering, Kyung Hee University, Room 313, Yongin, 446-701, Republic of Korea, 82 312012514, sylee@oslab.khu.ac.kr %K data curation %K evidence-based medicine %K clinical decision support systems %K precision medicine %K biomedical research %K machine learning %K deep learning %D 2019 %7 9.12.2019 %9 Original Paper %J JMIR Med Inform %G English %X Background: The quality of health care is continuously improving and is expected to improve further because of the advancement of machine learning and knowledge-based techniques along with innovation and availability of wearable sensors. With these advancements, health care professionals are now becoming more interested and involved in seeking scientific research evidence from external sources for decision making relevant to medical diagnosis, treatments, and prognosis. Not much work has been done to develop methods for unobtrusive and seamless curation of data from the biomedical literature. Objective: This study aimed to design a framework that can enable bringing quality publications intelligently to the users’ desk to assist medical practitioners in answering clinical questions and fulfilling their informational needs. 
Methods: The proposed framework consists of methods for efficient biomedical literature curation, including the automatic construction of a well-built question, the recognition of evidence quality by proposing an extended quality recognition model (E-QRM), and the ranking and summarization of the extracted evidence. Results: Unlike previous works, the proposed framework systematically integrates the echelons of biomedical literature curation by including methods for searching queries, content quality assessments, and ranking and summarization. Using an ensemble approach, our high-impact classifier E-QRM achieved significantly higher accuracy than the existing quality recognition model (1723/1894, 90.97% vs 1462/1894, 77.21%). Conclusions: Our proposed methods and evaluation demonstrate the validity and rigorousness of the results, which can be used in different applications, including evidence-based medicine, precision medicine, and medical education. %M 31815673 %R 10.2196/13430 %U http://medinform.jmir.org/2019/4/e13430/ %U https://doi.org/10.2196/13430 %U http://www.ncbi.nlm.nih.gov/pubmed/31815673 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 5 %N 2 %P e16048 %T Introducing Artificial Intelligence Training in Medical Education %A Paranjape,Ketan %A Schinkel,Michiel %A Nannan Panday,Rishi %A Car,Josip %A Nanayakkara,Prabath %+ Amsterdam University Medical Center, De Boelelaan 1117, 1081 HV, Amsterdam, Netherlands, 31 3174108035, ketanp@alumni.gsb.stanford.edu %K algorithm %K artificial intelligence %K black box %K deep learning %K machine learning %K medical education %K continuing education %K data sciences %K curriculum %D 2019 %7 3.12.2019 %9 Viewpoint %J JMIR Med Educ %G English %X Health care is evolving and with it the need to reform medical education. As the practice of medicine enters the age of artificial intelligence (AI), the use of data to improve clinical decision making will grow, pushing the need for skillful medicine-machine interaction. 
As the volume of medical knowledge grows, technologies such as AI are needed to enable health care professionals to effectively use this knowledge to practice medicine. Medical professionals need to be adequately trained in this new technology, its advantages in improving cost, quality, and access to health care, and its shortfalls in areas such as transparency and liability. AI needs to be seamlessly integrated across different aspects of the curriculum. In this paper, we address the current state of medical education and recommend a framework for evolving the medical education curriculum to include AI. %M 31793895 %R 10.2196/16048 %U http://mededu.jmir.org/2019/2/e16048/ %U https://doi.org/10.2196/16048 %U http://www.ncbi.nlm.nih.gov/pubmed/31793895 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 11 %P e15406 %T Artificial Intelligence Technologies for Coping with Alarm Fatigue in Hospital Environments Because of Sensory Overload: Algorithm Development and Validation %A Fernandes,Chrystinne Oliveira %A Miles,Simon %A Lucena,Carlos José Pereira De %A Cowan,Donald %+ Department of Informatics, Pontifical Catholic University of Rio de Janeiro, Rio Datacenter, 4th Fl, 225 Marquês de São Vicente St, Rio de Janeiro, 22451-900, Brazil, 55 21 3527 1510, chrystinne@gmail.com %K alert fatigue health personnel %K health information systems %K patient monitoring %K alert systems %K artificial intelligence %D 2019 %7 26.11.2019 %9 Original Paper %J J Med Internet Res %G English %X Background: Informed estimates claim that 80% to 99% of alarms set off in hospital units are false or clinically insignificant, representing a cacophony of sounds that do not present a real danger to patients. These false alarms can lead to an alert overload that causes a health care provider to miss important events that could be harmful or even life-threatening. 
As health care units become more dependent on monitoring devices for patient care purposes, alarm fatigue must be addressed, both as a major concern for the health care team and as a threat to patient safety. Objective: The main goal of this paper was to propose a feasible solution for the alarm fatigue problem by using an automatic reasoning mechanism to decide how to notify members of the health care team. The aim was to reduce the number of notifications sent by determining whether or not to group a set of alarms that occur over a short period of time to deliver them together, without compromising patient safety. Methods: This paper describes: (1) a model for supporting reasoning algorithms that decide how to notify caregivers to avoid alarm fatigue; (2) an architecture for health systems that support patient monitoring and notification capabilities; and (3) a reasoning algorithm that specifies how to notify caregivers by deciding whether to aggregate a group of alarms to avoid alarm fatigue. Results: Experiments demonstrated that the reasoning system can reduce the notifications received by the caregivers by up to 99.3% (582/586) of the total alarms generated. Our experiments were evaluated through the use of a dataset comprising patient monitoring data and vital signs recorded during 32 surgical cases where patients underwent anesthesia at the Royal Adelaide Hospital. We present the results of our algorithm by using graphs we generated using the R language, where we show whether the algorithm decided to deliver an alarm immediately or after a delay. Conclusions: The experimental results strongly suggest that this reasoning algorithm is a useful strategy for avoiding alarm fatigue. Although we evaluated our algorithm in an experimental environment, we tried to reproduce the context of a clinical environment by using real-world patient data. 
In future work, we will reproduce the evaluation study under more realistic clinical conditions by increasing the number of patients, monitoring parameters, and alarm types. %M 31769762 %R 10.2196/15406 %U http://www.jmir.org/2019/11/e15406/ %U https://doi.org/10.2196/15406 %U http://www.ncbi.nlm.nih.gov/pubmed/31769762 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 11 %P e16295 %T The Real Era of the Art of Medicine Begins with Artificial Intelligence %A Meskó,Bertalan %+ The Medical Futurist Institute, Povl Bang-Jensen u. 2/B1. 4/1, Budapest, 1118, Hungary, 36 703807260, berci@medicalfuturist.com %K future %K artificial intelligence %K digital health %K technology %K art of medicine %D 2019 %7 18.11.2019 %9 Viewpoint %J J Med Internet Res %G English %X Physicians have been performing the art of medicine for hundreds of years, and since the ancient era, patients have turned to physicians for help, advice, and cures. When the fathers of medicine started writing down their experience, knowledge, and observations, treating medical conditions became a structured process, with textbooks and professors sharing their methods over generations. After evidence-based medicine was established as the new form of medical science, the art and science of medicine had to be connected. As a result, by the end of the 20th century, health care had become highly dependent on technology. From electronic medical records, telemedicine, three-dimensional printing, algorithms, and sensors, technology has started to influence medical decisions and the lives of patients. While digital health technologies might be considered a threat to the art of medicine, I argue that advanced technologies, such as artificial intelligence, will initiate the real era of the art of medicine. Through the use of reinforcement learning, artificial intelligence could become the stethoscope of the 21st century. 
If we embrace these tools, the real art of medicine will begin now with the era of artificial intelligence. %M 31738169 %R 10.2196/16295 %U http://www.jmir.org/2019/11/e16295/ %U https://doi.org/10.2196/16295 %U http://www.ncbi.nlm.nih.gov/pubmed/31738169 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 8 %N 11 %P e14245 %T Real-Time Detection of Behavioral Anomalies of Older People Using Artificial Intelligence (The 3-PEGASE Study): Protocol for a Real-Life Prospective Trial %A Piau,Antoine %A Lepage,Benoit %A Bernon,Carole %A Gleizes,Marie-Pierre %A Nourhashemi,Fati %+ Gérontopôle, University Hospital of Toulouse, 4 Rue du Pont St Pierre, Toulouse, F-31400, France, 33 659561628, antoinepiau@hotmail.com %K frailty %K monitoring %K sensors %K artificial intelligence %K older adults %K participatory design %D 2019 %7 18.11.2019 %9 Protocol %J JMIR Res Protoc %G English %X Background: Most frail older persons are living at home, and we face difficulties in achieving seamless monitoring to detect adverse health changes. Even more important, this lack of follow-up could have a negative impact on the living choices made by older individuals and their care partners. People could give up their homes for the more reassuring environment of a medicalized living facility. We have developed a low-cost unobtrusive sensor-based solution to trigger automatic alerts in case of an acute event or subtle changes over time. It could facilitate older adults’ follow-up in their own homes, and thus support independent living. Objective: The primary objective of this prospective open-label study is to evaluate the relevance of the automatic alerts generated by our artificial intelligence–driven monitoring solution as judged by the recipients: older adults, caregivers, and professional support workers. The secondary objective is to evaluate its ability to detect subtle functional and cognitive decline and major medical events. 
Methods: The primary outcome will be evaluated for each successive 2-month follow-up period to estimate the progression of our learning algorithm performance over time. In total, 25 frail or disabled participants, aged 75 years and above and living alone in their own homes, will be enrolled for a 6-month follow-up period. Results: The first phase with 5 participants for a 4-month feasibility period has been completed and the expected completion date for the second phase of the study (20 participants for 6 months) is July 2020. Conclusions: The originality of our real-life project lies in the choice of the primary outcome and in our user-centered evaluation. We will evaluate the relevance of the alerts and the algorithm performance over time according to the end users. The first-line recipients of the information are the older adults and their care partners rather than health care professionals. Despite the fast pace of electronic health devices development, few studies have addressed the specific everyday needs of older adults and their families. 
Trial Registration: ClinicalTrials.gov NCT03484156; https://clinicaltrials.gov/ct2/show/NCT03484156 International Registered Report Identifier (IRRID): PRR1-10.2196/14245 %M 31738180 %R 10.2196/14245 %U http://www.researchprotocols.org/2019/11/e14245/ %U https://doi.org/10.2196/14245 %U http://www.ncbi.nlm.nih.gov/pubmed/31738180 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 11 %P e16607 %T Unlocking the Power of Artificial Intelligence and Big Data in Medicine %A Lovis,Christian %+ Division of Medical Information Sciences, University Hospitals of Geneva, Gabrielle Perret Gentil 4, Geneva, 1205, Switzerland, 41 22 37 26201, Christian.Lovis@hcuge.ch %K medical informatics %K artificial intelligence %K big data %D 2019 %7 8.11.2019 %9 Viewpoint %J J Med Internet Res %G English %X Data-driven science and its corollaries in machine learning and the wider field of artificial intelligence have the potential to drive important changes in medicine. However, medicine is not a science like any other: It is deeply and tightly bound with a large and wide network of legal, ethical, regulatory, economic, and societal dependencies. As a consequence, scientific and technological progress in handling information and its further processing and cross-linking for decision support and predictive systems must be accompanied by parallel changes in the global environment, with numerous stakeholders, including citizens and society. What can be seen at first glance as a barrier and a mechanism slowing down the progression of data science must, however, be considered an important asset. Only global adoption can transform the potential of big data and artificial intelligence into effective breakthroughs in handling health and medicine. This requires science and society, scientists and citizens, to progress together. 
%M 31702565 %R 10.2196/16607 %U https://www.jmir.org/2019/11/e16607 %U https://doi.org/10.2196/16607 %U http://www.ncbi.nlm.nih.gov/pubmed/31702565 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 11 %P e15360 %T The Personalization of Conversational Agents in Health Care: Systematic Review %A Kocaballi,Ahmet Baki %A Berkovsky,Shlomo %A Quiroz,Juan C %A Laranjo,Liliana %A Tong,Huong Ly %A Rezazadegan,Dana %A Briatore,Agustina %A Coiera,Enrico %+ Australian Institute of Health Innovation
, Faculty of Medicine and Health Sciences, Macquarie University, Level 6, 75 Talavera Road, Sydney, 2109, Australia, 61 298502465, baki.kocaballi@mq.edu.au %K conversational interfaces %K conversational agents %K dialogue systems %K personalization %K customization %K adaptive systems %K health care %D 2019 %7 7.11.2019 %9 Review %J J Med Internet Res %G English %X Background: The personalization of conversational agents with natural language user interfaces is seeing increasing use in health care applications, shaping the content, structure, or purpose of the dialogue between humans and conversational agents. Objective: The goal of this systematic review was to understand the ways in which personalization has been used with conversational agents in health care and characterize the methods of its implementation. Methods: We searched on PubMed, Embase, CINAHL, PsycInfo, and ACM Digital Library using a predefined search strategy. The studies were included if they: (1) were primary research studies that focused on consumers, caregivers, or health care professionals; (2) involved a conversational agent with an unconstrained natural language interface; (3) tested the system with human subjects; and (4) implemented personalization features. Results: The search found 1958 publications. After abstract and full-text screening, 13 studies were included in the review. Common examples of personalized content included feedback, daily health reports, alerts, warnings, and recommendations. The personalization features were implemented without a theoretical framework of customization and with limited evaluation of its impact. While conversational agents with personalization features were reported to improve user satisfaction, user engagement and dialogue quality, the role of personalization in improving health outcomes was not assessed directly. 
Conclusions: Most of the studies in our review implemented the personalization features without theoretical or evidence-based support for them and did not leverage the recent developments in other domains of personalization. Future research could incorporate personalization as a distinct design factor with a more careful consideration of its impact on health outcomes and its implications on patient safety, privacy, and decision-making. %M 31697237 %R 10.2196/15360 %U https://www.jmir.org/2019/11/e15360 %U https://doi.org/10.2196/15360 %U http://www.ncbi.nlm.nih.gov/pubmed/31697237 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 11 %P e15511 %T Modeling Research Topics for Artificial Intelligence Applications in Medicine: Latent Dirichlet Allocation Application Study %A Tran,Bach Xuan %A Nghiem,Son %A Sahin,Oz %A Vu,Tuan Manh %A Ha,Giang Hai %A Vu,Giang Thu %A Pham,Hai Quang %A Do,Hoa Thi %A Latkin,Carl A %A Tam,Wilson %A Ho,Cyrus S H %A Ho,Roger C M %+ Institute for Preventive Medicine and Public Health, Hanoi Medical University, No 1 Ton That Tung Street, Hanoi, 100000, Vietnam, 84 98 222 8662, bach.ipmph@gmail.com %K artificial intelligence %K applications %K medicine %K scientometric %K bibliometric %K latent Dirichlet allocation %D 2019 %7 1.11.2019 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI)–based technologies develop rapidly and have myriad applications in medicine and health care. However, there is a lack of comprehensive reporting on the productivity, workflow, topics, and research landscape of AI in this field. Objective: This study aimed to evaluate the global development of scientific publications and constructed interdisciplinary research topics on the theory and practice of AI in medicine from 1977 to 2018. Methods: We obtained bibliographic data and abstract contents of publications published between 1977 and 2018 from the Web of Science database. 
A total of 27,451 eligible articles were analyzed. Research topics were classified by latent Dirichlet allocation, and principal component analysis was used to identify the construct of the research landscape. Results: The applications of AI have mainly impacted clinical settings (enhanced prognosis and diagnosis, robot-assisted surgery, and rehabilitation), data science and precision medicine (collecting individual data for precision medicine), and policy making (raising ethical and legal issues, especially regarding privacy and confidentiality of data). However, AI applications have not been commonly used in resource-poor settings due to limited infrastructure and human resources. Conclusions: The application of AI in medicine has grown rapidly and focuses on three leading platforms: clinical practices, clinical material, and policies. AI may help narrow the inequality in health care and medicine between developing and developed countries. Technology transfer and support from developed countries are essential for advancing AI applications in health care in developing countries. 
%M 31682577 %R 10.2196/15511 %U https://www.jmir.org/2019/11/e15511 %U https://doi.org/10.2196/15511 %U http://www.ncbi.nlm.nih.gov/pubmed/31682577 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 7 %N 11 %P e14452 %T Development of a Deep Learning Model for Dynamic Forecasting of Blood Glucose Level for Type 2 Diabetes Mellitus: Secondary Analysis of a Randomized Controlled Trial %A Faruqui,Syed Hasib Akhter %A Du,Yan %A Meka,Rajitha %A Alaeddini,Adel %A Li,Chengdong %A Shirinkam,Sara %A Wang,Jing %+ Center on Smart and Connected Health Technologies, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, TX, United States, 1 210 450 8561, wangj1@uthscsa.edu %K type 2 diabetes %K long short-term memory (LSTM)-based recurrent neural networks (RNNs) %K glucose level prediction %K mobile health lifestyle data %D 2019 %7 1.11.2019 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: Type 2 diabetes mellitus (T2DM) is a major public health burden. Self-management of diabetes including maintaining a healthy lifestyle is essential for glycemic control and to prevent diabetes complications. Mobile-based health data can play an important role in the forecasting of blood glucose levels for lifestyle management and control of T2DM. Objective: The objective of this work was to dynamically forecast daily glucose levels in patients with T2DM based on their daily mobile health lifestyle data including diet, physical activity, weight, and glucose level from the day before. Methods: We used data from 10 T2DM patients who were overweight or obese in a behavioral lifestyle intervention using mobile tools for daily monitoring of diet, physical activity, weight, and blood glucose over 6 months. We developed a deep learning model based on long short-term memory–based recurrent neural networks to forecast the next-day glucose levels in individual patients. 
The neural network used several layers of computational nodes to model how mobile health data (food intake including consumed calories, fat, and carbohydrates; exercise; and weight) progressed from one day to the next, despite noisy data. Results: The model was validated based on a data set of 10 patients who had been monitored daily for over 6 months. The proposed deep learning model demonstrated considerable accuracy in predicting the next-day glucose level, based on the Clarke error grid and a ±10% range of the actual values. Conclusions: Machine learning methodologies may leverage mobile health lifestyle data to develop effective individualized prediction plans for T2DM management. However, predicting future glucose levels is challenging, as glucose level is determined by multiple factors. Future studies with more rigorous designs are warranted to better predict future glucose levels for T2DM management. %M 31682586 %R 10.2196/14452 %U https://mhealth.jmir.org/2019/11/e14452 %U https://doi.org/10.2196/14452 %U http://www.ncbi.nlm.nih.gov/pubmed/31682586 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 7 %N 4 %P e15980 %T Cohort Selection for Clinical Trials From Longitudinal Patient Records: Text Mining Approach %A Spasic,Irena %A Krzeminski,Dominik %A Corcoran,Padraig %A Balinsky,Alexander %+ School of Computer Science & Informatics, Cardiff University, 5 The Parade, Cardiff, CF24 3AA, United Kingdom, 44 02920870320, spasici@cardiff.ac.uk %K natural language processing %K machine learning %K electronic medical records %K clinical trial %K eligibility determination %D 2019 %7 31.10.2019 %9 Original Paper %J JMIR Med Inform %G English %X Background: Clinical trials are an important step in introducing new interventions into clinical practice by generating data on their safety and efficacy. Clinical trials need to ensure that participants are similar so that the findings can be attributed to the interventions studied and not to some other factors. 
Therefore, each clinical trial defines eligibility criteria, which describe characteristics that must be shared by the participants. Unfortunately, the complexities of eligibility criteria may not allow them to be translated directly into readily executable database queries. Instead, they may require careful analysis of the narrative sections of medical records. Manual screening of medical records is time consuming, thus negatively affecting the timeliness of the recruitment process. Objective: Track 1 of the 2018 National Natural Language Processing Clinical Challenge focused on the task of cohort selection for clinical trials, aiming to answer the following question: Can natural language processing be applied to narrative medical records to identify patients who meet eligibility criteria for clinical trials? The task required the participating systems to analyze longitudinal patient records to determine if the corresponding patients met the given eligibility criteria. We aimed to describe a system developed to address this task. Methods: Our system consisted of 13 classifiers, one for each eligibility criterion. All classifiers used a bag-of-words document representation model. To prevent the loss of relevant contextual information associated with such representation, a pattern-matching approach was used to extract context-sensitive features. They were embedded back into the text as lexically distinguishable tokens, which were consequently featured in the bag-of-words representation. Supervised machine learning was chosen wherever a sufficient number of both positive and negative instances was available to learn from. A rule-based approach focusing on a small set of relevant features was chosen for the remaining criteria. Results: The system was evaluated using microaveraged F measure. 
Overall, 4 machine learning algorithms, including support vector machine, logistic regression, naïve Bayesian classifier, and gradient tree boosting (GTB), were evaluated on the training data using 10-fold cross-validation. GTB demonstrated the most consistent performance. Its performance peaked when oversampling was used to balance the training data. The final evaluation was performed on previously unseen test data. On average, the F measure of 89.04% was comparable to 3 of the top-ranked performances in the shared task (91.11%, 90.28%, and 90.21%). With an F measure of 88.14%, we significantly outperformed these systems (81.03%, 78.50%, and 70.81%) in identifying patients with advanced coronary artery disease. Conclusions: The holdout evaluation provides evidence that our system was able to identify eligible patients for the given clinical trial with high accuracy. Our approach demonstrates how rule-based knowledge infusion can improve the performance of machine learning algorithms even when trained on a relatively small dataset. 
%M 31674914 %R 10.2196/15980 %U http://medinform.jmir.org/2019/4/e15980/ %U https://doi.org/10.2196/15980 %U http://www.ncbi.nlm.nih.gov/pubmed/31674914 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 10 %P e16222 %T Trust Me, I’m a Chatbot: How Artificial Intelligence in Health Care Fails the Turing Test %A Powell,John %+ Nuffield Department of Primary Care Health Sciences, Medical Sciences Division, University of Oxford, Radcliffe Observatory Quarter, 43 Woodstock Road, Oxford, OX2 6GG, United Kingdom, 44 1865617768 ext 617768, john.powell@phc.ox.ac.uk %K artificial intelligence %K machine learning %K medical informatics %K digital health %K ehealth %K chatbots %K conversational agents %D 2019 %7 28.10.2019 %9 Viewpoint %J J Med Internet Res %G English %X Over the next decade, one issue which will dominate sociotechnical studies in health informatics is the extent to which the promise of artificial intelligence in health care will be realized, along with the social and ethical issues which accompany it. A useful thought experiment is the application of the Turing test to user-facing artificial intelligence systems in health care (such as chatbots or conversational agents). In this paper I argue that many medical decisions require value judgements and the doctor-patient relationship requires empathy and understanding to arrive at a shared decision, often handling large areas of uncertainty and balancing competing risks. Arguably, medicine requires wisdom more than intelligence, artificial or otherwise. Artificial intelligence therefore needs to supplement rather than replace medical professionals, and identifying the complementary positioning of artificial intelligence in medical consultation is a key challenge for the future. In health care, artificial intelligence needs to pass the implementation game, not the imitation game. 
%M 31661083 %R 10.2196/16222 %U http://www.jmir.org/2019/10/e16222/ %U https://doi.org/10.2196/16222 %U http://www.ncbi.nlm.nih.gov/pubmed/31661083 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 6 %N 10 %P e14166 %T Conversational Agents in the Treatment of Mental Health Problems: Mixed-Method Systematic Review %A Gaffney,Hannah %A Mansell,Warren %A Tai,Sara %+ , School of Health Sciences, Faculty of Biology, Medicine and Health, University of Manchester, 2nd Floor, Zochonis Building, Manchester, M13 9PL, United Kingdom, 44 161 306 0400, hannah.gaffney-2@postgrad.manchester.ac.uk %K artificial intelligence %K mental health %K stress, pychological %K psychiatry %K therapy, computer-assisted %K conversational agent %K chatbot %K digital health %D 2019 %7 18.10.2019 %9 Review %J JMIR Ment Health %G English %X Background: The use of conversational agent interventions (including chatbots and robots) in mental health is growing at a fast pace. Recent existing reviews have focused exclusively on a subset of embodied conversational agent interventions despite other modalities aiming to achieve the common goal of improved mental health. Objective: This study aimed to review the use of conversational agent interventions in the treatment of mental health problems. Methods: We performed a systematic search using relevant databases (MEDLINE, EMBASE, PsycINFO, Web of Science, and Cochrane library). Studies that reported on an autonomous conversational agent that simulated conversation and reported on a mental health outcome were included. Results: A total of 13 studies were included in the review. Among them, 4 full-scale randomized controlled trials (RCTs) were included. The rest were feasibility, pilot RCTs and quasi-experimental studies. Interventions were diverse in design and targeted a range of mental health problems using a wide variety of therapeutic orientations. All included studies reported reductions in psychological distress postintervention. 
Furthermore, 5 controlled studies demonstrated significant reductions in psychological distress compared with inactive control groups. In addition, 3 controlled studies comparing interventions with active control groups failed to demonstrate superior effects. Broader utility in promoting well-being in nonclinical populations was unclear. Conclusions: The efficacy and acceptability of conversational agent interventions for mental health problems are promising. However, a more robust experimental design is required to demonstrate efficacy and efficiency. A focus on streamlining interventions, demonstrating equivalence to other treatment modalities, and elucidating mechanisms of action has the potential to increase acceptance by users and clinicians and maximize reach. %M 31628789 %R 10.2196/14166 %U https://mental.jmir.org/2019/10/e14166 %U https://doi.org/10.2196/14166 %U http://www.ncbi.nlm.nih.gov/pubmed/31628789 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 10 %P e14316 %T Psychosocial Factors Affecting Artificial Intelligence Adoption in Health Care in China: Cross-Sectional Study %A Ye,Tiantian %A Xue,Jiaolong %A He,Mingguang %A Gu,Jing %A Lin,Haotian %A Xu,Bin %A Cheng,Yu %+ Department of Medical Humanities, The Seventh Affiliated Hospital, Sun Yat-sen University, No 628, Zhenyuan Rd Guangming (New) Dist, Shenzhen, 518107, China, 86 02084114275, chengyu@mail.sysu.edu.cn %K artificial intelligence %K adoption %K technology acceptance model %K structural equation model %K intention %K subjective norms %K trust %K moderation %D 2019 %7 17.10.2019 %9 Original Paper %J J Med Internet Res %G English %X Background: Poor quality primary health care is a major issue in China, particularly in blindness prevention. Artificial intelligence (AI) could provide early screening and accurate auxiliary diagnosis to improve primary care services and reduce unnecessary referrals, but the application of AI in medical settings is still an emerging field. 
Objective: This study aimed to investigate the general public’s acceptance of ophthalmic AI devices, with reference to those already used in China, and the interrelated influencing factors that shape people’s intention to use these devices. Methods: We proposed a model of ophthalmic AI acceptance based on technology acceptance theories and variables from other health care–related studies. The model was verified via a 32-item questionnaire with 7-point Likert scales completed by 474 respondents (nationally random sampled). Structural equation modeling was used to evaluate item and construct reliability and validity via a confirmatory factor analysis, and the model’s path effects, significance, goodness of fit, and mediation and moderation effects were analyzed. Results: Standardized factor loadings of items were between 0.583 and 0.876. Composite reliability of 9 constructs ranged from 0.673 to 0.841. The discriminant validity of all constructs met the Fornell and Larcker criteria. Model fit indicators such as standardized root mean square residual (0.057), comparative fit index (0.915), and root mean squared error of approximation (0.049) demonstrated good fit. Intention to use (R2=0.515) is significantly affected by subjective norms (beta=.408; P<.001), perceived usefulness (beta=.336; P=.03), and resistance bias (beta=–.237; P=.02). Subjective norms and perceived behavior control had an indirect impact on intention to use through perceived usefulness and perceived ease of use. Eye health consciousness had an indirect positive effect on intention to use through perceived usefulness. Trust had a significant moderation effect (beta=–.095; P=.049) on the effect path of perceived usefulness to intention to use. Conclusions: The item, construct, and model indicators indicate reliable interpretation power and help explain the levels of public acceptance of ophthalmic AI devices in China. 
The influence of subjective norms can be linked to Confucian culture, collectivism, authoritarianism, and conformity mentality in China. Overall, the use of AI in diagnostics and clinical laboratory analysis is underdeveloped, and the Chinese public are generally mistrustful of medical staff and the Chinese medical system. Stakeholders such as doctors and AI suppliers should therefore avoid making misleading or over-exaggerated claims in the promotion of AI health care products. %M 31625950 %R 10.2196/14316 %U http://www.jmir.org/2019/10/e14316/ %U https://doi.org/10.2196/14316 %U http://www.ncbi.nlm.nih.gov/pubmed/31625950 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 7 %N 4 %P e14806 %T A Deep Learning Approach for Managing Medical Consumable Materials in Intensive Care Units via Convolutional Neural Networks: Technical Proof-of-Concept Study %A Peine,Arne %A Hallawa,Ahmed %A Schöffski,Oliver %A Dartmann,Guido %A Fazlic,Lejla Begic %A Schmeink,Anke %A Marx,Gernot %A Martin,Lukas %+ Department of Intensive Care Medicine and Intermediate Care, University Hospital Rheinisch-Westfälische Technische Hochschule Aachen, Pauwelsstr 30, Aachen, 52074, Germany, 49 241 800, apeine@ukaachen.de %K convolutional neural networks %K deep learning, critical care %K intensive care %K image recognition %K medical economics %K medical consumables %K artificial intelligence %K machine learning %D 2019 %7 10.10.2019 %9 Original Paper %J JMIR Med Inform %G English %X Background: High numbers of consumable medical materials (eg, sterile needles and swabs) are used during the daily routine of intensive care units (ICUs) worldwide. Although medical consumables largely contribute to total ICU hospital expenditure, many hospitals do not track the individual use of materials. 
Current tracking solutions meeting the specific requirements of the medical environment, like barcodes or radio frequency identification, require specialized material preparation and high infrastructure investment. This impedes the accurate prediction of consumption, leads to high storage maintenance costs caused by large inventories, and hinders scientific work due to inaccurate documentation. Thus, new cost-effective and contactless methods for object detection are urgently needed. Objective: The goal of this work was to develop and evaluate a contactless visual recognition system for tracking medical consumable materials in ICUs using a deep learning approach on a distributed client-server architecture. Methods: We developed Consumabot, a novel client-server optical recognition system for medical consumables, based on the convolutional neural network model MobileNet implemented in TensorFlow. The software was designed to run on single-board computer platforms as a detection unit. The system was trained to recognize 20 different materials in the ICU, with 100 sample images provided for each consumable material. We assessed the top-1 recognition rates in the context of different real-world ICU settings: materials presented to the system without visual obstruction, 50% covered materials, and scenarios of multiple items. We further performed an analysis of variance with repeated measures to quantify the effect of adverse real-world circumstances. Results: Consumabot reached a >99% reliability of recognition after about 60 steps of training and 150 steps of validation. A desirable low cross entropy of <0.03 was reached for the training set after about 100 iteration steps and after 170 steps for the validation set. The system showed a high top-1 mean recognition accuracy in a real-world scenario of 0.85 (SD 0.11) for objects presented to the system without visual obstruction. 
Recognition accuracy was lower, but still acceptable, in scenarios where the objects were 50% covered (P<.001; mean recognition accuracy 0.71; SD 0.13) or multiple objects of the target group were present (P=.01; mean recognition accuracy 0.78; SD 0.11), compared to a nonobstructed view. The approach met the criteria of absence of explicit labeling (eg, barcodes, radio frequency labeling) while maintaining a high standard for quality and hygiene with minimal consumption of resources (eg, cost, time, training, and computational power). Conclusions: Using a convolutional neural network architecture, Consumabot consistently achieved good results in the classification of consumables and thus is a feasible way to recognize and register medical consumables directly to a hospital’s electronic health record. The system shows limitations when the materials are partially covered, as the identifying characteristics of the consumables are then not visible to the system. Further development of the assessment in different medical circumstances is needed. 
%M 31603430 %R 10.2196/14806 %U http://medinform.jmir.org/2019/4/e14806/ %U https://doi.org/10.2196/14806 %U http://www.ncbi.nlm.nih.gov/pubmed/31603430 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 7 %N 4 %P e14401 %T Characterizing Artificial Intelligence Applications in Cancer Research: A Latent Dirichlet Allocation Analysis %A Tran,Bach Xuan %A Latkin,Carl A %A Sharafeldin,Noha %A Nguyen,Katherina %A Vu,Giang Thu %A Tam,Wilson W S %A Cheung,Ngai-Man %A Nguyen,Huong Lan Thi %A Ho,Cyrus S H %A Ho,Roger C M %+ Institute for Preventive Medicine and Public Health, Hanoi Medical University, No 1 Ton That Tung Street, Hanoi, 100000, Vietnam, 84 982228662, bach.ipmph@gmail.com %K scientometrics %K cancer %K artificial intelligence %K global %K mapping %D 2019 %7 15.9.2019 %9 Original Paper %J JMIR Med Inform %G English %X Background: Artificial intelligence (AI)–based therapeutics, devices, and systems are vital innovations in cancer control; particularly, they allow for diagnosis, screening, precise estimation of survival, informing therapy selection, and scaling up treatment services in a timely manner. Objective: The aim of this study was to analyze the global trends, patterns, and development of interdisciplinary landscapes in AI and cancer research. Methods: An exploratory factor analysis was conducted to identify research domains emerging from abstract contents. The Jaccard similarity index was utilized to identify the most frequently co-occurring terms. Latent Dirichlet Allocation was used for classifying papers into corresponding topics. Results: From 1991 to 2018, the number of studies examining the application of AI in cancer care has grown to 3555 papers covering therapeutics, capacities, and factors associated with outcomes. Topics with the highest volume of publications include (1) machine learning, (2) comparative effectiveness evaluation of AI-assisted medical therapies, and (3) AI-based prediction. 
Noticeably, this classification has revealed topics examining the incremental effectiveness of AI applications, the quality of life, and functioning of patients receiving these innovations. The growing research productivity and expansion of multidisciplinary approaches are largely driven by machine learning, artificial neural networks, and AI in various clinical practices. Conclusions: The research landscapes show that the development of AI in cancer care is focused on not only improving prediction in cancer screening and AI-assisted therapeutics but also on improving other corresponding areas such as precision and personalized medicine and patient-reported outcomes. %M 31573929 %R 10.2196/14401 %U https://medinform.jmir.org/2019/4/e14401 %U https://doi.org/10.2196/14401 %U http://www.ncbi.nlm.nih.gov/pubmed/31573929 %0 Journal Article %@ 2369-1999 %I JMIR Publications %V 5 %N 2 %P e12163 %T Developing Machine Learning Algorithms for the Prediction of Early Death in Elderly Cancer Patients: Usability Study %A Sena,Gabrielle Ribeiro %A Lima,Tiago Pessoa Ferreira %A Mello,Maria Julia Gonçalves %A Thuler,Luiz Claudio Santos %A Lima,Jurema Telles Oliveira %+ Department of Geriatric Oncology, Instituto de Medicina Integral Prof Fernando Figueira, Rua dos Coelhos 300, Recife, 50070-902, Brazil, 55 81 21224100, gabriellesena8@gmail.com %K geriatric assessment %K aged %K machine learning %K medical oncology %K death %D 2019 %7 26.9.2019 %9 Original Paper %J JMIR Cancer %G English %X Background: The importance of classifying cancer patients into high- or low-risk groups has led many research teams, from the biomedical and bioinformatics fields, to study the application of machine learning (ML) algorithms. The International Society of Geriatric Oncology recommends the use of the comprehensive geriatric assessment (CGA), a multidisciplinary tool to evaluate health domains, for the follow-up of elderly cancer patients. 
However, no applications of ML have been proposed using CGA to classify elderly cancer patients. Objective: The aim of this study was to propose and develop predictive models, using ML and CGA, to estimate the risk of early death in elderly cancer patients. Methods: The ability of ML algorithms to predict early mortality in a cohort involving 608 elderly cancer patients was evaluated. The CGA was conducted during admission by a multidisciplinary team and included the following questionnaires: mini-mental state examination (MMSE), geriatric depression scale-short form, international physical activity questionnaire-short form, timed up and go, Katz index of independence in activities of daily living, Charlson comorbidity index, Karnofsky performance scale (KPS), polypharmacy, and mini nutritional assessment-short form (MNA-SF). The 10-fold cross-validation algorithm was used to evaluate all possible combinations of these questionnaires to estimate the risk of early death, defined as death occurring within 6 months of diagnosis, in a variety of ML classifiers, including Naive Bayes (NB), decision tree algorithm J48 (J48), and multilayer perceptron (MLP). On each fold of evaluation, ties were broken by choosing the smallest set of questionnaires. Results: It was possible to select CGA questionnaire subsets with high predictive capacity for early death, which were either statistically similar (NB) or higher (J48 and MLP) when compared with the use of all questionnaires investigated. These results show that CGA questionnaire selection can improve accuracy rates and decrease the time spent to evaluate elderly cancer patients. Conclusions: A simplified predictive model aiming to estimate the risk of early death in elderly cancer patients is proposed herein, minimally composed of the MNA-SF and KPS. We strongly recommend that these questionnaires be incorporated into regular geriatric assessment of older patients with cancer. 
%M 31573896 %R 10.2196/12163 %U https://cancer.jmir.org/2019/2/e12163 %U https://doi.org/10.2196/12163 %U http://www.ncbi.nlm.nih.gov/pubmed/31573896 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 7 %N 3 %P e14830 %T Fine-Tuning Bidirectional Encoder Representations From Transformers (BERT)–Based Models on Large-Scale Electronic Health Record Notes: An Empirical Study %A Li,Fei %A Jin,Yonghao %A Liu,Weisong %A Rawat,Bhanu Pratap Singh %A Cai,Pengshan %A Yu,Hong %+ Department of Computer Science, University of Massachusetts Lowell, 1 University Avenue, Lowell, MA,, United States, 1 978 934 6132, Hong_Yu@uml.edu %K natural language processing %K entity normalization %K deep learning %K electronic health record note %K BERT %D 2019 %7 12.09.2019 %9 Original Paper %J JMIR Med Inform %G English %X Background: The bidirectional encoder representations from transformers (BERT) model has achieved great success in many natural language processing (NLP) tasks, such as named entity recognition and question answering. However, little prior work has explored using this model for an important task in the biomedical and clinical domains, namely entity normalization. Objective: We aim to investigate the effectiveness of BERT-based models for biomedical or clinical entity normalization. In addition, our second objective is to investigate whether the domains of training data influence the performances of BERT-based models as well as the degree of influence. Methods: Our data comprised 1.5 million unlabeled electronic health record (EHR) notes. We first fine-tuned BioBERT on this large collection of unlabeled EHR notes. This generated our BERT-based model trained using 1.5 million electronic health record notes (EhrBERT). 
We then further fine-tuned EhrBERT, BioBERT, and BERT on three annotated corpora for biomedical and clinical entity normalization: the Medication, Indication, and Adverse Drug Events (MADE) 1.0 corpus, the National Center for Biotechnology Information (NCBI) disease corpus, and the Chemical-Disease Relations (CDR) corpus. We compared our models with two state-of-the-art normalization systems, namely MetaMap and disease name normalization (DNorm). Results: EhrBERT achieved 40.95% F1 in the MADE 1.0 corpus for mapping named entities to the Medical Dictionary for Regulatory Activities and the Systematized Nomenclature of Medicine—Clinical Terms (SNOMED-CT), which have about 380,000 terms. In this corpus, EhrBERT outperformed MetaMap by 2.36% in F1. For the NCBI disease corpus and CDR corpus, EhrBERT also outperformed DNorm by improving the F1 scores from 88.37% and 89.92% to 90.35% and 93.82%, respectively. Compared with BioBERT and BERT, EhrBERT outperformed them on the MADE 1.0 corpus and the CDR corpus. Conclusions: Our work shows that BERT-based models have achieved state-of-the-art performance for biomedical and clinical entity normalization. BERT-based models can be readily fine-tuned to normalize any kind of named entities. 
%M 31516126 %R 10.2196/14830 %U http://medinform.jmir.org/2019/3/e14830/ %U https://doi.org/10.2196/14830 %U http://www.ncbi.nlm.nih.gov/pubmed/31516126 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 7 %N 8 %P e11966 %T Deep Learning Intervention for Health Care Challenges: Some Biomedical Domain Considerations %A Tobore,Igbe %A Li,Jingzhen %A Yuhang,Liu %A Al-Handarish,Yousef %A Kandwal,Abhishek %A Nie,Zedong %A Wang,Lei %+ Center for Medical Robotics and Minimally Invasive Surgical Devices, Shenzhen Institutes of Advance Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University, Xili Town, Nanshan District, Shenzhen,, China, 86 755 86585213, zd.nie@siat.ac.cn %K machine learning %K deep learning %K big data %K mHealth %K medical imaging %K electronic health record %K biologicals %K biomedical %K ECG %K EEG %K artificial intelligence %D 2019 %7 02.08.2019 %9 Viewpoint %J JMIR Mhealth Uhealth %G English %X The use of deep learning (DL) for the analysis and diagnosis of biomedical and health care problems has received unprecedented attention in the last decade. The technique has recorded a number of achievements for unearthing meaningful features and accomplishing tasks that were hitherto difficult to solve by other methods and human experts. Currently, biological and medical devices, treatment, and applications are capable of generating large volumes of data in the form of images, sounds, text, graphs, and signals creating the concept of big data. The innovation of DL is a developing trend in the wake of big data for data representation and analysis. DL is a type of machine learning algorithm that has deeper (or more) hidden layers of similar function cascaded into the network and has the capability to make meaning from medical big data. Current transformation drivers to achieve personalized health care delivery will be possible with the use of mobile health (mHealth). 
DL can provide the analysis for the deluge of data generated from mHealth apps. This paper reviews the fundamentals of DL methods and presents a general view of the trends in DL by capturing literature from PubMed and the Institute of Electrical and Electronics Engineers database publications that implement different variants of DL. We highlight the implementation of DL in health care, which we categorize into biological system, electronic health record, medical image, and physiological signals. In addition, we discuss some inherent challenges of DL affecting biomedical and health domain, as well as prospective research directions that focus on improving health management by promoting the application of physiological signals and modern internet technology. %M 31376272 %R 10.2196/11966 %U https://mhealth.jmir.org/2019/8/e11966/ %U https://doi.org/10.2196/11966 %U http://www.ncbi.nlm.nih.gov/pubmed/31376272 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 7 %N 3 %P e14499 %T Projection Word Embedding Model With Hybrid Sampling Training for Classifying ICD-10-CM Codes: Longitudinal Observational Study %A Lin,Chin %A Lou,Yu-Sheng %A Tsai,Dung-Jang %A Lee,Chia-Cheng %A Hsu,Chia-Jung %A Wu,Ding-Chung %A Wang,Mei-Chuen %A Fang,Wen-Hui %+ Department of Family and Community Medicine, Tri-Service General Hospital, National Defense Medical Center, No. 325, Section 2, Chenggong Road, Neihu District, Taipei, 11490, Taiwan, 886 02 87923100 ext 18448, rumaf.fang@gmail.com %K word embedding %K convolutional neural network %K artificial intelligence %K natural language processing %K electronic health records %D 2019 %7 23.7.2019 %9 Original Paper %J JMIR Med Inform %G English %X Background: Most current state-of-the-art models for searching the International Classification of Diseases, Tenth Revision Clinical Modification (ICD-10-CM) codes use word embedding technology to capture useful semantic properties. However, they are limited by the quality of initial word embeddings. 
Word embedding trained by electronic health records (EHRs) is considered the best, but the vocabulary diversity is limited by previous medical records. Thus, we require a word embedding model that maintains the vocabulary diversity of open internet databases and the medical terminology understanding of EHRs. Moreover, we need to consider the particularity of the disease classification, wherein discharge notes present only positive disease descriptions. Objective: We aimed to propose a projection word2vec model and a hybrid sampling method. In addition, we aimed to conduct a series of experiments to validate the effectiveness of these methods. Methods: We compared the projection word2vec model and traditional word2vec model using two corpora sources: English Wikipedia and PubMed journal abstracts. We used seven published datasets to measure the medical semantic understanding of the word2vec models and used these embeddings to identify the three–character-level ICD-10-CM diagnostic codes in a set of discharge notes. On the basis of embedding technology improvement, we also tried to apply the hybrid sampling method to improve accuracy. The 94,483 labeled discharge notes from the Tri-Service General Hospital of Taipei, Taiwan, from June 1, 2015, to June 30, 2017, were used. To evaluate the model performance, 24,762 discharge notes from July 1, 2017, to December 31, 2017, from the same hospital were used. Moreover, 74,324 additional discharge notes collected from seven other hospitals were tested. The F-measure, which is the major global measure of effectiveness, was adopted. Results: In medical semantic understanding, the original EHR embeddings and PubMed embeddings exhibited superior performance to the original Wikipedia embeddings. After projection training technology was applied, the projection Wikipedia embeddings exhibited an obvious improvement but did not reach the level of original EHR embeddings or PubMed embeddings. 
In the subsequent ICD-10-CM coding experiment, the model that used both projection PubMed and Wikipedia embeddings had the highest testing mean F-measure (0.7362 and 0.6693 in Tri-Service General Hospital and the seven other hospitals, respectively). Moreover, the hybrid sampling method was found to improve the model performance (F-measure=0.7371/0.6698). Conclusions: The word embeddings trained using EHR and PubMed could understand medical semantics better, and the proposed projection word2vec model improved the ability of medical semantics extraction in Wikipedia embeddings. Although the improvement from the projection word2vec model in the real ICD-10-CM coding task was not substantial, the models could effectively handle emerging diseases. The proposed hybrid sampling method enables the model to behave like a human expert. %R 10.2196/14499 %U http://medinform.jmir.org/2019/3/e14499/ %U https://doi.org/10.2196/14499 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 7 %P e13659 %T Artificial Intelligence and the Implementation Challenge %A Shaw,James %A Rudzicz,Frank %A Jamieson,Trevor %A Goldfarb,Avi %+ Women's College Hospital, Institute for Health System Solutions and Virtual Care, 76 Grenville Street, Toronto, ON, M5G2A2, Canada, 1 4163236400, jay.shaw@wchospital.ca %K artificial intelligence %K machine learning %K implementation science %K ethics %D 2019 %7 10.07.2019 %9 Viewpoint %J J Med Internet Res %G English %X Background: Applications of artificial intelligence (AI) in health care have garnered much attention in recent years, but the implementation issues posed by AI have not been substantially addressed. Objective: In this paper, we have focused on machine learning (ML) as a form of AI and have provided a framework for thinking about use cases of ML in health care. 
We have structured our discussion of challenges in the implementation of ML in comparison with other technologies using the framework of Nonadoption, Abandonment, and Challenges to the Scale-Up, Spread, and Sustainability of Health and Care Technologies (NASSS). Methods: After providing an overview of AI technology, we describe use cases of ML as falling into the categories of decision support and automation. We suggest these use cases apply to clinical, operational, and epidemiological tasks and that the primary function of ML in health care in the near term will be decision support. We then outline unique implementation issues posed by ML initiatives in the categories addressed by the NASSS framework, specifically including meaningful decision support, explainability, privacy, consent, algorithmic bias, security, scalability, the role of corporations, and the changing nature of health care work. Results: Ultimately, we suggest that the future of ML in health care remains positive but uncertain, as support from patients, the public, and a wide range of health care stakeholders is necessary to enable its meaningful implementation. Conclusions: If the implementation science community is to facilitate the adoption of ML in ways that stand to generate widespread benefits, the issues raised in this paper will require substantial attention in the coming years. 
%M 31293245 %R 10.2196/13659 %U https://www.jmir.org/2019/7/e13659/ %U https://doi.org/10.2196/13659 %U http://www.ncbi.nlm.nih.gov/pubmed/31293245 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 7 %P e13664 %T Reducing Patient Loneliness With Artificial Agents: Design Insights From Evolutionary Neuropsychiatry %A Loveys,Kate %A Fricchione,Gregory %A Kolappa,Kavitha %A Sagar,Mark %A Broadbent,Elizabeth %+ Department of Psychological Medicine, The University of Auckland, Auckland City Hospital, Level 12 Support Building, 85 Park Road, Grafton, Auckland, 1023, New Zealand, 64 9 373 7599 ext 84340, k.loveys@auckland.ac.nz %K loneliness %K neuropsychiatry %K biological evolution %K psychological bonding %K interpersonal relations %K artificial intelligence %K social support %K eHealth %D 2019 %7 08.07.2019 %9 Viewpoint %J J Med Internet Res %G English %X Loneliness is a growing public health issue that substantially increases the risk of morbidity and mortality. Artificial agents, such as robots, embodied conversational agents, and chatbots, present an innovation in care delivery and have been shown to reduce patient loneliness by providing social support. However, similar to doctor and patient relationships, the quality of a patient’s relationship with an artificial agent can impact support effectiveness as well as care engagement. Incorporating mammalian attachment-building behavior in neural network processing as part of an agent’s capabilities may improve relationship quality and engagement between patients and artificial agents. We encourage developers of artificial agents intended to relieve patient loneliness to incorporate design insights from evolutionary neuropsychiatry. 
%M 31287067 %R 10.2196/13664 %U https://www.jmir.org/2019/7/e13664/ %U https://doi.org/10.2196/13664 %U http://www.ncbi.nlm.nih.gov/pubmed/31287067 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 5 %N 1 %P e13930 %T Applications and Challenges of Implementing Artificial Intelligence in Medical Education: Integrative Review %A Chan,Kai Siang %A Zary,Nabil %+ Mohammed Bin Rashid University of Medicine and Health Sciences, Building 14, Dubai Healthcare City, PO Box 505055, Dubai,, United Arab Emirates, 971 83571846, nabil.zary@icloud.com %K medical education %K evaluation of AIED systems %K real world applications of AIED systems %K artificial intelligence %D 2019 %7 15.6.2019 %9 Review %J JMIR Med Educ %G English %X Background: Since the advent of artificial intelligence (AI) in 1955, the applications of AI have increased over the years within a rapidly changing digital landscape where public expectations are on the rise, fed by social media, industry leaders, and medical practitioners. However, there has been little interest in AI in medical education until the last two decades, with only a recent increase in the number of publications and citations in the field. To our knowledge, thus far, a limited number of articles have discussed or reviewed the current use of AI in medical education. Objective: This study aims to review the current applications of AI in medical education as well as the challenges of implementing AI in medical education. Methods: Medline (Ovid), EBSCOhost Education Resources Information Center (ERIC) and Education Source, and Web of Science were searched with explicit inclusion and exclusion criteria. Full text of the selected articles was analyzed using the Extension of Technology Acceptance Model and the Diffusions of Innovations theory. Data were subsequently pooled together and analyzed quantitatively. Results: A total of 37 articles were identified. 
Three primary uses of AI in medical education were identified: learning support (n=32), assessment of students’ learning (n=4), and curriculum review (n=1). The main reasons for use of AI are its ability to provide feedback and a guided learning pathway and to decrease costs. Subgroup analysis revealed that medical undergraduates are the primary target audience for AI use. In addition, 34 articles described the challenges of AI implementation in medical education; two main reasons were identified: difficulty in assessing the effectiveness of AI in medical education and technical challenges while developing AI applications. Conclusions: The primary use of AI in medical education was for learning support mainly due to its ability to provide individualized feedback. Little emphasis was placed on curriculum review and assessment of students’ learning due to the lack of digitalization and sensitive nature of examinations, respectively. Big data manipulation also warrants the need to ensure data integrity. Methodological improvements are required to increase AI adoption by addressing the technical difficulties of creating an AI application and using novel methods to assess the effectiveness of AI. To better integrate AI into the medical profession, measures should be taken to introduce AI into the medical school curriculum for medical professionals to better understand AI algorithms and maximize its use. 
%M 31199295 %R 10.2196/13930 %U http://mededu.jmir.org/2019/1/e13930/ %U https://doi.org/10.2196/13930 %U http://www.ncbi.nlm.nih.gov/pubmed/31199295 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 5 %P e13216 %T Your Robot Therapist Will See You Now: Ethical Implications of Embodied Artificial Intelligence in Psychiatry, Psychology, and Psychotherapy %A Fiske,Amelia %A Henningsen,Peter %A Buyx,Alena %+ Institute for History and Ethics of Medicine, Technical University of Munich School of Medicine, Technical University of Munich, Ismaninger Straße 22, Munich, 81675, Germany, 49 8941404041, a.fiske@tum.de %K artificial intelligence %K robotics %K ethics %K psychiatry %K psychology %K psychotherapy %K medicine %D 2019 %7 09.05.2019 %9 Original Paper %J J Med Internet Res %G English %X Background: Research in embodied artificial intelligence (AI) has increasing clinical relevance for therapeutic applications in mental health services. With innovations ranging from ‘virtual psychotherapists’ to social robots in dementia care and autism disorder, to robots for sexual disorders, artificially intelligent virtual and robotic agents are increasingly taking on high-level therapeutic interventions that used to be offered exclusively by highly trained, skilled health professionals. In order to enable responsible clinical implementation, ethical and social implications of the increasing use of embodied AI in mental health need to be identified and addressed. Objective: This paper assesses the ethical and social implications of translating embodied AI applications into mental health care across the fields of Psychiatry, Psychology and Psychotherapy. Building on this analysis, it develops a set of preliminary recommendations on how to address ethical and social challenges in current and future applications of embodied AI. 
Methods: Based on a thematic literature search and established principles of medical ethics, an analysis of the ethical and social aspects of current embodied AI applications was conducted across the fields of Psychiatry, Psychology, and Psychotherapy. To enable a comprehensive evaluation, the analysis was structured around the following three steps: assessment of potential benefits; analysis of overarching ethical issues and concerns; discussion of specific ethical and social issues of the interventions. Results: From an ethical perspective, important benefits of embodied AI applications in mental health include new modes of treatment, opportunities to engage hard-to-reach populations, better patient response, and freeing up time for physicians. Overarching ethical issues and concerns include: harm prevention and various questions of data ethics; a lack of guidance on development of AI applications, their clinical integration and training of health professionals; ‘gaps’ in ethical and regulatory frameworks; the potential for misuse including using the technologies to replace established services, thereby potentially exacerbating existing health inequalities. Specific challenges identified and discussed in the application of embodied AI include: matters of risk-assessment, referrals, and supervision; the need to respect and protect patient autonomy; the role of non-human therapy; transparency in the use of algorithms; and specific concerns regarding long-term effects of these applications on understandings of illness and the human condition. Conclusions: We argue that embodied AI is a promising approach across the field of mental health; however, further research is needed to address the broader ethical and societal concerns of these technologies to negotiate best research and medical practices in innovative mental health care. 
We conclude by indicating areas of future research and developing recommendations for high-priority areas in need of concrete ethical guidance. %M 31094356 %R 10.2196/13216 %U https://www.jmir.org/2019/5/e13216/ %U https://doi.org/10.2196/13216 %U http://www.ncbi.nlm.nih.gov/pubmed/31094356 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 5 %P e11030 %T Data-Driven Blood Glucose Pattern Classification and Anomalies Detection: Machine-Learning Applications in Type 1 Diabetes %A Woldaregay,Ashenafi Zebene %A Årsand,Eirik %A Botsis,Taxiarchis %A Albers,David %A Mamykina,Lena %A Hartvigsen,Gunnar %+ Department of Computer Science, University of Tromsø – The Arctic University of Norway, Realfagbygget, Hansine Hansens vei 54, Tromsø,, Norway, 47 77646444, ashenafi.z.woldaregay@uit.no %K type 1 diabetes %K blood glucose dynamics %K anomalies detection %K machine learning %D 2019 %7 01.05.2019 %9 Review %J J Med Internet Res %G English %X Background: Diabetes mellitus is a chronic metabolic disorder that results in abnormal blood glucose (BG) regulations. The BG level is preferably maintained close to normality through self-management practices, which involves actively tracking BG levels and taking proper actions including adjusting diet and insulin medications. BG anomalies could be defined as any undesirable reading because of either a precisely known reason (normal cause variation) or an unknown reason (special cause variation) to the patient. Recently, machine-learning applications have been widely introduced within diabetes research in general and BG anomaly detection in particular. However, irrespective of their expanding and increasing popularity, there is a lack of up-to-date reviews that materialize the current trends in modeling options and strategies for BG anomaly classification and detection in people with diabetes. 
Objective: This review aimed to identify, assess, and analyze the state-of-the-art machine-learning strategies and their hybrid systems focusing on BG anomaly classification and detection including glycemic variability (GV), hyperglycemia, and hypoglycemia in type 1 diabetes within the context of personalized decision support systems and BG alarm events applications, which are important constituents for optimal diabetes self-management. Methods: A rigorous literature search was conducted between September 1 and October 1, 2017, and October 15 and November 5, 2018, through various Web-based databases. Peer-reviewed journals and articles were considered. Information from the selected literature was extracted based on predefined categories, which were based on previous research and further elaborated through brainstorming. Results: The initial results were vetted using the title, abstract, and keywords and retrieved 496 papers. After a thorough assessment and screening, 47 articles remained, which were critically analyzed. The interrater agreement was measured using a Cohen kappa test, and disagreements were resolved through discussion. The state-of-the-art classes of machine learning have been developed and tested up to the task and achieved promising performance including artificial neural network, support vector machine, decision tree, genetic algorithm, Gaussian process regression, Bayesian neural network, deep belief network, and others. Conclusions: Despite the complexity of BG dynamics, there are many attempts to capture hypoglycemia and hyperglycemia incidences and the extent of an individual’s GV using different approaches. Recently, the advancement of diabetes technologies and continuous accumulation of self-collected health data have paved the way for popularity of machine learning in these tasks. According to the review, most of the identified studies used a theoretical threshold, which suffers from inter- and intrapatient variation. 
Therefore, future studies should consider the difference among patients and also track its temporal change over time. Moreover, studies should also give more emphasis on the types of inputs used and their associated time lag. Generally, we foresee that these developments might encourage researchers to further develop and test these systems on a large-scale basis. %M 31042157 %R 10.2196/11030 %U https://www.jmir.org/2019/5/e11030/ %U https://doi.org/10.2196/11030 %U http://www.ncbi.nlm.nih.gov/pubmed/31042157 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 7 %N 2 %P e13445 %T The Use of Artificially Intelligent Self-Diagnosing Digital Platforms by the General Public: Scoping Review %A Aboueid,Stephanie %A Liu,Rebecca H %A Desta,Binyam Negussie %A Chaurasia,Ashok %A Ebrahim,Shanil %+ Applied Health Sciences, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G5, Canada, 1 6134061899, seaboueid@uwaterloo.ca %K diagnosis %K artificial intelligence %K symptom checkers %K diagnostic self evaluation %K self-care %D 2019 %7 01.05.2019 %9 Review %J JMIR Med Inform %G English %X Background: Self-diagnosis is the process of diagnosing or identifying a medical condition in oneself. Artificially intelligent digital platforms for self-diagnosis are becoming widely available and are used by the general public; however, little is known about the body of knowledge surrounding this technology. Objective: The objectives of this scoping review were to (1) systematically map the extent and nature of the literature and topic areas pertaining to digital platforms that use computerized algorithms to provide users with a list of potential diagnoses and (2) identify key knowledge gaps. Methods: The following databases were searched: PubMed (Medline), Scopus, Association for Computing Machinery Digital Library, Institute of Electrical and Electronics Engineers, Google Scholar, Open Grey, and ProQuest Dissertations and Theses. 
The search strategy was developed and refined with the assistance of a librarian and consisted of 3 main concepts: (1) self-diagnosis; (2) digital platforms; and (3) public or patients. The search generated 2536 articles from which 217 were duplicates. Following the Tricco et al 2018 checklist, 2 researchers screened the titles and abstracts (n=2316) and full texts (n=104), independently. A total of 19 articles were included for review, and data were retrieved following a data-charting form that was pretested by the research team. Results: The included articles were mainly conducted in the United States (n=10) or the United Kingdom (n=4). Among the articles, topic areas included accuracy or correspondence with a doctor’s diagnosis (n=6), commentaries (n=2), regulation (n=3), sociological (n=2), user experience (n=2), theoretical (n=1), privacy and security (n=1), ethical (n=1), and design (n=1). Individuals who do not have access to health care and perceive to have a stigmatizing condition are more likely to use this technology. The accuracy of this technology varied substantially based on the disease examined and platform used. Women and those with higher education were more likely to choose the right diagnosis out of the potential list of diagnoses. Regulation of this technology is lacking in most parts of the world; however, they are currently under development. Conclusions: There are prominent research gaps in the literature surrounding the use of artificially intelligent self-diagnosing digital platforms. Given the variety of digital platforms and the wide array of diseases they cover, measuring accuracy is cumbersome. More research is needed to understand the user experience and inform regulations. 
%M 31042151 %R 10.2196/13445 %U http://medinform.jmir.org/2019/2/e13445/ %U https://doi.org/10.2196/13445 %U http://www.ncbi.nlm.nih.gov/pubmed/31042151 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 4 %P e13822 %T Detecting Developmental Delay and Autism Through Machine Learning Models Using Home Videos of Bangladeshi Children: Development and Validation Study %A Tariq,Qandeel %A Fleming,Scott Lanyon %A Schwartz,Jessey Nicole %A Dunlap,Kaitlyn %A Corbin,Conor %A Washington,Peter %A Kalantarian,Haik %A Khan,Naila Z %A Darmstadt,Gary L %A Wall,Dennis Paul %+ Division of Systems Medicine, Department of Pediatrics, Stanford University, 1265 Welch Road, Palo Alto, CA, 94305, United States, 1 6173946031, dpwall@stanford.edu %K autism %K autism spectrum disorder %K machine learning %K developmental delays %K clinical resources %K Bangladesh %K Biomedical Data Science %D 2019 %7 24.04.2019 %9 Original Paper %J J Med Internet Res %G English %X Background: Autism spectrum disorder (ASD) is currently diagnosed using qualitative methods that measure between 20-100 behaviors, can span multiple appointments with trained clinicians, and take several hours to complete. In our previous work, we demonstrated the efficacy of machine learning classifiers to accelerate the process by collecting home videos of US-based children, identifying a reduced subset of behavioral features that are scored by untrained raters using a machine learning classifier to determine children’s “risk scores” for autism. We achieved an accuracy of 92% (95% CI 88%-97%) on US videos using a classifier built on five features. Objective: Using videos of Bangladeshi children collected from Dhaka Shishu Children’s Hospital, we aim to scale our pipeline to another culture and other developmental delays, including speech and language conditions. 
Methods: Although our previously published and validated pipeline and set of classifiers perform reasonably well on Bangladeshi videos (75% accuracy, 95% CI 71%-78%), this work improves on that accuracy through the development and application of a powerful new technique for adaptive aggregation of crowdsourced labels. We enhance both the utility and performance of our model by building two classification layers: The first layer distinguishes between typical and atypical behavior, and the second layer distinguishes between ASD and non-ASD. In each of the layers, we use a unique rater weighting scheme to aggregate classification scores from different raters based on their expertise. We also determine Shapley values for the most important features in the classifier to understand how the classifiers’ process aligns with clinical intuition. Results: Using these techniques, we achieved an accuracy (area under the curve [AUC]) of 76% (SD 3%) and sensitivity of 76% (SD 4%) for identifying atypical children from among developmentally delayed children, and an accuracy (AUC) of 85% (SD 5%) and sensitivity of 76% (SD 6%) for identifying children with ASD from those predicted to have other developmental delays. Conclusions: These results show promise for using a mobile video-based and machine learning–directed approach for early and remote detection of autism in Bangladeshi children. This strategy could provide important resources for developmental health in developing countries with few clinical resources for diagnosis, helping children get access to care at an early age. Future research aimed at extending the application of this approach to identify a range of other conditions and determine the population-level burden of developmental disabilities and impairments will be of high value. 
%M 31017583 %R 10.2196/13822 %U http://www.jmir.org/2019/4/e13822/ %U https://doi.org/10.2196/13822 %U http://www.ncbi.nlm.nih.gov/pubmed/31017583 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 4 %P e12887 %T Physicians’ Perceptions of Chatbots in Health Care: Cross-Sectional Web-Based Survey %A Palanica,Adam %A Flaschner,Peter %A Thommandram,Anirudh %A Li,Michael %A Fossat,Yan %+ Labs Department, Klick Health, Klick Inc, 175 Bloor St E, Suite 300, Toronto, ON, M4W 3R8, Canada, 1 416 214 4977, apalanica@klick.com %K physician satisfaction %K health care %K telemedicine %K mobile health %K health surveys %D 2019 %7 05.04.2019 %9 Original Paper %J J Med Internet Res %G English %X Background: Many potential benefits for the uses of chatbots within the context of health care have been theorized, such as improved patient education and treatment compliance. However, little is known about the perspectives of practicing medical physicians on the use of chatbots in health care, even though these individuals are the traditional benchmark of proper patient care. Objective: This study aimed to investigate the perceptions of physicians regarding the use of health care chatbots, including their benefits, challenges, and risks to patients. Methods: A total of 100 practicing physicians across the United States completed a Web-based, self-report survey to examine their opinions of chatbot technology in health care. Descriptive statistics and frequencies were used to examine the characteristics of participants. Results: A wide variety of positive and negative perspectives were reported on the use of health care chatbots, including the importance to patients for managing their own health and the benefits on physical, psychological, and behavioral health outcomes. 
More consistent agreement occurred with regard to administrative benefits associated with chatbots; many physicians believed that chatbots would be most beneficial for scheduling doctor appointments (78%, 78/100), locating health clinics (76%, 76/100), or providing medication information (71%, 71/100). Conversely, many physicians believed that chatbots cannot effectively care for all of the patients’ needs (76%, 76/100), cannot display human emotion (72%, 72/100), and cannot provide detailed diagnosis and treatment because of not knowing all of the personal factors associated with the patient (71%, 71/100). Many physicians also stated that health care chatbots could be a risk to patients if they self-diagnose too often (74%, 74/100) and do not accurately understand the diagnoses (74%, 74/100). Conclusions: Physicians believed in both costs and benefits associated with chatbots, depending on the logistics and specific roles of the technology. Chatbots may have a beneficial role to play in health care to support, motivate, and coach patients as well as for streamlining organizational tasks; in essence, chatbots could become a surrogate for nonmedical caregivers. However, concerns remain on the inability of chatbots to comprehend the emotional state of humans as well as in areas where expert medical knowledge and intelligence is required. 
%M 30950796 %R 10.2196/12887 %U https://www.jmir.org/2019/4/e12887/ %U https://doi.org/10.2196/12887 %U http://www.ncbi.nlm.nih.gov/pubmed/30950796 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 4 %P e12286 %T Applications of Machine Learning in Real-Life Digital Health Interventions: Review of the Literature %A Triantafyllidis,Andreas K %A Tsanas,Athanasios %+ Information Technologies Institute, Centre for Research and Technology Hellas, 6th km Charilaou-Thermi Rd, GR 57001 Thermi, Thessaloniki, 60361, Greece, 30 2310 498100, atriand@gmail.com %K machine learning %K data mining %K artificial intelligence %K digital health %K review %K telemedicine %D 2019 %7 05.04.2019 %9 Review %J J Med Internet Res %G English %X Background: Machine learning has attracted considerable research interest toward developing smart digital health interventions. These interventions have the potential to revolutionize health care and lead to substantial outcomes for patients and medical professionals. Objective: Our objective was to review the literature on applications of machine learning in real-life digital health interventions, aiming to improve the understanding of researchers, clinicians, engineers, and policy makers in developing robust and impactful data-driven interventions in the health care domain. Methods: We searched the PubMed and Scopus bibliographic databases with terms related to machine learning, to identify real-life studies of digital health interventions incorporating machine learning algorithms. We grouped those interventions according to their target (ie, target condition), study design, number of enrolled participants, follow-up duration, primary outcome and whether this had been statistically significant, machine learning algorithms used in the intervention, and outcome of the algorithms (eg, prediction). 
Results: Our literature search identified 8 interventions incorporating machine learning in a real-life research setting, of which 3 (37%) were evaluated in a randomized controlled trial and 5 (63%) in a pilot or experimental single-group study. The interventions targeted depression prediction and management, speech recognition for people with speech disabilities, self-efficacy for weight loss, detection of changes in biopsychosocial condition of patients with multiple morbidity, stress management, treatment of phantom limb pain, smoking cessation, and personalized nutrition based on glycemic response. The average number of enrolled participants in the studies was 71 (range 8-214), and the average follow-up study duration was 69 days (range 3-180). Of the 8 interventions, 6 (75%) showed statistical significance (at the P=.05 level) in health outcomes. Conclusions: This review found that digital health interventions incorporating machine learning algorithms in real-life studies can be useful and effective. Given the low number of studies identified in this review and that they did not follow a rigorous machine learning evaluation methodology, we urge the research community to conduct further studies in intervention settings following evaluation principles and demonstrating the potential of machine learning in clinical practice. 
%M 30950797 %R 10.2196/12286 %U https://www.jmir.org/2019/4/e12286/ %U https://doi.org/10.2196/12286 %U http://www.ncbi.nlm.nih.gov/pubmed/30950797 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 8 %N 2 %P e12100 %T Artificial Intelligence in Clinical Health Care Applications: Viewpoint %A van Hartskamp,Michael %A Consoli,Sergio %A Verhaegh,Wim %A Petkovic,Milan %A van de Stolpe,Anja %+ Philips Research, HTC11, p247, High Tech Campus, Eindhoven, 5656AE, Netherlands, 31 612784841, anja.van.de.stolpe@philips.com %K artificial intelligence %K deep learning %K clinical data %K Bayesian modeling %K medical informatics %D 2019 %7 05.04.2019 %9 Viewpoint %J Interact J Med Res %G English %X The idea of artificial intelligence (AI) has a long history. It turned out, however, that reaching intelligence at human levels is more complicated than originally anticipated. Currently, we are experiencing a renewed interest in AI, fueled by an enormous increase in computing power and an even larger increase in data, in combination with improved AI technologies like deep learning. Healthcare is considered the next domain to be revolutionized by artificial intelligence. While AI approaches are excellently suited to develop certain algorithms, for biomedical applications there are specific challenges. We propose six recommendations—the 6Rs—to improve AI projects in the biomedical space, especially clinical health care, and to facilitate communication between AI scientists and medical doctors: (1) Relevant and well-defined clinical question first; (2) Right data (ie, representative and of good quality); (3) Ratio between number of patients and their variables should fit the AI method; (4) Relationship between data and ground truth should be as direct and causal as possible; (5) Regulatory ready; enabling validation; and (6) Right AI method. 
%M 30950806 %R 10.2196/12100 %U https://www.i-jmr.org/2019/2/e12100/ %U https://doi.org/10.2196/12100 %U http://www.ncbi.nlm.nih.gov/pubmed/30950806 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 3 %P e12422 %T Physician Confidence in Artificial Intelligence: An Online Mobile Survey %A Oh,Songhee %A Kim,Jae Heon %A Choi,Sung-Woo %A Lee,Hee Jeong %A Hong,Jungrak %A Kwon,Soon Hyo %+ Division of Nephrology, Department of Internal Medicine, Soonchunhyang University Hospital, Daesagwanro 59 Youngsangu, Seoul, 04401, Republic of Korea, 82 01034413147, ksoonhyo@schmc.ac.kr %K artificial intelligence %K AI %K awareness %K physicians %D 2019 %7 25.03.2019 %9 Original Paper %J J Med Internet Res %G English %X Background: It is expected that artificial intelligence (AI) will be used extensively in the medical field in the future. Objective: The purpose of this study is to investigate the awareness of AI among Korean doctors and to assess physicians’ attitudes toward the medical application of AI. Methods: We conducted an online survey composed of 11 closed-ended questions using Google Forms. The survey consisted of questions regarding the recognition of and attitudes toward AI, the development direction of AI in medicine, and the possible risks of using AI in the medical field. Results: A total of 669 participants completed the survey. Only 40 (5.9%) answered that they had good familiarity with AI. However, most participants considered AI useful in the medical field (558/669, 83.4% agreement). The advantage of using AI was seen as the ability to analyze vast amounts of high-quality, clinically relevant data in real time. Respondents agreed that the area of medicine in which AI would be most useful is disease diagnosis (558/669, 83.4% agreement). One possible problem cited by the participants was that AI would not be able to assist in unexpected situations owing to inadequate information (196/669, 29.3%). 
Less than half of the participants (294/669, 43.9%) agreed that AI is diagnostically superior to human doctors. Only 237 (35.4%) answered that they agreed that AI could replace them in their jobs. Conclusions: This study suggests that Korean doctors and medical students have favorable attitudes toward AI in the medical field. The majority of physicians surveyed believed that AI will not replace their roles in the future. %M 30907742 %R 10.2196/12422 %U http://www.jmir.org/2019/3/e12422/ %U https://doi.org/10.2196/12422 %U http://www.ncbi.nlm.nih.gov/pubmed/30907742 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 3 %P e12802 %T Artificial Intelligence and the Future of Primary Care: Exploratory Qualitative Study of UK General Practitioners’ Views %A Blease,Charlotte %A Kaptchuk,Ted J %A Bernstein,Michael H %A Mandl,Kenneth D %A Halamka,John D %A DesRoches,Catherine M %+ General Medicine and Primary Care, Beth Israel Deaconess Medical Center, Harvard Medical School, 330 Brookline Avenue, Boston, MA, MA 02215, United States, 1 617 754 1457, cblease@bidmc.harvard.edu %K artificial intelligence %K attitudes %K future %K general practice %K machine learning %K opinions %K primary care %K qualitative research %K technology %D 2019 %7 20.03.2019 %9 Original Paper %J J Med Internet Res %G English %X Background: The potential for machine learning to disrupt the medical profession is the subject of ongoing debate within biomedical informatics and related fields. Objective: This study aimed to explore general practitioners’ (GPs’) opinions about the potential impact of future technology on key tasks in primary care. Methods: In June 2018, we conducted a Web-based survey of 720 UK GPs’ opinions about the likelihood of future technology to fully replace GPs in performing 6 key primary care tasks, and, if respondents considered replacement for a particular task likely, to estimate how soon the technological capacity might emerge. 
This study involved qualitative descriptive analysis of written responses (“comments”) to an open-ended question in the survey. Results: Comments were classified into 3 major categories in relation to primary care: (1) limitations of future technology, (2) potential benefits of future technology, and (3) social and ethical concerns. Perceived limitations included the beliefs that communication and empathy are exclusively human competencies; many GPs also considered clinical reasoning and the ability to provide value-based care as necessitating physicians’ judgments. Perceived benefits of technology included expectations about improved efficiencies, in particular with respect to the reduction of administrative burdens on physicians. Social and ethical concerns encompassed multiple, divergent themes including the need to train more doctors to overcome workforce shortfalls and misgivings about the acceptability of future technology to patients. However, some GPs believed that the failure to adopt technological innovations could incur harms to both patients and physicians. Conclusions: This study presents timely information on physicians’ views about the scope of artificial intelligence (AI) in primary care. Overwhelmingly, GPs considered the potential of AI to be limited. These views differ from the predictions of biomedical informaticians. More extensive, stand-alone qualitative work would provide a more in-depth understanding of GPs’ views. 
%M 30892270 %R 10.2196/12802 %U http://www.jmir.org/2019/3/e12802/ %U https://doi.org/10.2196/12802 %U http://www.ncbi.nlm.nih.gov/pubmed/30892270 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 3 %P e11990 %T Detecting Hypoglycemia Incidents Reported in Patients’ Secure Messages: Using Cost-Sensitive Learning and Oversampling to Reduce Data Imbalance %A Chen,Jinying %A Lalor,John %A Liu,Weisong %A Druhl,Emily %A Granillo,Edgard %A Vimalananda,Varsha G %A Yu,Hong %+ Department of Population and Quantitative Health Sciences, University of Massachusetts Medical School, Albert Sherman Center, 9th Floor, 368 Plantation Street, Worcester, MA, 01605, United States, 1 508 856 6063, jinying.chen@umassmed.edu %K secure messaging %K natural language processing %K hypoglycemia %K supervised machine learning %K imbalanced data %K adverse event detection %K drug-related side effects and adverse reactions %D 2019 %7 11.03.2019 %9 Original Paper %J J Med Internet Res %G English %X Background: Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety. Objective: We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients’ secure messages. Methods: An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. 
We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80%) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data. Results: The interannotator agreement was Cohen kappa=.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity 0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect. Conclusions: Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has a great potential to facilitate early detection and treatment of hypoglycemia. 
%M 30855231 %R 10.2196/11990 %U http://www.jmir.org/2019/3/e11990/ %U https://doi.org/10.2196/11990 %U http://www.ncbi.nlm.nih.gov/pubmed/30855231 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 7 %N 1 %P e10788 %T Detection of Bleeding Events in Electronic Health Record Notes Using Convolutional Neural Network Models Enhanced With Recurrent Neural Network Autoencoders: Deep Learning Approach %A Li,Rumeng %A Hu,Baotian %A Liu,Feifan %A Liu,Weisong %A Cunningham,Francesca %A McManus,David D %A Yu,Hong %+ Department of Computer Science, University of Massachusetts Lowell, 1 University Avenue, Lowell, MA, 01854, United States, 1 9789343620, hong_yu@uml.edu %K autoencoder %K BiLSTM %K bleeding %K convolutional neural networks %K electronic health record %D 2019 %7 08.02.2019 %9 Original Paper %J JMIR Med Inform %G English %X Background: Bleeding events are common and critical and may cause significant morbidity and mortality. High incidences of bleeding events are associated with cardiovascular disease in patients on anticoagulant therapy. Prompt and accurate detection of bleeding events is essential to prevent serious consequences. As bleeding events are often described in clinical notes, automatic detection of bleeding events from electronic health record (EHR) notes may improve drug-safety surveillance and pharmacovigilance. Objective: We aimed to develop a natural language processing (NLP) system to automatically classify whether an EHR note sentence contains a bleeding event. Methods: We expert annotated 878 EHR notes (76,577 sentences and 562,630 word-tokens) to identify bleeding events at the sentence level. This annotated corpus was used to train and validate our NLP systems. We developed an innovative hybrid convolutional neural network (CNN) and long short-term memory (LSTM) autoencoder (HCLA) model that integrates a CNN architecture with a bidirectional LSTM (BiLSTM) autoencoder model to leverage large unlabeled EHR data. 
Results: HCLA achieved the best area under the receiver operating characteristic curve (0.957) and F1 score (0.938) to identify whether a sentence contains a bleeding event, thereby surpassing the strong baseline support vector machines and other CNN and autoencoder models. Conclusions: By incorporating a supervised CNN model and a pretrained unsupervised BiLSTM autoencoder, the HCLA achieved high performance in detecting bleeding events. %M 30735140 %R 10.2196/10788 %U http://medinform.jmir.org/2019/1/e10788/ %U https://doi.org/10.2196/10788 %U http://www.ncbi.nlm.nih.gov/pubmed/30735140 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 5 %N 4 %P e64 %T Using Psychological Artificial Intelligence (Tess) to Relieve Symptoms of Depression and Anxiety: Randomized Controlled Trial %A Fulmer,Russell %A Joerin,Angela %A Gentile,Breanna %A Lakerink,Lysanne %A Rauws,Michiel %+ Northwestern University, 633 Clark Street, Evanston, IL, United States, 1 312 609 5300 ext 699, russell.fulmer@northwestern.edu %K artificial intelligence %K mental health services %K depression %K anxiety %K students %D 2018 %7 13.12.2018 %9 Original Paper %J JMIR Ment Health %G English %X Background: Students in need of mental health care face many barriers including cost, location, availability, and stigma. Studies show that computer-assisted therapy and 1 conversational chatbot delivering cognitive behavioral therapy (CBT) offer a less-intensive and more cost-effective alternative for treating depression and anxiety. Although CBT is one of the most effective treatment methods, applying an integrative approach has been linked to equally effective posttreatment improvement. Integrative psychological artificial intelligence (AI) offers a scalable solution as the demand for affordable, convenient, lasting, and secure support grows. 
Objective: This study aimed to assess the feasibility and efficacy of using an integrative psychological AI, Tess, to reduce self-identified symptoms of depression and anxiety in college students. Methods: In this randomized controlled trial, 75 participants were recruited from 15 universities across the United States. All participants completed Web-based surveys, including the Patient Health Questionnaire (PHQ-9), Generalized Anxiety Disorder Scale (GAD-7), and Positive and Negative Affect Scale (PANAS) at baseline and 2 to 4 weeks later (T2). The 2 test groups consisted of 50 participants in total and were randomized to receive unlimited access to Tess for either 2 weeks (n=24) or 4 weeks (n=26). The information-only control group participants (n=24) received an electronic link to the National Institute of Mental Health’s (NIMH) eBook on depression among college students and were only granted access to Tess after completion of the study. Results: A sample of 74 participants completed this study with 0% attrition from the test group and less than 1% attrition from the control group (1/24). The average age of participants was 22.9 years, with 70% of participants being female (52/74), mostly Asian (37/74, 51%), and white (32/74, 41%). Group 1 received unlimited access to Tess, with daily check-ins for 2 weeks. Group 2 received unlimited access to Tess with biweekly check-ins for 4 weeks. The information-only control group was provided with an electronic link to the NIMH’s eBook. Multivariate analysis of covariance was conducted. We used an alpha level of .05 for all statistical tests. Results revealed a statistically significant difference between the control group and group 1, such that group 1 reported a significant reduction in symptoms of depression as measured by the PHQ-9 (P=.03), whereas those in the control group did not. 
A statistically significant difference was found between the control group and both test groups 1 and 2 for symptoms of anxiety as measured by the GAD-7. Group 1 (P=.045) and group 2 (P=.02) reported a significant reduction in symptoms of anxiety, whereas the control group did not. A statistically significant difference was found on the PANAS between the control group and group 1 (P=.03) and suggests that Tess did impact scores. Conclusions: This study offers evidence that AI can serve as a cost-effective and accessible therapeutic agent. Although not designed to appropriate the role of a trained therapist, integrative psychological AI emerges as a feasible option for delivering support. Trial Registration: International Standard Randomized Controlled Trial Number: ISRCTN61214172; https://doi.org/10.1186/ISRCTN61214172. %M 30545815 %R 10.2196/mental.9782 %U http://mental.jmir.org/2018/4/e64/ %U https://doi.org/10.2196/mental.9782 %U http://www.ncbi.nlm.nih.gov/pubmed/30545815 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 6 %N 11 %P e198 %T The Perceived Benefits of an Artificial Intelligence–Embedded Mobile App Implementing Evidence-Based Guidelines for the Self-Management of Chronic Neck and Back Pain: Observational Study %A Lo,Wai Leung Ambrose %A Lei,Di %A Li,Le %A Huang,Dong Feng %A Tong,Kin-Fai %+ Guangdong Engineering and Technology Research Center for Rehabilitation Medicine and Translation, Department of Rehabilitation Medicine, The First Affiliated Hospital, Sun Yat-sen University, 58 Zhongshan Second Road, Guangzhou, 510000, China, 86 87332200 ext 8536, ambroselo0726@outlook.com %K low back pain %K neck pain %K mobile app %K exercise therapy %K mHealth %D 2018 %7 26.11.2018 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: Chronic musculoskeletal neck and back pain are disabling conditions among adults. 
Use of technology has been suggested as an alternative way to increase adherence to exercise therapy, which may improve clinical outcomes. Objective: The aim was to investigate the self-perceived benefits of an artificial intelligence (AI)–embedded mobile app to self-manage chronic neck and back pain. Methods: A total of 161 participants responded to the invitation. The evaluation questionnaire included 14 questions that were intended to explore if using the AI rehabilitation system may (1) increase time spent on therapeutic exercise, (2) affect pain level (assessed by the 0-10 Numerical Pain Rating Scale), and (3) reduce the need for other interventions. Results: An increase in time spent on therapeutic exercise per day was observed. The median Numerical Pain Rating Scale scores were 6 (interquartile range [IQR] 5-8) before and 4 (IQR 3-6) after using the AI-embedded mobile app (95% CI 1.18-1.81). A 3-point reduction was reported by the participants who used the AI-embedded mobile app for more than 6 months. Reduction in the usage of other interventions while using the AI-embedded mobile app was also reported. Conclusions: This study demonstrated the positive self-perceived beneficiary effect of using the AI-embedded mobile app to provide a personalized therapeutic exercise program. The positive results suggest that it at least warrants further study to investigate the physiological effect of the AI-embedded mobile app and how it compares with routine clinical care. 
%M 30478019 %R 10.2196/mhealth.8127 %U http://mhealth.jmir.org/2018/11/e198/ %U https://doi.org/10.2196/mhealth.8127 %U http://www.ncbi.nlm.nih.gov/pubmed/30478019 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 20 %N 9 %P e11510 %T Patient and Consumer Safety Risks When Using Conversational Assistants for Medical Information: An Observational Study of Siri, Alexa, and Google Assistant %A Bickmore,Timothy W %A Trinh,Ha %A Olafsson,Stefan %A O'Leary,Teresa K %A Asadi,Reza %A Rickles,Nathaniel M %A Cruz,Ricardo %+ College of Computer and Information Science, Northeastern University, 910-177, 360 Huntington Avenue, Boston, MA, 02115, United States, 1 6173735477, bickmore@ccs.neu.edu %K conversational assistant %K conversational interface %K dialogue system %K medical error %K patient safety %D 2018 %7 04.09.2018 %9 Original Paper %J J Med Internet Res %G English %X Background: Conversational assistants, such as Siri, Alexa, and Google Assistant, are ubiquitous and are beginning to be used as portals for medical services. However, the potential safety issues of using conversational assistants for medical information by patients and consumers are not understood. Objective: To determine the prevalence and nature of the harm that could result from patients or consumers using conversational assistants for medical information. Methods: Participants were given medical problems to pose to Siri, Alexa, or Google Assistant, and asked to determine an action to take based on information from the system. Assignment of tasks and systems were randomized across participants, and participants queried the conversational assistants in their own words, making as many attempts as needed until they either reported an action to take or gave up. Participant-reported actions for each medical task were rated for patient harm using an Agency for Healthcare Research and Quality harm scale. Results: Fifty-four subjects completed the study with a mean age of 42 years (SD 18). 
Twenty-nine (54%) were female, 31 (57%) Caucasian, and 26 (50%) were college educated. Only 8 (15%) reported using a conversational assistant regularly, while 22 (41%) had never used one, and 24 (44%) had tried one “a few times.” Forty-four (82%) used computers regularly. Subjects were only able to complete 168 (43%) of their 394 tasks. Of these, 49 (29%) reported actions that could have resulted in some degree of patient harm, including 27 (16%) that could have resulted in death. Conclusions: Reliance on conversational assistants for actionable medical information represents a safety risk for patients and consumers. Patients should be cautioned to not use these technologies for answers to medical questions they intend to act on without further consultation from a health care provider. %M 30181110 %R 10.2196/11510 %U http://www.jmir.org/2018/9/e11510/ %U https://doi.org/10.2196/11510 %U http://www.ncbi.nlm.nih.gov/pubmed/30181110 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 5 %N 3 %P e10454 %T An Embodied Conversational Agent for Unguided Internet-Based Cognitive Behavior Therapy in Preventative Mental Health: Feasibility and Acceptability Pilot Trial %A Suganuma,Shinichiro %A Sakamoto,Daisuke %A Shimoyama,Haruhiko %+ Department of Clinical Psychology, Graduate School of Education, The University of Tokyo, , Tokyo,, Japan, 81 3 5841 8068, sgnm.shin@gmail.com %K embodied conversational agent %K cognitive behavioral therapy %K psychological distress %K mental well‐being %K artificial intelligence technology %D 2018 %7 31.07.2018 %9 Original Paper %J JMIR Ment Health %G English %X Background: Recent years have seen an increase in the use of internet-based cognitive behavioral therapy in the area of mental health. Although lower effectiveness and higher dropout rates of unguided than those of guided internet-based cognitive behavioral therapy remain critical issues, not incurring ongoing human clinical resources makes it highly advantageous. 
Objective: Current research in psychotherapy, which acknowledges the importance of therapeutic alliance, aims to evaluate the feasibility and acceptability, in terms of mental health, of an application that is embodied with a conversational agent. This application was enabled for use as an internet-based cognitive behavioral therapy preventative mental health measure. Methods: Analysis of the data from the 191 participants of the experimental group with a mean age of 38.07 (SD 10.75) years and the 263 participants of the control group with a mean age of 38.05 (SD 13.45) years using a 2-way factorial analysis of variance (group × time) was performed. Results: There was a significant main effect (P=.02) and interaction for time on the variable of positive mental health (P=.02), and for the treatment group, a significant simple main effect was also found (P=.002). In addition, there was a significant main effect (P=.02) and interaction for time on the variable of negative mental health (P=.005), and for the treatment group, a significant simple main effect was also found (P=.001). Conclusions: This research can be seen to represent a certain level of evidence for the mental health application developed herein, indicating empirically that internet-based cognitive behavioral therapy with the embodied conversational agent can be used in mental health care. In the pilot trial, given the issues related to feasibility and acceptability, it is necessary to pursue higher quality evidence while continuing to further improve the application, based on the findings of the current research. 
%M 30064969 %R 10.2196/10454 %U http://mental.jmir.org/2018/3/e10454/ %U https://doi.org/10.2196/10454 %U http://www.ncbi.nlm.nih.gov/pubmed/30064969 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 20 %N 6 %P e10148 %T Towards an Artificially Empathic Conversational Agent for Mental Health Applications: System Design and User Perceptions %A Morris,Robert R %A Kouddous,Kareem %A Kshirsagar,Rohan %A Schueller,Stephen M %+ Koko, 155 Rivington St, New York, NY, 10002, United States, 1 617 851 4967, rob@koko.ai %K conversational agents %K mental health %K empathy %K crowdsourcing %K peer support %D 2018 %7 26.06.2018 %9 Original Paper %J J Med Internet Res %G English %X Background: Conversational agents cannot yet express empathy in nuanced ways that account for the unique circumstances of the user. Agents that possess this faculty could be used to enhance digital mental health interventions. Objective: We sought to design a conversational agent that could express empathic support in ways that might approach, or even match, human capabilities. Another aim was to assess how users might appraise such a system. Methods: Our system used a corpus-based approach to simulate expressed empathy. Responses from an existing pool of online peer support data were repurposed by the agent and presented to the user. Information retrieval techniques and word embeddings were used to select historical responses that best matched a user’s concerns. We collected ratings from 37,169 users to evaluate the system. Additionally, we conducted a controlled experiment (N=1284) to test whether the alleged source of a response (human or machine) might change user perceptions. Results: The majority of responses created by the agent (2986/3770, 79.20%) were deemed acceptable by users. However, users significantly preferred the efforts of their peers (P<.001). 
This effect was maintained in a controlled study (P=.02), even when the only difference in responses was whether they were framed as coming from a human or a machine. Conclusions: Our system illustrates a novel way for machines to construct nuanced and personalized empathic utterances. However, the design had significant limitations and further research is needed to make this approach viable. Our controlled study suggests that even in ideal conditions, nonhuman agents may struggle to express empathy as well as humans. The ethical implications of empathic agents, as well as their potential iatrogenic effects, are also discussed. %M 29945856 %R 10.2196/10148 %U http://www.jmir.org/2018/6/e10148/ %U https://doi.org/10.2196/10148 %U http://www.ncbi.nlm.nih.gov/pubmed/29945856 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 5 %N 2 %P e32 %T Ethical Issues for Direct-to-Consumer Digital Psychotherapy Apps: Addressing Accountability, Data Protection, and Consent %A Martinez-Martin,Nicole %A Kreitmair,Karola %+ Stanford Center for Biomedical Ethics, 1215 Welch Road, Stanford, CA, 94305, United States, 1 650 723 5760, nicolemz@stanford.edu %K ethics %K ethical issues %K mental health %K technology %K telemedicine %K mHealth %K psychotherapy %D 2018 %7 23.04.2018 %9 Viewpoint %J JMIR Ment Health %G English %X This paper focuses on the ethical challenges presented by direct-to-consumer (DTC) digital psychotherapy services that do not involve oversight by a professional mental health provider. DTC digital psychotherapy services can potentially assist in improving access to mental health care for the many people who would otherwise not have the resources or ability to connect with a therapist. However, the lack of adequate regulation in this area exacerbates concerns over how safety, privacy, accountability, and other ethical obligations to protect an individual in therapy are addressed within these services. 
In the traditional therapeutic relationship, there are ethical obligations that serve to protect the interests of the client and provide warnings. In contrast, in a DTC therapy app, there are no clear lines of accountability or associated ethical obligations to protect the user seeking mental health services. The types of DTC services that present ethical challenges include apps that use a digital platform to connect users to minimally trained nonprofessional counselors, as well as services that provide counseling steered by artificial intelligence and conversational agents. There is a need for adequate oversight of DTC nonprofessional psychotherapy services and additional empirical research to inform policy that will provide protection to the consumer. %M 29685865 %R 10.2196/mental.9423 %U http://mental.jmir.org/2018/2/e32/ %U https://doi.org/10.2196/mental.9423 %U http://www.ncbi.nlm.nih.gov/pubmed/29685865 %0 Journal Article %@ 2369-6893 %I JMIR Publications %V 3 %N 1 %P e37 %T Feasibility of an Automated System Counselor for Survivors of Sexual Assault %A Howe,Esther %A Pedrelli,Paola %A Morris,Robert %A Nyer,Maren %A Mischoulon,David %A Picard,Rosalind %+ Department of Psychiatry, Massachusetts General Hospital, 6th Floor, 1 Bowdoin Square, Boston, MA,, United States, 1 6176437690, ehowe3@mgh.harvard.edu %K CBT %K web chat %D 2017 %7 22.09.2017 %9 Abstract %J iproc %G English %X Background: Sexual assault (SA) is common and costly to individuals and society, and increases risk of mental health disorders. Stigma and cost of care discourage survivors from seeking help. Norms profiling survivors as heterosexual, cisgendered women dissuade LGBTQIA+ individuals and men from accessing care. Because individuals prefer disclosing sensitive information online rather than in-person, online systems—like instant messaging and chatbots—for counseling may bypass concerns about stigma. 
These systems’ anonymity may increase disclosure and decrease impression management, the process by which individuals attempt to influence others’ perceptions. Their low cost may expand reach of care. There are no known evidence-based chat platforms for SA survivors. Objective: To examine feasibility of a chat platform with peer and automated system (chatbot) counseling interfaces to provide cognitive reappraisals (a cognitive behavioral therapy technique) to survivors. Methods: Participants are English-speaking, US-based survivors, 18+ years old. Participants are told they will be randomized to chat with a peer or automated system counselor 5 times over 2 weeks. In reality, all participants chat with a peer counselor. Chats employ a modified-for-context evidence-based cognitive reappraisal script developed by Koko, a company offering support services for emotional distress via social networks. At baseline, participants indicate counselor type preference and complete a basic demographic form, the Brief Fear of Negative Evaluation Scale, and self-disclosure items from the International Personality Item Pool. After 5 chats, participants complete questions from the Client Satisfaction Questionnaire (CSQ), Self-Reported Attitudes Toward Agent, and the Working Alliance Inventory. Hypotheses: 1) Online chatting and automated systems will be acceptable and feasible means of delivering cognitive reappraisals to survivors. 2) High impression management (IM≥25) and low self-disclosure (SD≤45) will be associated with preference for an automated system. 3) IM and SD will separately moderate the relationship between counselor assignment and participant satisfaction. Results: Ten participants have completed the study. Recruitment is ongoing. We will enroll 50+ participants by 10/2017 and outline findings at the Connected Health Conference. 
To date, 70% of participants completed all chats within 24 hours of enrollment, and 60% indicated a pre-chat preference for an automated system, suggesting acceptability of the concept. The post-chat CSQ mean total score of 3.98 on a 5-point Likert scale (1=Poor; 5=Excellent) suggests platform acceptability. Of the 50% reporting high IM, 60% indicated preference for an automated system. Of the 30% reporting low SD, 33% reported preference for an automated system. At recruitment completion, ANOVA analyses will elucidate relationships between IM, SD, and counselor assignment. Correlation and linear regression analyses will show any moderating effect of IM and SD on the relationship between counselor assignment and participant satisfaction. Conclusions: Preliminary results suggest acceptability and feasibility of cognitive reappraisals via chat for survivors, and of the automated system counselor concept. Final results will explore relationships between SD, IM, counselor type, and participant satisfaction to inform the development of new platforms for survivors. 
%R 10.2196/iproc.8585 %U http://www.iproc.org/2017/1/e37/ %U https://doi.org/10.2196/iproc.8585 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 19 %N 4 %P e122 %T Impact of Information and Communication Technologies on Nursing Care: Results of an Overview of Systematic Reviews %A Rouleau,Geneviève %A Gagnon,Marie-Pierre %A Côté,José %A Payne-Gagnon,Julie %A Hudson,Emilie %A Dubois,Carl-Ardy %+ Faculty of Nursing Sciences, Université Laval, Pavillon Ferdinand-Vandry, 1050 Avenue de la Médecine, Quebec, QC, G1V 0A6, Canada, 1 418 525 4444 ext 53169, marie-pierre.gagnon@fsi.ulaval.ca %K information and communication technology %K eHealth %K telehealth %K nursing care %K review, overview of systematic review %D 2017 %7 25.04.2017 %9 Original Paper %J J Med Internet Res %G English %X Background: Information and communication technologies (ICTs) are becoming an impetus for quality health care delivery by nurses. The use of ICTs by nurses can impact their practice, modifying the ways in which they plan, provide, document, and review clinical care. Objective: An overview of systematic reviews was conducted to develop a broad picture of the dimensions and indicators of nursing care that have the potential to be influenced by the use of ICTs. Methods: Quantitative, mixed-method, and qualitative reviews that aimed to evaluate the influence of four eHealth domains (eg, management, computerized decision support systems [CDSSs], communication, and information systems) on nursing care were included. We used the nursing care performance framework (NCPF) as an extraction grid and analytical tool. This model illustrates how the interplay between nursing resources and the nursing services can produce changes in patient conditions. The primary outcomes included nurses’ practice environment, nursing processes, professional satisfaction, and nursing-sensitive outcomes. 
The secondary outcomes included satisfaction or dissatisfaction with ICTs according to nurses’ and patients’ perspectives. Reviews published in English, French, or Spanish from January 1, 1995 to January 15, 2015, were considered. Results: A total of 5515 titles or abstracts were assessed for eligibility and full-text papers of 72 articles were retrieved for detailed evaluation. It was found that 22 reviews published between 2002 and 2015 met the eligibility criteria. Many nursing care themes (ie, indicators) were influenced by the use of ICTs, including time management; time spent on patient care; documentation time; information quality and access; quality of documentation; knowledge updating and utilization; nurse autonomy; intra and interprofessional collaboration; nurses’ competencies and skills; nurse-patient relationship; assessment, care planning, and evaluation; teaching of patients and families; communication and care coordination; perspectives of the quality of care provided; nurses and patients satisfaction or dissatisfaction with ICTs; patient comfort and quality of life related to care; empowerment; and functional status. Conclusions: The findings led to the identification of 19 indicators related to nursing care that are impacted by the use of ICTs. To the best of our knowledge, this was the first attempt to apply NCPF in the ICTs’ context. This broad representation could be kept in mind when it will be the time to plan and to implement emerging ICTs in health care settings. Trial Registration: PROSPERO International Prospective Register of Systematic Reviews: CRD42014014762; http://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42014014762 (Archived by WebCite at http://www.webcitation.org/6pIhMLBZh) %M 28442454 %R 10.2196/jmir.6686 %U http://www.jmir.org/2017/4/e122/ %U https://doi.org/10.2196/jmir.6686 %U http://www.ncbi.nlm.nih.gov/pubmed/28442454